Re: solr+hadoop = next solr

2007-06-06 Thread Yonik Seeley

On 6/6/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:

In terms of the FederatedSearch wiki entry (updated last year), has there
been any progress made this year on this topic, at least something worthy of
being added or updated to the wiki page?


Priorities shifted, and I dropped it for a while.
I recently started working with a CNET group that may need it, so I
could start working on it again in the next few months.  Don't wait
for me if you have ideas though... I'll try to follow along and chime
in.

-Yonik


Re: solr+hadoop = next solr

2007-06-06 Thread James liu

2007/6/7, Yonik Seeley <[EMAIL PROTECTED]>:


On 6/6/07, James liu <[EMAIL PROTECTED]> wrote:
> anyone agree?

No ;-)

At least not if you mean using map-reduce for queries.

When I started looking at distributed search, I immediately went and
read the map-reduce paper (easier concept than it first appeared), and
realized it's really more for the indexing side of things (big batch
jobs, making data from data, etc).  Nutch uses map reduce for
crawling/indexing, but not for querying.



Yes, Nutch uses map-reduce only for crawling/indexing, not for querying.


http://www.nabble.com/something-i-think-important-and-should-be-added-tf3813838.html#a10796136

Map-reduce would be just for indexing, to decrease the master Solr query instance's
index size and increase query speed.

Indexing and merging will take a lot of time, but it will improve query
accuracy.

The index and the data would not be on the same box, so we only need to make sure
the master query server's hardware is powerful; the
slave query servers' hardware is not very important.

The master index server should support multiple indexes.

If Solr supported this,

I think Solr users could set up their search quickly.


It's just my thought.

What do you think, Yonik? And what do you think the next Solr should be?


-Yonik






--
regards
jl


Re: solr+hadoop = next solr

2007-06-06 Thread Jeff Rodenburg

I've been exploring distributed search, as of late.  I don't know about the
"next solr" but I could certainly see a "distributed solr" grow out of such
an expansion.

In terms of the FederatedSearch wiki entry (updated last year), has there
been any progress made this year on this topic, at least something worthy of
being added or updated to the wiki page?  Not to splinter efforts here, but
maybe a working group that was focused on that topic could help to move
things forward a bit.

- j

On 6/6/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 6/6/07, James liu <[EMAIL PROTECTED]> wrote:
> anyone agree?

No ;-)

At least not if you mean using map-reduce for queries.

When I started looking at distributed search, I immediately went and
read the map-reduce paper (easier concept than it first appeared), and
realized it's really more for the indexing side of things (big batch
jobs, making data from data, etc).  Nutch uses map reduce for
crawling/indexing, but not for querying.

-Yonik



Re: Wildcard not working as expected?

2007-06-06 Thread Yonik Seeley

On 6/6/07, Nigel McNie <[EMAIL PROTECTED]> wrote:

I'm having trouble using a * wildcard after a term in a search. It does not
seem to match "0 or more", but rather "something more, as long as it's not
nothing". This is using the standard query handler, by the way.

Examples:

Search for theatr* => returns 112 results, for things named 'theatre'
Search for theatre* => returns 0 results

Anyone know why this would be?


My guess would be stemming.
The indexed form of theatre is probably theatr after it goes through
the porter stemmer.

Perhaps you could index another variant of the field (via copyField)
that just splits on whitespace and lowercases.
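
For illustration, that copyField setup might look like this in schema.xml (the
field and type names are made up for the example; the factory classes are standard
Solr ones). Note that wildcard terms are not analyzed at query time, so the client
should lowercase them to match the lowercased index:

```xml
<!-- A lightly-analyzed variant of the stemmed field: tokens are only
     split on whitespace and lowercased, so theatre* matches "theatre" -->
<fieldType name="text_ws_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="text_ws_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```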

-Yonik


Re: solr+hadoop = next solr

2007-06-06 Thread Yonik Seeley

On 6/6/07, James liu <[EMAIL PROTECTED]> wrote:

anyone agree?


No ;-)

At least not if you mean using map-reduce for queries.

When I started looking at distributed search, I immediately went and
read the map-reduce paper (easier concept than it first appeared), and
realized it's really more for the indexing side of things (big batch
jobs, making data from data, etc).  Nutch uses map reduce for
crawling/indexing, but not for querying.

-Yonik


solr+hadoop = next solr

2007-06-06 Thread James liu

anyone agree?

What is the development plan for the next Solr? Does anyone know?


--
regards
jl


Wildcard not working as expected?

2007-06-06 Thread Nigel McNie
Hi,

I'm having trouble using a * wildcard after a term in a search. It does not
seem to match "0 or more", but rather "something more, as long as it's not
nothing". This is using the standard query handler, by the way.

Examples:

Search for theatr* => returns 112 results, for things named 'theatre'
Search for theatre* => returns 0 results

Anyone know why this would be?

-- 
Regards,
Nigel McNie
Catalyst IT Ltd.
DDI: +64 4 803 2203




Re: SOLVED Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Erik Hatcher


On Jun 6, 2007, at 5:32 PM, Chris Hostetter wrote:



: It's the favicon.ico effect.
: Nothing in logs when the class is requested from curl, but with a
: browser (here Opera), begin a response with , and it requests for
: favicon.ico.

HA HA HA HA that's freaking hilarious.

One way to avoid that might be to register a NOOP request handler with the
name "/favicon.ico"


:)  maybe we should build one in that redirects to a solr.ico or something.





RE: Where to put my plugins?

2007-06-06 Thread Teruhiko Kurosaka
Never mind.  My mistake.  I still had a copy of the jar in ext dir.
After cleaning it up, it's now loading my plugin.

THANK YOU VERY MUCH!

> -Original Message-
> From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, June 06, 2007 5:58 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Where to put my plugins?
> 
> Ryan,
> Thank you.
> 
> But creating lib under example/solr and placing my plugin jar 
> there yielded the same error of not able to locate 
> org/apache/solr/analysis/BaseTokenizerFactory
> 
> How can this be
> -kuro  
> 
> 


RE: Where to put my plugins?

2007-06-06 Thread Teruhiko Kurosaka
Ryan,
Thank you.

But creating lib under example/solr and placing
my plugin jar there yielded the same error of 
not able to locate 
org/apache/solr/analysis/BaseTokenizerFactory

How can this be
-kuro  



Re: Where to put my plugins?

2007-06-06 Thread Ryan McKinley

If the example is in:
C:\workspace\solr\example

Try putting your custom .jar in:
C:\workspace\solr\example\solr\lib

Check the README in solr home:
C:\workspace\solr\example\solr\README.txt

 This directory is optional.  If it exists, Solr will load any Jars
 found in this directory and use them to resolve any "plugins"
 specified in your solrconfig.xml or schema.xml (ie: Analyzers,
 Request Handlers, etc...)



Teruhiko Kurosaka wrote:

I made a plugin that has a Tokenizer, its Factory, a
Filter and its Factory.  I modified example/solr/conf/schema.xml
to use these Factories.

Following
http://wiki.apache.org/solr/SolrPlugins

I placed the plugin jar in the top level lib and ran
the start.jar.  I got:
org.mortbay.util.MultiException[org.apache.solr.core.SolrException:
Error loading class 'com.basistech.rlp.solr.RLPTokenizerFactory']

Clearly, Jetty cannot locate my plugin.

I put the jar in example/lib and got the same error.

After taking look at jetty document:
http://docs.codehaus.org/display/JETTY/Classloading
but not fully understanding it, I put the plugin jar
in example/ext.  Then I got:

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.mortbay.start.Main.invokeMain(Main.java:151)
at org.mortbay.start.Main.start(Main.java:476)
at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.NoClassDefFoundError:
org/apache/solr/analysis/BaseTokenizerFactory


Better, Jetty can find my plugin, but it cannot load
one of the Solr classes for it?

I also tried "the old way" described in the first doc,
expanding the war file, 
but the result was same as above. (Can't find

org/apache/solr/analysis/BaseTokenizerFactory)


Where am I supposed to put my Tokenizer/Filter plugin?

-kuro





RE: Where to put my plugins?

2007-06-06 Thread Teruhiko Kurosaka
This is about Solr 1.1.0 running on Win XP w/JDK 1.5.
Thank you.

> -Original Message-
> From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, June 06, 2007 5:32 PM
> To: solr-user@lucene.apache.org
> Subject: Where to put my plugins?
> 
> I made a plugin that has a Tokenizer, its Factory, a Filter 
> and its Factory.  I modified example/solr/conf/schema.xml to 
> use these Factories.
> 
> Following
> http://wiki.apache.org/solr/SolrPlugins
> 
> I placed the plugin jar in the top level lib and ran the 
> start.jar.  I got:
> org.mortbay.util.MultiException[org.apache.solr.core.SolrException:
> Error loading class 'com.basistech.rlp.solr.RLPTokenizerFactory']
> 
> Clearly, Jetty cannot locate my plugin.
> 
> I put the jar in example/lib and got the same error.
> 
> After taking look at jetty document:
> http://docs.codehaus.org/display/JETTY/Classloading
> but not fully understanding it, I put the plugin jar in 
> example/ext.  Then I got:
> 
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccess
> orImpl.jav
> a:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMeth
> odAccessor
> Impl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.mortbay.start.Main.invokeMain(Main.java:151)
> at org.mortbay.start.Main.start(Main.java:476)
> at org.mortbay.start.Main.main(Main.java:94)
> Caused by: java.lang.NoClassDefFoundError:
> org/apache/solr/analysis/BaseTokenizerFactory
> 
> 
> Better, Jetty can find my plugin, but it cannot load one of 
> the Solr classes for it?
> 
> I also tried "the old way" described in the first doc, 
> expanding the war file, but the result was same as above. (Can't find
> org/apache/solr/analysis/BaseTokenizerFactory)
> 
> 
> Where am I supposed to put my Tokenizer/Filter plugin?


Where to put my plugins?

2007-06-06 Thread Teruhiko Kurosaka
I made a plugin that has a Tokenizer, its Factory, a
Filter and its Factory.  I modified example/solr/conf/schema.xml
to use these Factories.

Following
http://wiki.apache.org/solr/SolrPlugins

I placed the plugin jar in the top level lib and ran
the start.jar.  I got:
org.mortbay.util.MultiException[org.apache.solr.core.SolrException:
Error loading class 'com.basistech.rlp.solr.RLPTokenizerFactory']

Clearly, Jetty cannot locate my plugin.

I put the jar in example/lib and got the same error.

After taking look at jetty document:
http://docs.codehaus.org/display/JETTY/Classloading
but not fully understanding it, I put the plugin jar
in example/ext.  Then I got:

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.mortbay.start.Main.invokeMain(Main.java:151)
at org.mortbay.start.Main.start(Main.java:476)
at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.NoClassDefFoundError:
org/apache/solr/analysis/BaseTokenizerFactory


Better, Jetty can find my plugin, but it cannot load
one of the Solr classes for it?

I also tried "the old way" described in the first doc,
expanding the war file, 
but the result was same as above. (Can't find
org/apache/solr/analysis/BaseTokenizerFactory)


Where am I supposed to put my Tokenizer/Filter plugin?

-kuro


RE: tomcat context fragment

2007-06-06 Thread Chris Hostetter
: Here is what I found on the Apache site about this:

...i think you are referring to...

http://tomcat.apache.org/tomcat-5.0-doc/config/context.html

...correct?  it definitely seems to be something that was changed in 5.5.
Note the added sentence in the 5.5 docs...

   "The value of this field must not be set except when statically
defining a Context in server.xml, as it will be inferred from the
filenames used for either the .xml context file or the docBase."

I've updated the wiki with a small note about this...

http://wiki.apache.org/solr/SolrTomcat


-Hoss



Re: Wildcards / Binary searches

2007-06-06 Thread J.J. Larrea
Hi, Hoss.

I have a number of things I'd like to post... but the generally-useful stuff is 
unfortunately a bit interwoven with the special-case stuff, and I need to get 
out of breathing-down-my-back deadline mode to find the time to separate them, 
clean up and comment, make test cases, etc.  Hopefully next week I can post at 
least a modest contribution including this.

- J.J.

At 11:31 AM -0700 6/6/07, Chris Hostetter wrote:
>: I have a local version of DisMax which parameterizes the escaping so
>: certain operators can be allowed through, which I'd be happy to
>: contribute to you or the codebase, but I expect SimpleRH may be a better
>
>That sounds like it would be a really useful patch if you'd be interested
>in posting it to Jira.
>
>
>
>-Hoss



Re: SOLVED Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Chris Hostetter

: It's the favicon.ico effect.
: Nothing in logs when the class is resquested from curl, but with a
: browser (here Opera), begin a response with , and it requests for
: favicon.ico.

HA HA HA HA that's freaking hilarious.

One way to avoid that might be to register a NOOP request handler with the
name "/favicon.ico"
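
For illustration only, such a handler could be registered in solrconfig.xml like
this. NoOpRequestHandler is hypothetical; Solr does not ship one, so you would
write a trivial SolrRequestHandler that returns nothing:

```xml
<!-- hypothetical no-op handler class, mapped to the path browsers
     request automatically; returns an empty response instead of an error -->
<requestHandler name="/favicon.ico" class="com.example.solr.NoOpRequestHandler"/>
```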



-Hoss



Re: SOLVED Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Frédéric Glorieux

Frédéric Glorieux a écrit :


 > I'm baffled.

[Yonik]
 > I don't know why that would be... what is the client sending the 
request?

 > If it gets an error, does it retry or something?

Good !


Nothing in logs when the class is requested from curl, 


Sorry, same idea, but it's a CSS link.

--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


SOLVED Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Frédéric Glorieux


> I'm baffled.

[Yonik]
> I don't know why that would be... what is the client sending the request?
> If it gets an error, does it retry or something?

Good !
It's the favicon.ico effect.
Nothing in logs when the class is requested from curl, but with a 
browser (here Opera), begin a response with , and it requests for 
favicon.ico.




--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


RE: tomcat context fragment

2007-06-06 Thread Park, Michael
Hi Chris,

No.  I set up a separate file, same as the wiki.  

It's either a tomcat version issue or a difference between how tomcat on
my Win laptop is configured vs. the configuration on our tomcat Unix
machine. 

I intend to run multiple instances of solr in production and wanted to
use the context fragments.

I have 3 test instances of solr running now (with 3 context files) and
found that whatever you set the path attribute to becomes the name of
the deployed web app (it doesn't have to match the name of the context
file, but cleaner to keep the names the same).

Here is what I found on the Apache site about this:
"The context path of this web application, which is matched
against
the beginning of each request URI to select the appropriate web
application for processing. All of the context paths within a
particular Host must be unique. If you specify a context path of an
empty string (""), you are defining the default web application for
this Host, which will process all requests not assigned to other
Contexts."

~Mike

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 06, 2007 2:53 PM
To: solr-user@lucene.apache.org
Subject: RE: tomcat context fragment

: I've found the problem.
:
: The Context attribute path needed to be set:
:
: 

RE: tomcat context fragment

2007-06-06 Thread Chris Hostetter
: I've found the problem.
:
: The Context attribute path needed to be set:
:
: 

Re[2]: Multiple doc types in schema

2007-06-06 Thread Chris Hostetter

Ah ... i was misunderstanding your goal of "doctypes" ... the use case i
was thinking of is that you have "book" documents and "movie" documents
and you frequently only query on one type or the other, but sometimes you do
a generic query on all of them using the fields they have in common.

this is clearly not the situation you are describing, since you suggest
storing them in completely separate indexes that can be blown away
independently.

there is a patch in Jira to support multiple SolrCores in a single JVM
"context" ... as i understand it this would achieve your goal (but i
haven't really had a chance to look at it so i can't really speak to it).

in general, running multiple Solr instances is actually quite easy and
not as bad as you make it out to be ... the overhead of running multiple
Solr webapp instances in a single JVM doesn't really take up that much
more memory or CPU ... yes the classes are all loaded twice, but that
typically pales in comparison to the amount of data involved in your index
(unless you've got hundreds of tiny indexes or something like that)

: - more difficult to maintain the index. If I want to delete
:   all docs of a doc type, I can use deletet by query but it's
:   always easier to wipe out the whole index directory if doctypes
:   are kept separate but maintained by the same solr instance.
:   I can run two separate solr instances to achieve this then this
:   takes more memory/CPU/maintaince effort.
:
: One schema file with doctypes defined, and separate index directories
: would be perfect, in my opinion :) or even separate schema files :)

-Hoss



RE: question about highlight field

2007-06-06 Thread Chris Hostetter

: Yes, I'm using 1.1. The example in my last email is an expected result,
: not the real result. Indeed I didn't see the arr element in the
: highlighting element when either prefix wildcard or true wildcard query

Hmmm... yes, i'm sorry i wasn't thinking clearly -- that makes sense since
in 1.1 the queries weren't being rewritten at all and so extractTerms
wouldn't work.



-Hoss



Re: Wildcards / Binary searches

2007-06-06 Thread Chris Hostetter

Side Note: It's my opinion that "type ahead" or "auto complete" style
functionality is best addressed by customized logic (most likely using
specially built fields containing all of the prefixes of the key words up
to N characters as separate tokens).  simple uses of PrefixQueries are
only going to get you so far, particularly under heavy load or in an index
with a large number of unique terms.
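
As a sketch of that prefixes-as-tokens approach: later Solr releases ship
solr.EdgeNGramFilterFactory, which emits exactly these prefix tokens at index
time; on a 1.1/1.2-era install you would need a custom filter factory along the
same lines. The names and sizes below are illustrative:

```xml
<!-- Index-time analyzer emits prefixes of each token up to 10 chars:
     "radiohead" -> "r", "ra", "rad", ... so a type-ahead query becomes
     a cheap exact term match instead of a PrefixQuery -->
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```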


: If I can alter this I think sorted.. what's idf and docFreq?

people who really want to get into the nitty gritty of scoring should
really familiarize themselves with the details of the Lucene scoring
mechanisms...

   http://lucene.apache.org/java/docs/scoring.html

(this is linked to from the question "How are documents scored" in the
SolrRelevancyFAQ .. any edits from users to improve this FAQ would be
greatly appreciated:  http://wiki.apache.org/solr/SolrRelevancyFAQ  )
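
In short: docFreq is the number of documents containing the term, and idf
(inverse document frequency) boosts rarer terms. A sketch of the formula,
assuming Lucene's DefaultSimilarity of this era is in use:

```latex
% idf as computed by Lucene's DefaultSimilarity (assumption: the stock Similarity)
\mathrm{idf}(t) = 1 + \ln\left(\frac{N}{\mathrm{docFreq}(t) + 1}\right)
% where N = numDocs. With N \approx 10^7 (consistent with galo's explain output
% elsewhere in the thread):
%   docFreq(t) = 2    : idf \approx 1 + \ln(10^7 / 3)    \approx 16.0
%   docFreq(t) = 4096 : idf \approx 1 + \ln(10^7 / 4097) \approx 8.8
```

So a rare compound term like "radiohead+ani" gets nearly twice the idf of the
common term "radiohead", and since idf enters the score twice (query weight and
field weight), the effect on ranking is substantial.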

NOTE: in a "type ahead" style situation, you may actually want an IDF
function that's the inverse of typical search usages (which i guess would
make it just a "DF" function) since unique terms really aren't "better" in
this usecase.


-Hoss



Re: Wildcards / Binary searches

2007-06-06 Thread Chris Hostetter

: I have a local version of DisMax which parameterizes the escaping so
: certain operators can be allowed through, which I'd be happy to
: contribute to you or the codebase, but I expect SimpleRH may be a better

That sounds like it would be a really usefull patch if you be interested
in posting it to Jira.



-Hoss



Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Chris Hostetter

I'm baffled.

Would it be possible for you to send a scaled down (but compilable)
version of your response writer that demonstrates the problem, along with
a snippet that can be added to the example solrconfig.xml to register it,
and an example request URL that triggers the problem?

that way we can all try it and see if we can reproduce your results (for
all we know, it may be an artifact of your debugger)



-Hoss



RE: tomcat context fragment

2007-06-06 Thread Park, Michael
I've found the problem. 

The Context attribute path needed to be set:
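
(The fragment itself was stripped by the list archiver.) A typical Solr context
fragment looks roughly like the sketch below; all paths are placeholders, and note
that on Tomcat 5.5 the path is inferred from the fragment's filename rather than
from this attribute:

```xml
<!-- e.g. saved as $CATALINA_HOME/conf/Catalina/localhost/solr.xml;
     paths below are placeholders -->
<Context path="/solr" docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```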


   



-Original Message-
From: Park, Michael [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 05, 2007 5:28 PM
To: solr-user@lucene.apache.org
Subject: tomcat context fragment

Hello All,

 

I've been working with solr on Tomcat 5.5/Windows and had success
setting my solr home using the context fragment.  However, I cannot get
it to work on Tomcat 5.028/Unix.  I've read and re-read the Apache
Tomcat documentation and cannot find a solution.  Has anyone run into
this issue?  Is there some Tomcat setting that is preventing this from
working?

 

Thanks,

Mike



Re: post.jar is absent in Solr distribution

2007-06-06 Thread Chris Hostetter

: I am an absolute noob to solr and I am trying out the Solr tutorial
: present at http://lucene.apache.org/solr/tutorial.html

there is a big blurb on that tutorial URL attempting to point out that it
is for a nightly release (version "1.1.2007.05.29.12.05.29") and that you
should refer to the version of the tutorial that was distributed with the
release you are using.


-Hoss



Re: Wildcards / Binary searches

2007-06-06 Thread galo

Ok, further to my email below, I've been testing with q=radioh?*

Basically the problem is that when searching artists, even with Radiohead having a 
big boost, it returns results with less boost first, like 
"Radiohead+Ani Di Franco" or "Radiohead+Michael Stipe"


The debug output is below, but basically, for Radiohead and one of the 
others we get this:


radiohead+ani - 655391.5  * 0.046359334
radiohead - 1150991.9 * 0.025442434

So it's fairly clear where the difference is. Looking at the numbers, 
the cause seems to be in this line:


8.781371 = idf(docFreq=4096)

While Radiohead+Ani is getting

16.000769 = idf(docFreq=2)

If I can alter this I think I'm sorted... what are idf and docFreq?


  
30383.514 = (MATCH) sum of:
  30383.514 = (MATCH) weight(text:radiohead+ani in 159496), product of:
0.046359334 = queryWeight(text:radiohead+ani), product of:
  16.000769 = idf(docFreq=2)
  0.0028973192 = queryNorm
655391.5 = (MATCH) fieldWeight(text:radiohead+ani in 159496), 
product of:

  1.0 = tf(termFreq(text:radiohead+ani)=1)
  16.000769 = idf(docFreq=2)
  40960.0 = fieldNorm(field=text, doc=159496)

  
29284.035 = (MATCH) sum of:
  29284.035 = (MATCH) weight(text:radiohead in 9799640), product of:
0.025442434 = queryWeight(text:radiohead), product of:
  8.781371 = idf(docFreq=4096)
  0.0028973192 = queryNorm
1150991.9 = (MATCH) fieldWeight(text:radiohead in 9799640), product of:
  1.0 = tf(termFreq(text:radiohead)=1)
  8.781371 = idf(docFreq=4096)
  131072.0 = fieldNorm(field=text, doc=9799640)


Thanks a lot,

galo


galo wrote:
I was doing a different trick, basically searching q=radioh*+radioh~, 
and the results are slightly better than with ?*, but not great. By the way, 
the case sensitivity of wildcards matters here, of course.


I'd like to have a look at that DisMax you have if you can post it, at 
least to compare results. The way I currently get scoring done is, as I say, 
far from perfect.


By the way, I'm seeing the highlighting disappears when using these 
wildcards, is that normal?


Thanks for your help,

galo


At 4:40 PM +0100 6/6/07, galo wrote:
 >1. I want to use solr for some sort of live search, querying with 
incomplete terms + wildcard and getting any similar results. Radioh* 
would return anything containing that string. The DisMax req. hander 
doesn't accept wildcards in the q param so i'm trying the simple one 
and still have problems as all my results are coming back with score = 
1 and I need them sorted by relevance.. Is there a way of doing this? 
Why doesn't * work in dismax (nor ~ by the way)??


DisMax was written with the intent of supporting a simple search box 
in which one could type or paste some text, e.g. a title like


Santa Clause: Is he Real (and if so, what is "real")?

and get meaningful results.  To do that it pre-processes the query 
string by removing unbalanced quotation marks and escaping characters 
that would otherwise be treated by the query parser as operators:


\ ! ( ) : ^ [ ] { } ~ * ?

I have a local version of DisMax which parameterizes the escaping so 
certain operators can be allowed through, which I'd be happy to 
contribute to you or the codebase, but I expect SimpleRH may be a 
better tool for your application than DisMaxRH, as long as you get it 
to score as you wish.


Both Standard and DisMax request handlers use SolrQueryParser, an 
extension of the Lucene query parser which introduces a small number 
of changes, one of which is that prefix queries e.g. Radioh* are 
evaluated with ConstantScorePrefixQuery rather than the standard 
PrefixQuery.


In issue SOLR-218 developers have been discussing per-field control of 
query parser options (some of it Solr's, some of it Lucene's).  When 
that is implemented there should additionally be a property 
useConstantScorePrefixQuery analogous to the unfortunately-named 
QueryParser useOldRangeQuery, but handled by SolrQueryParser (until 
CSPQs are implemented as an option in Lucene QP).


Until that time, well, Chris H. posted a clever and rather timely 
workaround on the solr-dev list:


 >one work arround people may want to consider ... is to force the use 
of a WildCardQuery in what would otherwise be interpreted as a 
PrefixQuery by putting a "?" before the "*"

 >
 >ie: auto?* instead of auto*
 >
 >(yes, this does require that at least one character follow the prefix)

Perhaps that would help in your case?

- J.J.










Re[2]: Multiple doc types in schema

2007-06-06 Thread Jack L
Hello Chris,

Thanks for the reply. I understand that a mixed-type index will work
just fine. Just to bring up a topic for discussion/new features though:
there seem to be downsides of not having a doctype:

- name space conflict when two doctypes are not related. In
  this case the developer will have to be careful with names

- more difficult to maintain the index. If I want to delete
  all docs of a doc type, I can use delete-by-query, but it's
  always easier to wipe out the whole index directory if doctypes
  are kept in separate indexes maintained by the same solr instance.
  If I run two separate solr instances to achieve this, it
  takes more memory/CPU/maintenance effort.

One schema file with doctypes defined, and separate index directories
would be perfect, in my opinion :) or even separate schema files :)

-- 
Best regards,
Jack

Tuesday, June 5, 2007, 9:58:10 PM, you wrote:


> : This is based on my understanding that solr/lucene does not
> : have the concept of document type. It only sees fields.
> :
> : Is my understanding correct?

> it is.

> : It seems a bit unclean to mix fields of all document types
> : in the same schema though. Or, is there a way to allow multiple
> : document types in the schema, and specify what type to use
> : when indexing and searching?

> it's really just an issue of semantics ... the schema.xml is where you
> list all of the fields you need in your index, any notion of doctype is
> entirely artificial ... you could group all of the
> fields relating to doctypeA in one section of the schema.xml, then have a
> big  line and then list the fields in doctypeB, etc... but
> what if there are fields you use in both "doctypes" ? .. how much you "mix"
> them is entirely up to you.



> -Hoss



Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Frédéric Glorieux


Thanks for answer,

I'm feeling less guilty.

> I don't see a non-null default for HEAD/FOOT... perhaps
> do   if (HEAD!=null) writer.write(HEAD);
> There may be an issue with how you register in solrconfig.xml

I get everything I want from solrconfig.xml; I was suspecting some 
classloader mystery. Following your advice from another post, I will 
write a specific request handler, so it will be easier to trace the 
problem, with a very simple first solution: stop sending the exception (to 
avoid gigabytes of logs).


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: Wildcards / Binary searches

2007-06-06 Thread galo

Yeah, I thought of that solution, but this is a 20G index with each
document having around 300 of those numbers, so I was a bit worried about
the performance.. I'll try anyway, thanks!

On 06/06/07, *Yonik Seeley* <[EMAIL PROTECTED] > 
wrote:


On 6/6/07, galo <[EMAIL PROTECTED] > wrote:
>  3. I'm trying to implement another index where I store a number of
int
>  values for each document. Everything works ok as integers but i'd
like
>  to have some sort of fuzzy searches based on the bit representation of
>  the numbers. Essentially, this number:
>
>  1001001010100
>
>  would be compared to these two
>
>  1011001010100
>  1001001010111
>
>  And the first would get a bigger score than the second, as it has
only 1
>  flipped bit while the second has 2.

You could store the numbers as a string field with the binary
representation,
then try a fuzzy search.

  myfield:1001001010100~

-Yonik






Re: Wildcards / Binary searches

2007-06-06 Thread J.J. Larrea
At 4:40 PM +0100 6/6/07, galo wrote:
>1. I want to use solr for some sort of live search, querying with incomplete 
>terms + wildcard and getting any similar results. Radioh* would return 
>anything containing that string. The DisMax req. hander doesn't accept 
>wildcards in the q param so i'm trying the simple one and still have problems 
>as all my results are coming back with score = 1 and I need them sorted by 
>relevance.. Is there a way of doing this? Why doesn't * work in dismax (nor ~ 
>by the way)??

DisMax was written with the intent of supporting a simple search box in which 
one could type or paste some text, e.g. a title like

Santa Clause: Is he Real (and if so, what is "real")?

and get meaningful results.  To do that it pre-processes the query string by 
removing unbalanced quotation marks and escaping characters that would 
otherwise be treated by the query parser as operators:

\ ! ( ) : ^ [ ] { } ~ * ?

I have a local version of DisMax which parameterizes the escaping so certain 
operators can be allowed through, which I'd be happy to contribute to you or 
the codebase, but I expect SimpleRH may be a better tool for your application 
than DisMaxRH, as long as you get it to score as you wish.

Both Standard and DisMax request handlers use SolrQueryParser, an extension of 
the Lucene query parser which introduces a small number of changes, one of 
which is that prefix queries e.g. Radioh* are evaluated with 
ConstantScorePrefixQuery rather than the standard PrefixQuery.

In issue SOLR-218 developers have been discussing per-field control of query 
parser options (some of it Solr's, some of it Lucene's).  When that is 
implemented there should additionally be a property useConstantScorePrefixQuery 
analogous to the unfortunately-named QueryParser useOldRangeQuery, but handled 
by SolrQueryParser (until CSPQs are implemented as an option in Lucene QP).

Until that time, well, Chris H. posted a clever and rather timely workaround on 
the solr-dev list:

>one workaround people may want to consider ... is to force the use of a 
>WildCardQuery in what would otherwise be interpreted as a PrefixQuery by 
>putting a "?" before the "*"
>
>ie: auto?* instead of auto*
>
>(yes, this does require that at least one character follow the prefix)

Perhaps that would help in your case?
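Applied client-side, the workaround is a one-line rewrite of the query term; a 
hypothetical sketch (the names are mine, not Solr's):

```java
// Rewrite a trailing "*" prefix query into the "?*" form so the query
// parser builds a scoring WildcardQuery instead of a
// ConstantScorePrefixQuery.  Note the caveat above: the rewritten form
// requires at least one character after the prefix.
class PrefixToWildcard {
    static String rewrite(String term) {
        if (term.endsWith("*") && !term.endsWith("?*")) {
            return term.substring(0, term.length() - 1) + "?*";
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Radioh*")); // prints: Radioh?*
    }
}
```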

- J.J.



Re: Wildcards / Binary searches

2007-06-06 Thread Yonik Seeley

On 6/6/07, galo <[EMAIL PROTECTED]> wrote:

3. I'm trying to implement another index where I store a number of int
values for each document. Everything works ok as integers but i'd like
to have some sort of fuzzy searches based on the bit representation of
the numbers. Essentially, this number:

1001001010100

would be compared to these two

1011001010100
1001001010111

And the first would get a bigger score than the second, as it has only 1
flipped bit while the second has 2.


You could store the numbers as a string field with the binary representation,
then try a fuzzy search.

 myfield:1001001010100~

-Yonik
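For intuition: the bit-flip count galo wants is a Hamming distance, and for 
equal-length binary strings that coincides with the edit distance a fuzzy 
query ranks by. A quick standalone illustration (plain Java, no Solr 
involved):

```java
// Hamming distance between equal-length binary strings = number of
// flipped bits; the one-flip candidate should outrank the two-flip one.
class HammingDemo {
    static int hamming(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("lengths differ");
        int d = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) d++;
        return d;
    }

    public static void main(String[] args) {
        String q = "1001001010100";
        System.out.println(hamming(q, "1011001010100")); // prints: 1
        System.out.println(hamming(q, "1001001010111")); // prints: 2
    }
}
```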


RE: Wildcards / Binary searches

2007-06-06 Thread Xuesong Luo
I have a similar question about dismax, here is what Chris said:

the dismax handler uses a much simpler query syntax than the standard
request handler.  Only +, -, and " are special characters, so
wildcards are not supported.


HTH

-Original Message-
From: galo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 06, 2007 8:41 AM
To: solr-user@lucene.apache.org
Subject: Wildcards / Binary searches

Hi,

Three questions:

1. I want to use solr for some sort of live search, querying with 
incomplete terms + wildcard and getting any similar results. Radioh* 
would return anything containing that string. The DisMax req. handler 
doesn't accept wildcards in the q param so I'm trying the simple one and 
still have problems as all my results are coming back with score = 1 and 
I need them sorted by relevance. Is there a way of doing this? Why 
doesn't * work in dismax (nor ~ by the way)?

2. What do the phrase slop params do?

3. I'm trying to implement another index where I store a number of int 
values for each document. Everything works OK as integers, but I'd like 
to have some sort of fuzzy search based on the bit representation of 
the numbers. Essentially, this number:

1001001010100

would be compared to these two

1011001010100
1001001010111

And the first would get a higher score than the second, as it has only 1 
flipped bit while the second has 2.

Is it possible to implement this in solr?


Cheers,
galo




Wildcards / Binary searches

2007-06-06 Thread galo

Hi,

Three questions:

1. I want to use solr for some sort of live search, querying with 
incomplete terms + wildcard and getting any similar results. Radioh* 
would return anything containing that string. The DisMax req. handler 
doesn't accept wildcards in the q param so I'm trying the simple one and 
still have problems as all my results are coming back with score = 1 and 
I need them sorted by relevance. Is there a way of doing this? Why 
doesn't * work in dismax (nor ~ by the way)?


2. What do the phrase slop params do?

3. I'm trying to implement another index where I store a number of int 
values for each document. Everything works OK as integers, but I'd like 
to have some sort of fuzzy search based on the bit representation of 
the numbers. Essentially, this number:

1001001010100

would be compared to these two

1011001010100
1001001010111

And the first would get a higher score than the second, as it has only 1 
flipped bit while the second has 2.


Is it possible to implement this in solr?


Cheers,
galo



Re: custom writer, working but... a strange exception in logs

2007-06-06 Thread Yonik Seeley

On 6/6/07, Frédéric Glorieux <[EMAIL PROTECTED]> wrote:

I can't figure out why, but when writer.write(HEAD) is executed, I see code
from StandardRequestHandler executed twice in the debugger; the first is
OK, the second doesn't have the q parameter.


I don't know why that would be... what client is sending the request?
If it gets an error, does it retry or something?


Displaying results is always OK.
Without such lines, there is only one call to StandardRequestHandler, no
exception in the log, but no head or foot either. When the HEAD and FOOT
values are hard-coded and not configured, there's no exception. If HEAD
and FOOT are not static, the problem is the same.


I don't see a non-null default for HEAD/FOOT... perhaps
do  if (HEAD != null) writer.write(HEAD);
There may be an issue with how you register it in solrconfig.xml.

-Yonik
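Another way to apply that advice is to default the values once in init() so 
write() never sees null. A self-contained sketch of the guard pattern (Solr's 
QueryResponseWriter API is elided; the class and method shapes here are mine, 
not a drop-in replacement):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Default optional config values in init() so write() never sees null.
// Instance fields (not static) with non-null defaults replace the
// unguarded static HEAD/FOOT.
class HeadFootSketch {
    private String head = "";  // harmless defaults
    private String foot = "";

    void init(Map<String, String> conf) {
        String s = conf.get("head");
        if (s != null && !"".equals(s)) head = s;
        s = conf.get("foot");
        if (s != null && !"".equals(s)) foot = s;
    }

    void write(Writer writer) throws IOException {
        writer.write(head);
        // ... the results loop would go here ...
        writer.write(foot);
    }

    public static void main(String[] args) throws IOException {
        HeadFootSketch w = new HeadFootSketch();
        w.init(new HashMap<String, String>()); // no head/foot configured
        StringWriter out = new StringWriter();
        w.write(out); // no NullPointerException
        System.out.println("wrote " + out.toString().length() + " chars");
    }
}
```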


Re: Highlight in a response writer, bad practice ?

2007-06-06 Thread Yonik Seeley

On 6/6/07, Frédéric Glorieux <[EMAIL PROTECTED]> wrote:

Tests are going very well with Solr. I'm working for an academic
project, with not a lot of users but with high demands, which will
explain the background of my question. For linguistic activities,
searching is a goal in itself; retrieving a document may be secondary.
That's why it's common to serve thousands of "highlighted snippets" from
results (in Solr terms), like a "concordancer".

In such cases, it seems memory expensive to prepare really big snippets
lists from StandardRequestHandler. I'm beginning to make it work from a
ResponseWriter (thanks for all needed code already in
HighlightingUtils), so that snippets are directly written to the
response, without storing.
Before working too much on this code, is it good practice? Did I miss
an important reason?


Simplicity.  The memory usage for highlight fields in normal responses
is not an issue.
If it becomes an issue for you, then you're roughly taking the right approach.

However, rather than writing your own response writer to solve your
issue, you might consider writing your own request handler and inserting
an Iterable (which will be written as an array by the response writer).
This way, all response writers (XML, JSON, etc.) will work.

-Yonik


Highlight in a response writer, bad practice ?

2007-06-06 Thread Frédéric Glorieux

Hi all,

Tests are going very well with Solr. I'm working for an academic 
project, with not a lot of users but with high demands, which will 
explain the background of my question. For linguistic activities, 
searching is a goal in itself; retrieving a document may be secondary. 
That's why it's common to serve thousands of "highlighted snippets" from 
results (in Solr terms), like a "concordancer".


In such cases, it seems memory-expensive to prepare really big snippet 
lists in StandardRequestHandler. I'm beginning to make it work from a 
ResponseWriter (thanks for all the needed code already in 
HighlightingUtils), so that snippets are written directly to the 
response, without being stored.


Before working too much on this code, is it good practice? Did I miss 
an important reason? I understand the choice of StandardRequestHandler 
for normal usage of a search engine (paged results), to avoid code 
replication for each ResponseWriter (XML, JSON...). Am I wrong?


If Solr/Lucene gurus have time to listen, I will also need some info 
about the highlighter, for another post.


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: post.jar is absent in Solr distribution

2007-06-06 Thread Erik Hatcher
post.jar was added since then.  Solr 1.2 is on its way, but you can  
also get a nightly build here:





On Jun 6, 2007, at 7:03 AM, Manoharam Reddy wrote:


I am an absolute noob to solr and I am trying out the Solr tutorial
present at http://lucene.apache.org/solr/tutorial.html

In the tutorial, post.jar is mentioned but I don't find post.jar
anywhere. I downloaded the solr tarball from
http://www.eu.apache.org/dist/lucene/solr/1.1/apache-solr-1.1.0-incubating.tgz


What do I do now?




post.jar is absent in Solr distribution

2007-06-06 Thread Manoharam Reddy

I am an absolute noob to solr and I am trying out the Solr tutorial
present at http://lucene.apache.org/solr/tutorial.html

In the tutorial, post.jar is mentioned but I don't find post.jar
anywhere. I downloaded the solr tarball from
http://www.eu.apache.org/dist/lucene/solr/1.1/apache-solr-1.1.0-incubating.tgz

What do I do now?


custom writer, working but... a strange exception in logs

2007-06-06 Thread Frédéric Glorieux


Hi all,

At first, a Lucene user for years, I should really thank you for Solr.

For a start, I wrote a little results writer for an app. It works the 
way I understand Solr should, except for a strange exception I'm not 
able to puzzle out.


Version : fresh subversion.
 1. Class
 2. Stacktrace
 3. Maybe?

1. Class


public class HTMLResponseWriter implements QueryResponseWriter {
  public static String CONTENT_TYPE_HTML_UTF8 = "text/html; charset=UTF-8";
  /** A custom HTML header configured from solrconfig.xml */
  static String HEAD;
  /** A custom HTML footer configured from solrconfig.xml */
  static String FOOT;

  /** get some snippets from conf */
  public void init(NamedList n) {
String s=(String)n.get("head");
if (s != null && !"".equals(s)) HEAD = s;
s=(String)n.get("foot");
if (s != null && !"".equals(s)) FOOT = s;
  }

  public void write(Writer writer, SolrQueryRequest req,
      SolrQueryResponse rsp) throws IOException {
    // causes the exception below
    writer.write(HEAD);
    /* loop on my results, working like it should */
    // causes the exception below
    writer.write(FOOT);
  }

  public String getContentType(SolrQueryRequest request,
      SolrQueryResponse response) {
    return CONTENT_TYPE_HTML_UTF8;
  }

}

2. Stacktrace
=

GRAVE: org.apache.solr.core.SolrException: Missing required parameter: q
	at 
org.apache.solr.request.RequiredSolrParams.get(RequiredSolrParams.java:50)
	at 
org.apache.solr.request.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:72)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:66)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
...

3. Maybe ?
==

I can't figure out why, but when writer.write(HEAD) is executed, I see code 
from StandardRequestHandler executed twice in the debugger; the first is 
OK, the second doesn't have the q parameter. Displaying results is always 
OK. Without such lines, there is only one call to StandardRequestHandler, 
no exception in the log, but no head or foot either. When the HEAD and 
FOOT values are hard-coded and not configured, there's no exception. If 
HEAD and FOOT are not static, the problem is the same.

Is it a mistake in my code? Any advice is welcome, and if I've hit a bug, 
be sure I will do my best to help.


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique