Re: Performance optimization of Proximity/Wildcard searches

2011-02-06 Thread Salman Akram
Only a couple of thousand documents are added daily, so the old OS cache should
still be useful since the old documents remain the same, right?

Also can you please comment on my other thread related to Term Vectors?
Thanks!

On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic  wrote:

> Yes, the OS cache mostly remains (obviously, index files that are no longer
> around
> will remain in the OS cache for a while, but will be useless and
> gradually
> replaced by new index files).
> What matters is not how long warmup takes, but which queries you use to
> warm up
> the index and how much you auto-warm the caches.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Salman Akram 
> > To: solr-user@lucene.apache.org
> > Sent: Sat, February 5, 2011 4:06:54 AM
> > Subject: Re: Performance optimization of Proximity/Wildcard searches
> >
> > Correct me if I am wrong.
> >
> > A commit to the index flushes the Solr cache, but of course the OS cache
> > would still be useful? If an index is updated every hour, then a warm-up
> > that takes less than 5 mins should be more than enough, right?
> >
> > On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com
> > >  wrote:
> >
> > > Salman,
> > >
> > > Warming up may be useful if your caches are getting decent hit ratios.
> > > Plus, you
> > > are warming up the OS cache when you warm up.
> > >
> > > Otis
> > > 
> > >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem  search :: http://search-lucene.com/
> > >
> > >
> > >
> > > - Original  Message 
> > > > From: Salman Akram 
> > >  > To: solr-user@lucene.apache.org
> > >  > Sent: Fri, February 4, 2011 3:33:41 PM
> > > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > > >
> > > > I know, so we are not really using it for regular warm-ups (in any
> > > > case the index is updated on an hourly basis). Just tried a few times
> > > > to compare results. The issue is I am not even sure if warming up is
> > > > useful for such regular updates.
> > > >
> > > >
> > > >
> > > > On Fri, Feb 4, 2011  at 5:16 PM, Otis  Gospodnetic <
> > > otis_gospodne...@yahoo.com
> > >  > >  wrote:
> > > >
> > > > > Salman,
> > > >  >
> > > > > I only skimmed your email, but wanted to say that this part sounds
> > > > > a little suspicious:
> > > > >
> > > > > > Our warm up script currently executes all distinct queries in our
> > > > > > logs having count > 5. It was run yesterday (with all the indexing
> > > > > > update every
> > > > >
> > > > > It sounds like this will make warmup take a long time, assuming you
> > > > > have more than a handful of distinct queries in your logs.
> > > > >
> > > > > Otis
> > > > >  
> > > > >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > >  > Lucene ecosystem  search :: http://search-lucene.com/
> > > > >
> > > >  >
> > > > >
> > > > > - Original Message 
> > > > > > From: Salman Akram 
> > > > > > To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
> > > > > > Sent: Tue, January 25, 2011 6:32:48 AM
> > > > > > Subject: Re: Performance optimization of Proximity/Wildcard searches
> > > > > >
> > > > > > By warmed index do you only mean warming the Solr cache or the OS
> > > > > > cache? As I said, our index is updated every hour so I am not sure
> > > > > > how much the Solr cache would be helpful, but the OS cache should
> > > > > > still be helpful, right?
> > > > > >
> > > > > > I haven't compared the results with a proper script, but from
> > > > > > manual testing here are some of the observations.
> > > > > >
> > > > > > 'Recent' queries which are in cache of course return immediately
> > > > > > (only if they are exactly the same - even if they took 3-4 mins the
> > > > > > first time). I will need to test how many recent queries stay in
> > > > > > cache, but still this would work only for very common queries. A
> > > > > > user can run different queries and I want at least them to be at an
> > > > > > 'acceptable' level (5-10 secs) even if not very fast.
> > > > > >
> > > > > > Our warm up script currently executes all distinct queries in our
> > > > > > logs having count > 5. It was run yesterday (with all the indexing
> > > > > > updates every hour after that) and today when I executed some of
> > > > > > the same queries again their time seemed a little less (around
> > > > > > 15-20%); I am not sure if this means anything. However, their time
> > > > > > is still not acceptable.
> > > > > >
> > > > > > What do you think is the best way to compare results? First run
> > > > > > all the warm up queries and then 
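To make the auto-warming Otis mentions concrete: Solr can fire warming queries automatically whenever a new searcher is opened after a commit. Below is a minimal sketch for solrconfig.xml — the QuerySenderListener mechanism is standard Solr, but the queries themselves are made-up placeholders, not taken from this thread:

```xml
<!-- In solrconfig.xml: run a few representative queries whenever a new
     searcher is opened (e.g. after each hourly commit), so the first real
     user query does not pay the full cold-cache cost. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Placeholder queries: warm with a small set of your most common
         queries, not every distinct query in the logs. -->
    <lst><str name="q">common query one</str><str name="rows">10</str></lst>
    <lst><str name="q">common query two</str><str name="rows">10</str></lst>
  </arr>
</listener>
```

Separately, the `autowarmCount` attribute on the individual caches in solrconfig.xml controls how many entries of the old caches are carried over to the new searcher, which is the "auto-warm" knob Otis refers to.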

Re: AND operator and dismax request handler

2011-02-06 Thread Grijesh

Hi Bagesh,

I think Hossman and Erick have given you the path you can choose
to get the desired result.
Try setting mm to 0 to make dismax work with your operators AND, OR and NOT.

Thanx:
Grijesh
Lucid Imagination Inc.

On Sat, Feb 5, 2011 at 8:17 PM, Bagesh Sharma [via Lucene]
 wrote:
> Hi friends, please suggest how I can set the query operator to AND for
> the dismax request handler case.
>
> My problem is that I am searching for the string "water treatment plant" using
> the dismax request handler. The query formed is of this type:
>
> http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true
>
> My handling for dismax request handler in solrConfig.xml is -
>
>  default="true">
>         
>                 true
>                 explicit
>                 0.2
>
>                 
>                         TDR_SUBIND_SUBTDR_SHORT^3
>                         TDR_SUBIND_SUBTDR_DETAILS^2
>                         TDR_SUBIND_COMP_NAME^1.5
>                         TDR_SUBIND_LOC_STATE^3
>                         TDR_SUBIND_PROD_NAMES^2.5
>                         TDR_SUBIND_LOC_CITY^3
>                         TDR_SUBIND_LOC_ZIP^2.5
>                         TDR_SUBIND_NAME^1.5
>                         TDR_SUBIND_TENDER_NO^1
>                 
>
>                 
>                         TDR_SUBIND_SUBTDR_SHORT^15
>                         TDR_SUBIND_SUBTDR_DETAILS^10
>                         TDR_SUBIND_COMP_NAME^20
>                 
>
>                 1
>                 0
>                 20%
>         
> 
>
>
> In the final parsed query it is like
>
> +((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
> TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
> TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
> TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
> TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
> TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
> TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
> TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
> | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
> (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
> TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
> TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
> TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
> TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
> plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
> TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2
>
>
>
> Now it gives me results if any of the words is found from the text "water
> treatment plant". I think the OR operator is working here, which finally
> combines the results.
>
> Now I want only those results in which the complete text "water treatment
> plant" matches.
>
> 1. I do not want to make any change in the solrConfig.xml dismax handler. If
> possible, please suggest any other handler to deal with it.
>
> 2. Is the OR operator really working in the query? Basically, when I
> query like this:
>
> q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> OR
>
> q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
>
> Then it gives different results. Can you suggest what the difference is
> between the above two queries?
>
> Please advise on full-text search for "water treatment plant".
>
> Thanks for your response.
>
>
> 
> This email was sent by Bagesh Sharma (via Nabble)
> Your replies will appear at
> http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html
> To receive all replies by email, subscribe to this discussion
>
>


-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2441363.html
Sent from the Solr - User mailing list archive at Nabble.com.
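The mm ("minimum should match") behaviour discussed in this thread — mm near 0 acting like OR, mm at 100% approaching AND — can be illustrated outside Solr. The Python sketch below is my own simplification, not Solr code; the real dismax mm specification also supports absolute counts and conditional clauses like "2&lt;75%":

```python
import math

def dismax_matches(doc_terms, query_terms, mm_percent):
    """Simplified dismax mm check: a document matches when at least
    mm_percent of the query's optional term clauses are present
    (and, as in a Lucene BooleanQuery with only optional clauses,
    always at least one)."""
    required = math.ceil(len(query_terms) * mm_percent / 100.0)
    hits = sum(1 for t in query_terms if t in doc_terms)
    return hits >= max(required, 1)

q = ["water", "treatment", "plant"]
print(dismax_matches({"water", "plant"}, q, 0))    # True: OR-like, any term suffices
print(dismax_matches({"water", "plant"}, q, 100))  # False: AND-like, all terms needed
```

In practice, overriding mm per request (e.g. appending &mm=100%25, URL-encoded "100%") is the usual way to get AND-like behaviour from dismax without editing the handler defaults in solrConfig.xml.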


Re: HTTP ERROR 400 undefined field: *

2011-02-06 Thread Otis Gospodnetic
Yup, here it is, warning about needing to reindex:

http://twitter.com/#!/lucene/status/28694113180192768

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Erick Erickson 
> To: solr-user@lucene.apache.org
> Sent: Sun, February 6, 2011 9:43:00 AM
> Subject: Re: HTTP ERROR 400 undefined field: *
> 
> I *think* that there was a post a while ago saying that if you were
> using trunk 3_x one of the recent changes required re-indexing, but don't
> quote me on that.
> Have you tried that?
> 
> Best
> Erick
> 
> On Fri, Feb 4, 2011  at 2:04 PM, Jed Glazner 
>wrote:
> 
> > Sorry for the lack of details.
> >
> > It's all clear in my head.. :)
> >
> > We checked out the head revision from the 3.x branch a few weeks ago (
> > https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
> > picked up r1058326.
> >
> > We upgraded from a previous checkout (r960098). I am using our customized
> > schema.xml and the solrconfig.xml from the old revision with the new
> > checkout.
> >
> > After upgrading I just copied the data folders from each core into the new
> > checkout (hoping I wouldn't have to re-index the content, as this takes
> > days). Everything seems to work fine, except that now I can't get the
> > score to return.
> >
> > The stack trace is attached. I also saw this warning in the logs; not sure
> > exactly what it's talking about:
> >
> > Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
> > WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
> > emulation. You should at some point declare and reindex to at least 3.0,
> > because 2.4 emulation is deprecated and will be removed in 4.0. This
> > parameter will be mandatory in 4.0.
> >
> > Here is my request handler; the actual field names here are different from
> > the ones in mine, but I'm a little uncomfortable publishing how our
> > company's search service works to the world:
> >
> > 
> > 
> > explicit
> > edismax
> > true
> > 
> > field_a^2 field_b^2 field_c^4  
> >
> > 
> >  field_d^10
> >
> > 
> > 
> > 
> > 0.1
> > 
> > 
> > tvComponent
> >  
> > 
> >
> > Anyway, hopefully this is enough info; let me know if you need more.
> >
> >  Jed.
> >
> >
> >
> >
> >
> >
> > On 02/03/2011 10:29  PM, Chris Hostetter wrote:
> >
> >> : I was working on a checkout of the 3.x branch from about 6 months ago.
> >> : Everything was working pretty well, but we decided that we should
> >> : update and get what was at the head.  However after upgrading, I am
> >> : now getting this
> >>
> >> FWIW: please be specific.  "head" of what? the 3x branch? or trunk?  what
> >> revision in svn does that correspond to? (the "svnversion" command will
> >> tell you)
> >>
> >> : HTTP ERROR 400 undefined field: *
> >> :
> >> : If I clear the fl parameter (default is set to *, score) then it works
> >> fine
> >> : with one big problem, no score data.  If I try and set fl=score I get
> >> the same
> >> : error except it says undefined field: score?!
> >> :
> >> : This works great in the older version, what changed?  I've googled for
> >> about
> >> : an hour now and I can't seem to find anything.
> >>
> >> i can't reproduce this using either trunk (r1067044) or 3x (r1067045)
> >>
> >> all of these queries work just fine...
> >>
> >> http://localhost:8983/solr/select/?q=*
> >> http://localhost:8983/solr/select/?q=solr&fl=*,score
> >> http://localhost:8983/solr/select/?q=solr&fl=score
> >> http://localhost:8983/solr/select/?q=solr
> >>
> >> ...you'll have to provide us with a *lot* more details to help understand
> >> why you might be getting an error (like: what your configs look like, what
> >> the request looks like, what the full stack trace of your error is in the
> >> logs, etc...)
> >>
> >>
> >>
> >>
> >>  -Hoss
> >>
> >
> >
> 
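Regarding the LUCENE_24 emulation warning quoted in this thread: on a 3.x build, the fix is to declare luceneMatchVersion near the top of solrconfig.xml and then reindex. A sketch — the exact constant to use depends on the branch you built (LUCENE_30 is shown as an example):

```xml
<!-- In solrconfig.xml, directly inside <config>: pin the analysis/index
     compatibility version instead of falling back to LUCENE_24 emulation. -->
<luceneMatchVersion>LUCENE_30</luceneMatchVersion>
```

As the warning itself says, changing this only takes full effect after reindexing, since index-time analysis may differ between emulated versions.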


Re: Optimize seaches; business is progressing with my Solr site

2011-02-06 Thread Dennis Gearon
Hmmm, my default distance for geospatial was excluding the results, I believe.
I have to check to see if I was actually looking at the desired return result
for 'ballroom' alone. Maybe I wasn't.

But I saw a lot to learn when I applied the techniques you gave me. Thank you 
:-)

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.





From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Sun, February 6, 2011 8:21:15 AM
Subject: Re: Optimize seaches; business is progressing with my Solr site

What does &debugQuery=on give you? Second, what optimizations are you doing?
What shows up in the analysis page? Does your admin page show the terms in
your copyfield that you expect?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon  wrote:

> Thanks to LOTS of information from you guys, my site is up and working.
> It's
> only an API now, I need to work on my OWN front end, LOL!
>
> I have my second customer. My general purpose repository API is very useful
> I'm
> finding. I will soon be in the business of optimizing the search engine
> part.
>
>
> For example. I have a copy field that has the words, 'boogie woogie
> ballroom' on
> lots of records in the copy field. I cannot find those records using
> 'boogie/boogi/boog', or the woogie versions of those, but I can with
> ballroom.
> For my VERY first lesson in optimization of search, what might be causing
> that,
> and where are the places to read on the Solr site on this?
>
> All the best on a Sunday, guys and gals.
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>


Re: Separating Index Reader and Writer

2011-02-06 Thread Em

Hi Peter,

I must jump into this discussion: from a logical point of view, what you are
saying makes sense only if both instances do not run on the same machine, or
at least not on the same drive.

When both run on the same machine and the same drive, the overall memory used
should be equal, and I do not understand why this setup should affect
cache warming etc., since the process of rewarming should be the same.

Well, my knowledge of the internals is not very deep. But from a purely
logical point of view - to me - the same thing happens as if I did it in
a single Solr instance. So what is the difference; what am I overlooking?

Another thing: while W is committing and writing to the index, is there any
inconsistency in R, or is there none because W is writing a new segment,
so nothing changes for R until the commit has finished?
Are there problems while optimizing an index?

How do you inform R about the finished commit?

Thank you for your explanation, it's a really interesting topic!

Regards,
Em

Peter Sturge-2 wrote:
> 
> Hi,
> 
> We use this scenario in production where we have one write-only Solr
> instance and 1 read-only, pointing to the same data.
> We do this so we can optimize caching/etc. for each instance for
> write/read. The main performance gain is in cache warming and
> associated parameters.
> For your Index W, it's worth turning off cache warming altogether, so
> commits aren't slowed down by warming.
> 
> Peter
> 
> 
> On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia 
> wrote:
>> Hi all,
>> I have set up two indexes, one for reading (R) and the other for
>> writing (W). Index R
>> refers to the same data dir as W (defined in solrconfig via ).
>> To make sure the R index sees the indexed documents of W, I am firing an
>> empty commit on R.
>> With this, I am getting a performance improvement as compared to using the
>> same index for reading and writing.
>> Can anyone help me understand why this performance improvement is taking
>> place even though both the indexes are pointing to the same data
>> directory.
>>
>> --
>> Thanks & Regards,
>> Isan Fulia.
>>
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Separating-Index-Reader-and-Writer-tp2437666p2438730.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Separating Index Reader and Writer

2011-02-06 Thread Isan Fulia
Hi Peter,
Can you elaborate a little on how the performance gain comes from cache
warming? I am getting a good improvement on search time.

On 6 February 2011 23:29, Peter Sturge  wrote:

> Hi,
>
> We use this scenario in production where we have one write-only Solr
> instance and 1 read-only, pointing to the same data.
> We do this so we can optimize caching/etc. for each instance for
> write/read. The main performance gain is in cache warming and
> associated parameters.
> For your Index W, it's worth turning off cache warming altogether, so
> commits aren't slowed down by warming.
>
> Peter
>
>
> On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia 
> wrote:
> > Hi all,
> > I have set up two indexes, one for reading (R) and the other for
> > writing (W). Index R
> > refers to the same data dir as W (defined in solrconfig via ).
> > To make sure the R index sees the indexed documents of W, I am firing an
> > empty commit on R.
> > With this, I am getting a performance improvement as compared to using the
> > same index for reading and writing.
> > Can anyone help me understand why this performance improvement is taking
> > place even though both the indexes are pointing to the same data
> > directory.
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Re: Separating Index Reader and Writer

2011-02-06 Thread Peter Sturge
Hi,

We use this scenario in production where we have one write-only Solr
instance and 1 read-only, pointing to the same data.
We do this so we can optimize caching/etc. for each instance for
write/read. The main performance gain is in cache warming and
associated parameters.
For your Index W, it's worth turning off cache warming altogether, so
commits aren't slowed down by warming.

Peter


On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia  wrote:
> Hi all,
> I have set up two indexes, one for reading (R) and the other for
> writing (W). Index R
> refers to the same data dir as W (defined in solrconfig via ).
> To make sure the R index sees the indexed documents of W, I am firing an
> empty commit on R.
> With this, I am getting a performance improvement as compared to using the
> same index for reading and writing.
> Can anyone help me understand why this performance improvement is taking
> place even though both the indexes are pointing to the same data
> directory.
>
> --
> Thanks & Regards,
> Isan Fulia.
>
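Peter's advice to turn off cache warming on the write-only instance corresponds to zeroing autowarmCount (and omitting warming queries) in that instance's solrconfig.xml, while the read-only instance keeps its normal warming setup. A sketch — the cache sizes are placeholders, not recommendations:

```xml
<!-- Writer (W) instance: no autowarming, so commits are not slowed down
     by rebuilding caches the writer never queries. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"/>
```

(The documentCache is never autowarmed in any case, since internal Lucene document IDs change between searchers.)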


Re: Optimize seaches; business is progressing with my Solr site

2011-02-06 Thread Erick Erickson
What does &debugQuery=on give you? Second, what optimizations are you doing?
What shows up in the analysis page? Does your admin page show the terms in
your copyfield that you expect?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon  wrote:

> Thanks to LOTS of information from you guys, my site is up and working.
> It's
> only an API now, I need to work on my OWN front end, LOL!
>
> I have my second customer. My general purpose repository API is very useful
> I'm
> finding. I will soon be in the business of optimizing the search engine
> part.
>
>
> For example. I have a copy field that has the words, 'boogie woogie
> ballroom' on
> lots of records in the copy field. I cannot find those records using
> 'boogie/boogi/boog', or the woogie versions of those, but I can with
> ballroom.
> For my VERY first lesson in optimization of search, what might be causing
> that,
> and where are the places to read on the Solr site on this?
>
> All the best on a Sunday, guys and gals.
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>


Separating Index Reader and Writer

2011-02-06 Thread Isan Fulia
Hi all,
I have set up two indexes, one for reading (R) and the other for writing (W).
Index R refers to the same data dir as W (defined in solrconfig via ).
To make sure the R index sees the indexed documents of W, I am firing an
empty commit on R.
With this, I am getting a performance improvement as compared to using the
same index for reading and writing.
Can anyone help me understand why this performance improvement is taking
place even though both the indexes are pointing to the same data
directory.

-- 
Thanks & Regards,
Isan Fulia.


Re: AND operator and dismax request handler

2011-02-06 Thread Erick Erickson
Try attaching &debugQuery=on to your queries. The results will show
you exactly what the query is after it gets parsed and the difference
should stand out.

About dismax. Try looking at the "minimum should match" parameter,
that might do what you're looking for. Or, think about edismax if you're on
trunk or 3_x...

Best
Erick

On Sat, Feb 5, 2011 at 9:47 AM, Bagesh Sharma  wrote:

>
> Hi friends, please suggest how I can set the query operator to AND for
> the dismax request handler case.
>
> My problem is that I am searching for the string "water treatment plant" using
> the dismax request handler. The query formed is of this type:
>
>
> http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true
>
> My handling for dismax request handler in solrConfig.xml is -
>
>  default="true">
>
>true
>explicit
>0.2
>
>
>TDR_SUBIND_SUBTDR_SHORT^3
>TDR_SUBIND_SUBTDR_DETAILS^2
>TDR_SUBIND_COMP_NAME^1.5
>TDR_SUBIND_LOC_STATE^3
>TDR_SUBIND_PROD_NAMES^2.5
>TDR_SUBIND_LOC_CITY^3
>TDR_SUBIND_LOC_ZIP^2.5
>TDR_SUBIND_NAME^1.5
>TDR_SUBIND_TENDER_NO^1
>
>
>
>TDR_SUBIND_SUBTDR_SHORT^15
>TDR_SUBIND_SUBTDR_DETAILS^10
>TDR_SUBIND_COMP_NAME^20
>
>
>1
>0
>20%
>
> 
>
>
> In the final parsed query it is like
>
> +((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
> TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
> TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
> TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
> TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
> TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
> TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
> TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
> | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
> (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
> TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
> TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
> TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
> TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
> plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
> TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2
>
>
>
> Now it gives me results if any of the words is found from the text "water
> treatment plant". I think the OR operator is working here, which finally
> combines the results.
>
> Now I want only those results in which the complete text "water treatment
> plant" matches.
>
> 1. I do not want to make any change in the solrConfig.xml dismax handler. If
> possible, please suggest any other handler to deal with it.
>
> 2. Is the OR operator really working in the query? Basically, when I
> query like this:
>
> q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
> OR
>
>
> q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
> desc,TDR_SUBIND_SUBTDR_OPEN_DATE
> asc&omitHeader=true&debugQuery=true&qt=dismax
>
>
> Then it gives different results. Can you suggest what the difference is
> between the above two queries?
>
> Please advise on full-text search for "water treatment plant".
>
> Thanks for your response.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: HTTP ERROR 400 undefined field: *

2011-02-06 Thread Erick Erickson
I *think* that there was a post a while ago saying that if you were
using trunk 3_x one of the recent changes required re-indexing, but don't
quote me on that.
Have you tried that?

Best
Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner wrote:

> Sorry for the lack of details.
>
> It's all clear in my head.. :)
>
> We checked out the head revision from the 3.x branch a few weeks ago (
> https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
> picked up r1058326.
>
> We upgraded from a previous checkout (r960098). I am using our customized
> schema.xml and the solrconfig.xml from the old revision with the new
> checkout.
>
> After upgrading I just copied the data folders from each core into the new
> checkout (hoping I wouldn't have to re-index the content, as this takes
> days).  Everything seems to work fine, except that now I can't get the score
> to return.
>
> The stack trace is attached.  I also saw this warning in the logs not sure
> exactly what it's talking about:
>
> Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
> WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
> emulation. You should at some point declare and reindex to at least 3.0,
> because 2.4 emulation is deprecated and will be removed in 4.0. This
> parameter will be mandatory in 4.0.
>
> Here is my request handler; the actual field names here are different from
> the ones in mine, but I'm a little uncomfortable publishing how our
> company's search service works to the world:
>
> 
> 
> explicit
> edismax
> true
> 
> field_a^2 field_b^2 field_c^4 
>
> 
> field_d^10
>
> 
> 
> 
> 0.1
> 
> 
> tvComponent
> 
> 
>
> Anyway  Hopefully this is enough info, let me know if you need more.
>
> Jed.
>
>
>
>
>
>
> On 02/03/2011 10:29 PM, Chris Hostetter wrote:
>
>> : I was working on a checkout of the 3.x branch from about 6 months ago.
>> : Everything was working pretty well, but we decided that we should update
>> and
>> : get what was at the head.  However after upgrading, I am now getting
>> this
>>
>> FWIW: please be specific.  "head" of what? the 3x branch? or trunk?  what
>> revision in svn does that correspond to? (the "svnversion" command will
>> tell you)
>>
>> : HTTP ERROR 400 undefined field: *
>> :
>> : If I clear the fl parameter (default is set to *, score) then it works
>> fine
>> : with one big problem, no score data.  If I try and set fl=score I get
>> the same
>> : error except it says undefined field: score?!
>> :
>> : This works great in the older version, what changed?  I've googled for
>> about
>> : an hour now and I can't seem to find anything.
>>
>> i can't reproduce this using either trunk (r1067044) or 3x (r1067045)
>>
>> all of these queries work just fine...
>>
>>http://localhost:8983/solr/select/?q=*
>>http://localhost:8983/solr/select/?q=solr&fl=*,score
>>http://localhost:8983/solr/select/?q=solr&fl=score
>>http://localhost:8983/solr/select/?q=solr
>>
>> ...you'll have to provide us with a *lot* more details to help understand
>> why you might be getting an error (like: what your configs look like, what
>> the request looks like, what the full stack trace of your error is in the
>> logs, etc...)
>>
>>
>>
>>
>> -Hoss
>>
>
>


Re: keepword file with phrases

2011-02-06 Thread lee carroll
Hi Chris,

Yes you've identified the problem :-)

I've tried using the keyword tokeniser but that seems to merge each
comma-separated list of synonyms into one.

The pattern tokeniser would seem to be a candidate, but can you pass the
pattern attribute to the tokeniser attribute in the synonym filter?

example synonym line which is problematic

termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA
termB1,termB2,termB3 => normalisedTermB

when the synonym filter uses the keyword tokeniser

only "phrase term A" ends up being matched as a synonym :-)


lee


On 6 February 2011 12:58, lee carroll  wrote:

> Hi Bill,
>
> quoting in the synonyms file did not produce the correct expansion :-(
>
> Looking at Chris's comments now
>
> cheers
>
> lee
>
>
> On 5 February 2011 23:38, Bill Bell  wrote:
>
>> OK that makes sense.
>>
>> If you double quote the synonyms file will that help for white space?
>>
>> Bill
>>
>>
>> On 2/5/11 4:37 PM, "Chris Hostetter"  wrote:
>>
>> >
>> >: You need to switch the order. Do synonyms and expansion first, then
>> >: shingles..
>> >
>> >except then he would be building shingles out of all the permutations of
>> >"words" in his synonyms -- including the multi-word synonyms.  i don't
>> >*think* that's what he wants based on his example (but i may be wrong)
>> >
>> >: Have you tried using analysis.jsp ?
>> >
>> >he already mentioned he has, in his original mail, and that's how he can
>> >tell it's not working.
>> >
>> >lee: based on your followup post about seeing problems in the synonyms
>> >output, i suspect the problem you are having is with how the
>> >synonymfilter
>> >"parses" the synonyms file -- by default it assumes it should split on
>> >certain characters to creates multi-word synonyms -- but in your case the
>> >tokens you are feeding synonym filter (the output of your shingle filter)
>> >really do have whitespace in them
>> >
>> >there is a "tokenizerFactory" option that Koji added a while back to the
>> >SynonymFilterFactory that lets you specify the classname of a
>> >TokenizerFactory to use when parsing the synonym rule -- that may be what
>> >you need to get your synonyms with spaces in them (so they work properly
>> >with your shingles)
>> >
>> >(assuming of course that i really understand your problem)
>> >
>> >
>> >-Hoss
>>
>>
>>
>
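The tokenizerFactory option Hoss mentions is set directly on the synonym filter in schema.xml; with the keyword tokenizer parsing the rules file, each comma-separated alternative is kept whole, so multi-word entries like "phrase termA" survive as single synonyms. A sketch — the surrounding field-type details are omitted, and since lee reports mixed results above, treat this as the intended usage rather than a guaranteed fix:

```xml
<!-- Parse synonyms.txt with the keyword tokenizer so whitespace inside a
     synonym alternative is preserved (useful when the token stream being
     filtered comes from a shingle filter and contains spaces). -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```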


Re: keepword file with phrases

2011-02-06 Thread lee carroll
Hi Bill,

quoting in the synonyms file did not produce the correct expansion :-(

Looking at Chris's comments now

cheers

lee

On 5 February 2011 23:38, Bill Bell  wrote:

> OK that makes sense.
>
> If you double quote the synonyms file will that help for white space?
>
> Bill
>
>
> On 2/5/11 4:37 PM, "Chris Hostetter"  wrote:
>
> >
> >: You need to switch the order. Do synonyms and expansion first, then
> >: shingles..
> >
> >except then he would be building shingles out of all the permutations of
> >"words" in his synonyms -- including the multi-word synonyms.  i don't
> >*think* that's what he wants based on his example (but i may be wrong)
> >
> >: Have you tried using analysis.jsp ?
> >
> >he already mentioned he has, in his original mail, and that's how he can
> >tell it's not working.
> >
> >lee: based on your followup post about seeing problems in the synonyms
> >output, i suspect the problem you are having is with how the
> >synonymfilter
> >"parses" the synonyms file -- by default it assumes it should split on
> >certain characters to creates multi-word synonyms -- but in your case the
> >tokens you are feeding synonym filter (the output of your shingle filter)
> >really do have whitespace in them
> >
> >there is a "tokenizerFactory" option that Koji added a while back to the
> >SynonymFilterFactory that lets you specify the classname of a
> >TokenizerFactory to use when parsing the synonym rule -- that may be what
> >you need to get your synonyms with spaces in them (so they work properly
> >with your shingles)
> >
> >(assuming of course that i really understand your problem)
> >
> >
> >-Hoss
>
>
>


Re: UIMA Error

2011-02-06 Thread Darx Oman
Hi
How to apply the AlchemyAPIAnnotator?
will this helps me with the *NamedEntityExtractionAnnotator?*
*thanx a lot Tommaso for you time*