Re: Sorting in solr

2016-07-11 Thread Sandeep Mestry
Hi Naveen,

I am not too sure what you're after, but the sorting is applied after the matching
documents have been collected - the results you receive are therefore already sorted.

From the Solr Ref Guide:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

The sort parameter *arranges search results* in either ascending (asc) or
descending (desc) order.
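
For example (using an illustrative "price" field that is not from this thread):

  ...&q=*:*&sort=price asc
  ...&q=*:*&sort=price desc,score desc

The documents matching q are returned already ordered by the sort clause(s), so the client receives sorted results.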

Thanks,
Sandeep

On 11 July 2016 at 11:13, Naveen Pajjuri  wrote:

> Hi,
> If I apply some sorting order in Solr, when are the documents sorted?
>
>1. Are the documents sorted after fetching the results?
>2. Or do we get already-sorted documents?
>
> Regards,
> Naveen
>


Re: Many to Many Mapping with Solr

2016-05-01 Thread Sandeep Mestry
Thanks Alexandre. I too am of the opinion not to use Solr the RDBMS way, but I
am concerned about the updates to the indexes. We're expecting around 500
writes per second to the database, which will generate more than 500 updates to
the index per second. If the entities are denormalised this will have an
impact on performance, hence I was inclined to design it like the db.

Joel,
I will explain my use cases in a bit more detail; all of these
should be driven by the search engine:

1) A user logs in and the system should display all recordings for that user.
2) A user adds a recording; the system is updated with the additional
recording.
3) A user removes a recording; the system is updated with the recording
removed.
4) When the user searches for a recording, the system should only display
matches within his recordings. Every user-recording mapping has additional
properties which are also searchable attributes.

Here, we are talking about 2M users and 500M recordings, currently driven by a
database of ~60-80 GB.

I am going to do a small POC for these use cases, and I will go with
denormalised entities with the search requirements as my main focus (see the
sketch below). However, if you have anything more to add, do let me know. I
will be grateful.
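
For the POC, a denormalised document per user-recording mapping might look like the sketch below (field names are purely illustrative, not an agreed schema):

  <add>
    <doc>
      <field name="id">user123_rec456</field>
      <field name="user_id">user123</field>
      <field name="recording_id">rec456</field>
      <field name="recording_title">Morning Show</field>
      <field name="mapping_type">owner</field>
      <field name="mapping_number">7</field>
    </doc>
  </add>

A query such as q=recording_title:show&fq=user_id:user123 would then only return matches within that user's recordings (use case 4), and adding or removing a mapping only touches the documents for that one user-recording pair.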

Many Thanks,
Sandeep


On 29 April 2016 at 14:54, Joel Bernstein  wrote:

> We really still need to know more about your use case. In particular what
> types of questions will you be asking of the data? It's useful to do this
> in plain english without mapping to any specific implementation.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 29, 2016 at 9:43 AM, Alexandre Rafalovitch  >
> wrote:
>
> > You do not structure Solr to represent your database. You structure it
> > to represent what you will search.
> >
> > In your case, it sounds like you want to return 'user-records', in
> > which case you will index the related information all together. Yes,
> > you will possibly need to recreate the multiple documents when you
> > update one record (or one user). And yes, you will have the same
> > information multiple times. But you can use index-only values or
> > docvalues to reduce storage and duplication.
> >
> > You may also want to have Solr return only the relevant IDs from the
> > search and you recreate the m-to-m object structure from the database.
> > Then, you don't need to store much at all, just index.
> >
> > Basically, don't think about your database as much when deciding Solr
> > structure. It does not map one-to-one.
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 29 April 2016 at 20:48, Sandeep Mestry  wrote:
> > > Hi All,
> > >
> > > Hope the day is going on well for you.
> > >
> > > This question has been asked before, but I couldn't find answer to my
> > > specific request. I have many to many relationship and the mapping
> table
> > > has additional columns. Whats the best way I can model this into solr
> > > entity?
> > >
> > > For example: a user has many recordings and a recording belongs to many
> > > users. But each user-recording has additional feature like type, number
> > etc.
> > > I'd like to fetch recordings for the user. If the user adds/ updates/
> > > deletes a recording then that should be reflected in the search.
> > >
> > > I have 2 options:
> > > 1) to create user entity, recording entity and user_recording entity
> > > - this is good but it's like treating solr like rdbms which i mostly
> > avoid..
> > >
> > > 2) user entity containing all the recordings information and each
> > recording
> > > containing user information
> > > - this has impact on index size but the fetch and manipulation will be
> > > faster.
> > >
> > > Any guidance will be good..
> > >
> > > Thanks,
> > > Sandeep
> >
>


Many to Many Mapping with Solr

2016-04-29 Thread Sandeep Mestry
Hi All,

Hope the day is going on well for you.

This question has been asked before, but I couldn't find an answer to my
specific case. I have a many-to-many relationship and the mapping table
has additional columns. What's the best way to model this as a Solr
entity?

For example: a user has many recordings and a recording belongs to many
users, but each user-recording mapping has additional features like type, number, etc.
I'd like to fetch the recordings for a user. If the user adds/updates/
deletes a recording, that should be reflected in the search.

I have 2 options:
1) Create a user entity, a recording entity and a user_recording entity.
- This is good, but it's like treating Solr like an RDBMS, which I mostly avoid.

2) A user entity containing all the recording information, and each recording
containing user information.
- This has an impact on index size, but fetching and manipulation will be
faster.

Any guidance will be appreciated.

Thanks,
Sandeep


Re: Newbie SolR - Need advice

2013-07-03 Thread Sandeep Mestry
+1


On 3 July 2013 14:58, Jack Krupansky  wrote:

> Design your own application layer for both indexing and query that knows
> about both SQL and Solr. Give it a REST API and then your client
> applications can talk to your REST API and not have to care about the
> details of Solr or SQL. That's the best starting point.
>
>
> -- Jack Krupansky
>
> -Original Message- From: fabio1605
> Sent: Wednesday, July 03, 2013 4:55 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Newbie SolR - Need advice
>
>
> Hi Sandeep
>
> Thank you for your reply
>
> I'll have a read through the tutorials now that I understand the principle of
> all this.
>
> I would ideally like to keep MSSQL and bolt Solr on top of it, so that we
> can keep MSSQL as we have a 200GB database.
>
> Cheers
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4075026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Newbie SolR - Need advice

2013-07-02 Thread Sandeep Mestry
Hi Fabio,

Yes, you're on the right track.

I'd now like to direct you to the first reply from Jack and suggest you go
through the Solr tutorial.
Even with Solr, it will take some time to learn the various bits and pieces
about designing fields, their field types, server configuration, etc., and
then to tune the results to match the results that you're currently getting
from the database. There is lots of info available for Solr on the web; do
also check Lucidworks' Solr Reference Guide.
http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide;jsessionid=16ED0DB3B6F6BE8CEC6E6CDB207DBC49

Best of Solr Luck!

Sandeep



On 2 July 2013 20:47, fabio1605  wrote:

>
> So, you keep your mssql database, you just don't use it for searches -
> that'll relieve some of the load. Searches then all go through SOLR & its
> Lucene indexes. If your various tables need SQL joins, you specify those in
> the DataImportHandler (DIH) config. That way, when SOLR indexes everything,
> it indexes the data the way you want to see it.
>
> -- SO  by this you mean we keep mssql as we do!!
>
> But we use the website to run through SOLR. SOLR will then handle the
> indexing and retrieval of data from its own indexes, and will make its own
> calls to our MSSQL server when required (i.e. updating/adding to
> indexes...)
>
> Am I on the right tracks there now!
>
> So MSSQL becomes the datastore
> SOLR becomes the search engine...
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074889.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Newbie SolR - Need advice

2013-07-02 Thread Sandeep Mestry
Hi Fabio,

No, Solr isn't a database replacement for MS SQL.
Solr is built on top of Lucene, which is a search engine library for text
searches.

Solr in itself is not a replacement for any database, as it does not support
any relational db features. However, as Jack and David mentioned, it's a fully
optimised search engine platform that can provide all search-related
features like faceting, highlighting, etc.
Solr does not have a *database*. It stores the data in binary files called
indexes. These indexes are populated with the data from the database. Solr
provides inbuilt functionality, through the DataImportHandler component, to get
the data and generate the indexes.
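
As a rough illustration only (driver, connection details and table/column names are invented for the example), a DIH data-config.xml looks something like this:

  <dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost;databaseName=mydb"
                user="solr" password="secret"/>
    <document>
      <entity name="product" query="SELECT id, name, description FROM products">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
        <field column="description" name="description"/>
      </entity>
    </document>
  </dataConfig>

Solr runs the SQL, maps each row onto a document via the field elements, and builds the index from that.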

When you say your web servers are mainly doing search, do you
mean it is text search and you use queries with clauses such as 'like', 'in',
etc. (in addition to multiple joins) to get the results? Does the web
application need faceting? If yes, then Solr can be your friend to get you
there.

Do remember that it always takes some time to get the new concepts from
understanding through to implementation. As David mentioned already, it
*is* going to be a bumpy ride at the start but *definitely* a sensational
one.

Good Luck,
Sandeep



On 2 July 2013 17:09, fabio1605  wrote:

> Thanks guys
>
> So SolR is actually a database replacement for mssql...  Am I right
>
>
> We have a lot of perl scripts that contains lots of sql insert
> queries. Etc
>
>
> How do we query the SolR database from scripts  I know I have a lot to
> learn still so excuse my ignorance.
>
> Also...  What is mongo and how does it compare
>
> I just don't understand how in 10years of Web development I have never
> heard of SolR till last week
>
>
>
>
> Sent from Samsung Mobile
>
>  Original message 
> From: "David Quarterman [via Lucene]" <
> ml-node+s472066n4074772...@n3.nabble.com>
> Date: 02/07/2013  16:57  (GMT+00:00)
> To: fabio1605 
> Subject: RE: Newbie SolR - Need advice
>
> Hi Fabio,
>
> Like Jack says, try the tutorial. But to answer your question, SOLR isn't
> a bolt on to SQLServer or any other DB. It's a fantastically fast
> indexing/searching tool. You'll need to use the DataImportHandler (see the
> tutorial) to import your data from the DB into the indices that SOLR uses.
> Once in there, you'll have more power & flexibility than SQLServer would
> ever give you!
>
> Haven't tried SOLR on Windows (I guess your environment) but I'm sure
> it'll work using Jetty or Tomcat as web container.
>
> Stick with it. The ride can be bumpy but the experience is sensational!
>
> DQ
>
> -Original Message-
> From: fabio1605 [mailto:[hidden email]]
> Sent: 02 July 2013 16:16
> To: [hidden email]
> Subject: Newbie SolR - Need advice
>
> Hi
>
> we have an MSSQL Server which is just getting far too large now and
> performance is dying! The majority of our webservers are mainly doing
> search, so I thought it may be best to move to SolR, but I know very
> little about it!
>
> My questions are!
>
> Does SolR Run as a bolt on to MSSQL - as in the data is still in MSSQL and
> SolR is just the search bit between?
>
> I'm really struggling to understand the point of SOLR etc., so if someone
> could point me to a Dummies website I'd appreciate it! Google is throwing too
> much confusion at me!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074772.html
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074782.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Dot operator issue.

2013-06-27 Thread Sandeep Mestry
Hi Sri,

This depends on how the fields (that hold the value) are defined and how
the query is generated.
Try running the query in the Solr admin console with &debug=true to see how the
query string is getting parsed.
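
For example (host, collection and field names are only placeholders):

  http://localhost:8983/solr/collection1/select?q=name:h.e.r.b.a.l&debugQuery=true&wt=xml

The parsedquery entry in the debug section will show whether the analyzer is splitting or stripping the dots before the search runs.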

If that doesn't help, then could you answer the following three questions
about your setup:

1) field definition in schema.xml
2) solr query url
3) parser config from solrconfig.xml


Thanks,
Sandeep


On 27 June 2013 10:41, Srinivasa Chegu  wrote:

> Hi team,
>
> When the user enters the search term "h.e.r.b.a.l" in the search textbox
> and clicks the search button, the SOLR search engine is not returning any
> results. As far as I can see, SOLR is accepting the request parameter as
> h.e.r.b.a.l. However, we have many records with the string h.e.r.b.a.l as
> part of the product name.
>
> It looks like there is an issue with the dot operator in the search term. If we
> enter the search term "herbal" then it returns search results.
>
> Our requirement is that for the search term "h.e.r.b.a.l" it needs to
> display results based on the dot-separated term.
>
> Please help us on this issue.
>
> Regards
> Srinivas
>
>
>


Re: Solr 4.2.1 + Distribution scripts (rsync) Issue

2013-06-05 Thread Sandeep Mestry
Hi Hoss,

Thanks for your reply, Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java
based Replicationhandler instead of the rsync scripts?*
- There was an attempt to implement java-based replication, but it was
very slow, so that option was discarded and rsync was used instead. This
was done a couple of years ago, and until Feb of this year we were using Solr
1.4. I upgraded Solr to 4.0 keeping rsync; however, due to time and resource
constraints the rsync alternative was not re-evaluated, and it can't be done even
today - only in the next release will we move to SolrCloud.

My setup looks like below - this was working correctly with Solr 1.4, Solr
4.0 versions.

1) Index feeder applications feed index updates to the indexer boxes.
2) A cron job runs every minute on the indexer boxes (committer), commits
the indexes (commit) and invokes snapshooter to create a snapshot. An rsync
daemon runs on the indexer boxes.
3) Another cron job runs on the search boxes every minute, which pulls the
snapshot (snappuller), installs it on the search boxes (snapinstaller) and
notifies the search application to open a new searcher (commit).

Additionally, there is a cron job that runs every morning at 4 am on the
indexer boxes which optimises the index (optimize) and cleans up snapshots
older than a day (snapcleaner).
This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts
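
As a purely illustrative sketch of that schedule (paths are invented and the script options should be checked against the wiki page above):

  # indexer boxes - commit and snapshot every minute
  * * * * *  /opt/solr/bin/commit && /opt/solr/bin/snapshooter
  # search boxes - pull and install the latest snapshot every minute
  * * * * *  /opt/solr/bin/snappuller && /opt/solr/bin/snapinstaller
  # indexer boxes - optimise and clean snapshots older than a day, at 4 am
  0 4 * * *  /opt/solr/bin/optimize && /opt/solr/bin/snapcleaner -D 1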

*Which config is this, your indexer or your searcher? (i'm assuming it's the
searcher since i don't see any postCommit commands to exec snapshooter but
i wanted to sanity check that wasn't a simple explanation for your problem)*
- Because of this set up, I do not have any post commit setup in
solrconfig.xml.
- This solrconfig.xml is used for both indexer and searcher boxes.

I can see that after my upgrade to Solr 4.2.1 all these scripts behave
normally; it's just that I do not see the updates reflected on the search
boxes unless I restart.

*What exactly does your "manual commit" command look like?*
- This is done using the commit script under the bin directory (commit -h localhost
-p 8983).
- I have also tried the URL-based commit as you mentioned, but no luck.

*Are you doing this on the indexer box or the searcher boxes?*
- I executed the manual commit on the searcher boxes; the indexer boxes do show
the commit and the updates correctly.

*What is the HTTP response from this command? What do the logs show when
you do this?*
- I have attached the logs; please note that I have enabled openSearcher for
testing.

Thanks - please let me know if I'm missing something. I remembered people
not getting their deletes, and the workaround was to add the _version_ field to
the schema, which I have done, but no luck. I know it might be unrelated, but I
am just trying all my options.

Thanks again,
Sandeep


On 5 June 2013 00:41, Chris Hostetter  wrote:

>
> : However, we haven't yet implemented SolrCloud and still relying on
> : distribution scripts - rsync, indexpuller mechanism.
>
> Well, for starters -- have you considered at least looking into using the
> java based ReplicationHandler instead of the rsync scripts?
>
> Script based replication has not been actively maintained since java
> replication was added back in Solr 1.4!
>
> : I see that the indexes are getting created on indexer boxes, snapshots
> : being created and then pulled across to search boxes. The snapshots are
> : getting installed on search boxes as well. There are no errors in the
> : scripts logs and this process works well.
> : However, when I check the update in solr console (on search boxes), I do
> : not see the updated result. The updates do not appear in search boxes
> even
> : after manual commit. Only after a *restart* of the search application
> : (deployed in tomcat) I can see the updated results.
>
> What exactly does your "manual commit" command look like?  Are you
> doing this on the indexer box or the searcher boxes?  what is the HTTP
> response from this comment? what do the logs show when you do this?
>
> It's possible that some internal changes in Solr relating to NRT
> improvements may have optimized away re-opening on commit if solr doesn't
> think the index has changed -- but i doubt it.  because I just tried a
> simple test using the 4.3.0 example where i manually simulated
> snapinstaller replacing the index files with a newer index and issued
> "http://localhost:8983/solr/update?commit=true"; and solr loaded up that
> new index and started searching it -- so i suspect the devil is in the
> details of your setup.
>
> you're sure each of the snapshooter, snappuller, snapinstaller scripts are
> executing properly?
>
> : I have done minimal changes for the upgrade in solrconfig.xml and is
> pasted
> : below. Please can someone take a look and let me know what the issue is.
> : The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).
>
> which config is this, your indexer or your searcher? (i'm assuming it's
> the searcher since i don't see a

Solr 4.2.1 + Distribution scripts (rsync) Issue

2013-06-04 Thread Sandeep Mestry
Dear All,

*Background:*
I have recently upgraded Solr from 4.0 to 4.2.1 and have re-indexed all the
data. All good so far: we got better query times, a smaller index size, and
now it all looks shiny and nice.
However, we haven't yet implemented SolrCloud and are still relying on the
distribution scripts - the rsync/indexpuller mechanism.

*Issue:*
I see that the indexes are getting created on indexer boxes, snapshots
being created and then pulled across to search boxes. The snapshots are
getting installed on search boxes as well. There are no errors in the
scripts logs and this process works well.
However, when I check for the update in the Solr console (on the search boxes),
I do not see the updated result. The updates do not appear on the search boxes
even after a manual commit. Only after a *restart* of the search application
(deployed in Tomcat) can I see the updated results.
I have made only minimal changes to solrconfig.xml for the upgrade; it is pasted
below. Please can someone take a look and let me know what the issue is?
The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).

Thanks,
Sandeep
P.S.: We'll be upgrading to SolrCloud in the next release of the project, but
this release will be managed with only the Solr 4.2.1 upgrade.


--
 solrconfig.xml
--



  LUCENE_42


${solr.abortOnConfigurationError:true}

  ${solr.data.dir:./solr/data}

  

  
14
10
32
1


  1
  0

  

  

  50
  9
  false


  50
  9
  false

  

  
1024





true

20

100


  

  (*:*)
  AND
  0
  10
  standard

  



  

  (*:*)
  AND
  0
  10
  standard

  


false

2
  

  



  

  

  explicit

  

  
  

  
  

  
  

  standard
  *:*
  all

  

  

  explicit
  true

  

  
5
  

  
*:*
  



Re: Solr Faceting doesn't return values.

2013-05-23 Thread Sandeep Mestry
*org.apache.solr.search.SyntaxError: Cannot parse
'*mm_state_code:(**TX)*': Encountered " ":" ": "" at line 1, column 14.
Was expecting one of:*

This suggests to me that you kept the df parameter in the query, hence it
was forming mm_state_code:mm_state_code:(TX). Can you try it exactly the way
I gave you - i.e. without the df parameter?
Also, can you post your schema.xml and the /select handler config from
solrconfig.xml?


On 22 May 2013 18:36, samabhiK  wrote:

> When I use your query, I get :
>
> 
> 
>
> 
>   400
>   12
>   
> true
> mm_state_code
> true
> *mm_state_code:(**TX)*
> 1369244078714
> all
> sa_site_city
> xml
>   
> 
> 
>   org.apache.solr.search.SyntaxError: Cannot parse
> '*mm_state_code:(**TX)*': Encountered " ":" ": "" at line 1, column 14.
> Was expecting one of:
> 
>  ...
>  ...
>  ...
> "+" ...
> "-" ...
>  ...
> "(" ...
> "*" ...
> "^" ...
>  ...
>  ...
>  ...
>  ...
>  ...
>  ...
> "[" ...
> "{" ...
>  ...
>  ...
> 
>   400
> 
> 
>
> Not sure why the data wont show up. Almost all the records has the field
> sa_site_city has data and is also indexed. :(
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Faceting doesn't return values.

2013-05-22 Thread Sandeep Mestry
From the response you've posted, it appears to me that the query term TX
is being searched against sa_site_city instead of mm_state_code.
Can you try your query like below:

http://xx.xx.xx.xx/solr/collection1/select?q=mm_state_code:(TX)&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all

and post your output?

On 22 May 2013 17:13, samabhiK  wrote:

> sa_site_city


Re: filter query by string length or word count?

2013-05-22 Thread Sandeep Mestry
I doubt there is any straight out-of-the-box feature that supports this
requirement; you will probably need to handle it at index time.
You can also play around with function queries
(http://wiki.apache.org/solr/FunctionQuery) for this kind of feature.
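
A minimal sketch of the index-time approach (the field name is only an example): compute the word count for the body when you build each document, index it as an integer field, and filter on it at query time.

  <field name="body_word_count" type="int" indexed="true" stored="true"/>

  ...&q=*:*&fq=body_word_count:[80 TO *]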



On 22 May 2013 16:37, Sam Lee  wrote:

> I have schema.xml
>  omitNorms="true"/>
> ...
>  positionIncrementGap="100">
> 
> 
> 
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>  protected="protwords.txt"/>
> 
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>  protected="protwords.txt"/>
> 
> 
> 
>
>
> how can I query docs whose body has more than 80 words (or 80 characters) ?
>


Re: Solr Faceting doesn't return values.

2013-05-22 Thread Sandeep Mestry
Hi There,

Not sure I understand your problem correctly, but is 'mm_state_code' a real
value or is it a field name?
Also, as Erick pointed out above, facets are not calculated if there
are no results; hence you get no facets.

You have mentioned which facets you want, but you haven't mentioned which
field you want to search against. That field should be defined in the df
parameter instead of sa_property_id.
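
For illustration, using the field from this thread, that would mean either changing the handler default:

  <str name="df">mm_state_code</str>

or passing it on the request itself with &df=mm_state_code.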

Can you post an example Solr document that you're indexing?

-Sandeep


On 22 May 2013 14:28, samabhiK  wrote:

> Ok my bad.
>
> I do have a default field defined in the /select handler in the config
> file.
>
> <lst name="defaults">
>   <str name="echoParams">explicit</str>
>   <int name="rows">10</int>
>   <str name="df">sa_property_id</str>
> </lst>
>
> But then how do I change my query now?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-22 Thread Sandeep Mestry
Thanks Erick for your suggestion.

It turns out I won't be going that route after all, as the highlighter
component is quite complicated - both to follow and to override - and there
isn't much time left in hand, so I did it the manual (dirty) way.

Best Regards,
Sandeep


On 22 May 2013 12:21, Erick Erickson  wrote:

> Sandeep:
>
> You need to be a little careful here, I second Shawn's comment that
> you are mixing versions. You say you are using solr 4.0. But the jar
> that ships with that is apache-solr-core-4.0.0.jar. Then you talk
> about using solr-core, which is called solr-core-4.1.jar.
>
> Maven is not officially supported, so grabbing some solr-core.jar
> (with no apache) and doing _anything_ with it from a 4.0 code base is
> not a good idea.
>
> You can check out the 4.0 code branch and just compile the whole
> thing. Or you can get a new 4.0 distro and use the jars there. But I'd
> be _really_ cautious about using a 4.1 or later jar with 4.0.
>
> FWIW,
> Erick
>
> On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry 
> wrote:
> > Thanks Steve,
> >
> > I could find solr-core.jar in the repo but could not find
> > apache-solr-core.jar.
> > I think my issue got misunderstood - which is totally my fault.
> >
> > Anyway, I took into account Shawn's comment and will use solr-core.jar
> only
> > for compiling the project - not for deploying.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 21 May 2013 16:46, Steve Rowe  wrote:
> >
> >> The 4.0 solr-core jar is available in Maven Central: <
> >>
> http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
> >> >
> >>
> >> Steve
> >>
> >> On May 21, 2013, at 11:26 AM, Sandeep Mestry 
> wrote:
> >>
> >> > Hi Steve,
> >> >
> >> > Solr 4.0 - mentioned in the subject.. :-)
> >> >
> >> > Thanks,
> >> > Sandeep
> >> >
> >> >
> >> > On 21 May 2013 14:58, Steve Rowe  wrote:
> >> >
> >> >> Sandeep,
> >> >>
> >> >> What version of Solr are you using?
> >> >>
> >> >> Steve
> >> >>
> >> >> On May 21, 2013, at 6:55 AM, Sandeep Mestry 
> >> wrote:
> >> >>
> >> >>> Hi Shawn,
> >> >>>
> >> >>> Thanks for your reply.
> >> >>>
> >> >>> I'm not mixing versions.
> >> >>> The problem I faced is I want to override Highlighter from solr-core
> >> jar
> >> >>> and if I add that as a dependency in my project then there was a
> clash
> >> >>> between solr-core.jar and the apache-solr-core.jar that comes
> bundled
> >> >>> within the solr distribution. It was complaining about
> >> >> MorfologikFilterFactory
> >> >>> classcastexception.
> >> >>> I can't use apache-solr-core.jar as a dependency as no such jar
> exists
> >> in
> >> >>> any maven repo.
> >> >>>
> >> >>> The only thing I could do is to remove apache-solr-core.jar from
> >> solr.war
> >> >>> and then use solr-core.jar as a dependency - however I do not think
> >> this
> >> >> is
> >> >>> the ideal solution.
> >> >>>
> >> >>> Thanks,
> >> >>> Sandeep
> >> >>>
> >> >>>
> >> >>> On 20 May 2013 15:18, Shawn Heisey  wrote:
> >> >>>
> >> >>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >> >>>>> And I do remember the discussion on the forum about dropping the
> name
> >> >>>>> *apache* from solr jars. If that's what caused this issue, then
> can
> >> you
> >> >>>>> tell me if the mirrors need updating with solr-core.jar instead of
> >> >>>>> apache-solr-core.jar?
> >> >>>>
> >> >>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If
> >> it's
> >> >>>> named solr-core, then it's from 4.1 or later.  That might mean that
> >> you
> >> >>>> are mixing versions - don't do that.  Make sure that you have jars
> >> from
> >> >>>> the exact same version as your server.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Shawn
> >> >>>>
> >> >>>>
> >> >>
> >> >>
> >>
> >>
>


Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry
I'm running out of options now; I can't really see the issue you're facing
unless the debug analysis is posted.
I think thorough debugging is required at both the application and the Solr
level.

If you want customised scoring from Solr, you can also consider overriding the
DefaultSimilarity implementation - but that'll be a separate issue.
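
As a very rough sketch of that similarity route (class and package names are invented, and the behaviour shown - dropping length normalisation - is only an example, not a recommendation):

  package com.example.search;

  import org.apache.lucene.index.FieldInvertState;
  import org.apache.lucene.search.similarities.DefaultSimilarity;

  public class NoLengthNormSimilarity extends DefaultSimilarity {
      @Override
      public float lengthNorm(FieldInvertState state) {
          // ignore field length so short and long fields are scored alike
          return state.getBoost();
      }
  }

It would then be declared in schema.xml with <similarity class="com.example.search.NoLengthNormSimilarity"/>.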


On 22 May 2013 11:32, Oussama Jilal  wrote:

> Yes I did debug it and there is nothing special about it, everything is
> treated the same,
>
> My Solr version is 4.2
>
> The copy field is used because the 2 field are of different types but only
> one value is indexed in them (so no multiValue is required and it works
> perfectly).
>
>
>
>
> On 05/22/2013 11:18 AM, Sandeep Mestry wrote:
>
>> Did you use the debugQuery=true in solr console to see how the query is
>> being interpreted and the result calculation?
>>
>> Also, I'm not sure but this copyfield directive seems a bit confusing to
>> me..
>> 
>> Because multiValued is false for Suggestion field so does that schema mean
>> Suggestion has value only from Id and not from any other input?
>>
>> You haven't mentioned the version of Solr, can you also post the query
>> params?
>>
>>
>>
>> On 22 May 2013 11:04, Oussama Jilal  wrote:
>>
>>  I don't know if this can help (since the document boost should be
>>> independent of any schema) but here is my schema :
>>>
>>> |
>>> 
>>>  
>>>  >>   sortMissingLast="true"  />
>>>  >>   sortMissingLast="true"  precisionStep="0"  positionIncrementGap="0"  />
>>>  >>   sortMissingLast="true"  omitNorms="true">
>>>  
>>>  >>   />
>>>  >>  />
>>>  >>
>>>   maxGramSize="255"  />
>>>  
>>>  
>>>  >>   />
>>>  >>  />
>>>
>>>  
>>>  
>>>  
>>>  
>>>  >>   stored="true"  multiValued="false"  required="true"  />
>>>  >>   stored="true"  multiValued="false"  required="false"  />
>>>  >>   stored="true"  multiValued="false"  required="true"  />
>>>  >>   stored="true"  multiValued="true"  required="false"  />
>>>  >>   stored="true"/>
>>>  
>>>  
>>>  Id
>>>  Suggestion
>>>
>>> |
>>>
>>> My query is somthing like : Suggestion:"Olive Oil".
>>>
>>> The result is 9 documents, wich all has the same score "11.287682", even
>>> if they had been indexed with different boosts (I am sure of this).
>>>
>>>
>>>
>>>
>>> On 05/22/2013 10:54 AM, Sandeep Mestry wrote:
>>>
>>>  I think that is applicable only for the field level boosting and not at
>>>> document level boosting.
>>>>
>>>> Can you post your query, field definition and results you're expecting.
>>>>
>>>> I am using index and query time boosting without any issues so far. also
>>>> which version of Solr you're using?
>>>>
>>>>
>>>> On 22 May 2013 10:44, Oussama Jilal  wrote:
>>>>
>>>>   I don't know if this is the issue or not but, concidering this note
>>>> from
>>>>
>>>>> the wiki :
>>>>>
>>>>> NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml)
>>>>> for any fields where the index-time boost should be stored.
>>>>>
>>>>> In my case where I only need to boost the whole document (not a
>>>>> specific
>>>>> field), do I have to activate the << omitNorms="false" >> for all the
>>>>> fields in the schema ?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/22/2013 10:41 AM, Oussama Ji

Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry
Did you use debugQuery=true in the Solr console to see how the query is
being interpreted and how the score is calculated?

Also, I'm not sure, but this copyField directive seems a bit confusing to
me..

Because multiValued is false for the Suggestion field, does that schema mean
Suggestion has a value only from Id and not from any other input?

You haven't mentioned the version of Solr; can you also post the query
params?



On 22 May 2013 11:04, Oussama Jilal  wrote:

> I don't know if this can help (since the document boost should be
> independent of any schema) but here is my schema :
>
>|
>
> 
>   sortMissingLast="true"  />
>   sortMissingLast="true"  precisionStep="0"  positionIncrementGap="0"  />
>   sortMissingLast="true"  omitNorms="true">
> 
>   />
> 
>   maxGramSize="255"  />
> 
> 
>   />
> 
> 
> 
> 
> 
>   stored="true"  multiValued="false"  required="true"  />
>   stored="true"  multiValued="false"  required="false"  />
>   stored="true"  multiValued="false"  required="true"  />
>   stored="true"  multiValued="true"  required="false"  />
>   stored="true"/>
>     
> 
> Id
> **Suggestion
>|
>
> My query is somthing like : Suggestion:"Olive Oil".
>
> The result is 9 documents, wich all has the same score "11.287682", even
> if they had been indexed with different boosts (I am sure of this).
>
>
>
>
> On 05/22/2013 10:54 AM, Sandeep Mestry wrote:
>
>> I think that is applicable only for the field level boosting and not at
>> document level boosting.
>>
>> Can you post your query, field definition and results you're expecting.
>>
>> I am using index and query time boosting without any issues so far. also
>> which version of Solr you're using?
>>
>>
>> On 22 May 2013 10:44, Oussama Jilal  wrote:
>>
>>  I don't know if this is the issue or not but, concidering this note from
>>> the wiki :
>>>
>>> NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml)
>>> for any fields where the index-time boost should be stored.
>>>
>>> In my case where I only need to boost the whole document (not a specific
>>> field), do I have to activate the << omitNorms="false" >> for all the
>>> fields in the schema ?
>>>
>>>
>>>
>>>
>>> On 05/22/2013 10:41 AM, Oussama Jilal wrote:
>>>
>>>  Thank you Sandeep,
>>>>
>>>> I did post the document like that (a minor difference is that I did not
>>>> add the boost to the field since I don't want to boost on specific
>>>> field, I
>>>> boosted the whole document '  '), but the
>>>> issue
>>>> is that everything in the queries results has the same score even if
>>>> they
>>>> had been indexed with different boosts, and I can't sort on another
>>>> field
>>>> since this is independent from any field value.
>>>>
>>>> Any ideas ?
>>>>
>>>> On 05/22/2013 10:30 AM, Sandeep Mestry wrote:
>>>>
>>>>  Hi Oussama,
>>>>>
>>>>> This is explained very nicely on Solr Wiki..
>>>>> http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
>>>>> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
>>>>>
>>>>>
>>>>> All 

Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry
I think that is applicable only to field-level boosting and not to
document-level boosting.

Can you post your query, the field definition and the results you're expecting?

I am using index- and query-time boosting without any issues so far. Also,
which version of Solr are you using?


On 22 May 2013 10:44, Oussama Jilal  wrote:

> I don't know if this is the issue or not but, concidering this note from
> the wiki :
>
> NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml)
> for any fields where the index-time boost should be stored.
>
> In my case where I only need to boost the whole document (not a specific
> field), do I have to activate the << omitNorms="false" >> for all the
> fields in the schema ?
>
>
>
>
> On 05/22/2013 10:41 AM, Oussama Jilal wrote:
>
>> Thank you Sandeep,
>>
>> I did post the document like that (a minor difference is that I did not
>> add the boost to the field since I don't want to boost on specific field, I
>> boosted the whole document '  '), but the issue
>> is that everything in the queries results has the same score even if they
>> had been indexed with different boosts, and I can't sort on another field
>> since this is independent from any field value.
>>
>> Any ideas ?
>>
>> On 05/22/2013 10:30 AM, Sandeep Mestry wrote:
>>
>>> Hi Oussama,
>>>
>>> This is explained very nicely on Solr Wiki..
>>> http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
>>> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
>>>
>>> All you need to do is something similar to below..
>>>
>>> <add>
>>>   <doc boost="2.5">
>>>     <field name="employeeId">05991</field>
>>>     <field name="office" boost="2.0">Bridgewater</field>
>>>   </doc>
>>> </add>
>>>
>>>
>>> What is not clear from your message is whether you need better scoring or
>>> better sorting. so, additionally, you can consider adding a secondary
>>> sort
>>> parameter for the docs having the same score.
>>> http://wiki.apache.org/solr/CommonQueryParameters#sort
>>>
>>>
>>> HTH,
>>> Sandeep
>>>
>>>
>>> On 22 May 2013 09:21, Oussama Jilal  wrote:
>>>
>>>  Thank you for your reply bbarani,
>>>>
>>>> I can't do that because I want to boost some documents over others,
>>>> independing of the query.
>>>>
>>>>
>>>> On 05/21/2013 05:41 PM, bbarani wrote:
>>>>
>>>>Why don't you boost during query time?
>>>>>
>>>>> Something like q=superman&qf=title^2 subject
>>>>>
>>>>> You can refer: 
>>>>> http://wiki.apache.org/solr/SolrRelevancyFAQ<http://wiki.apache.org/solr/**SolrRelevancyFAQ>
>>>>> <http://wiki.**apache.org/solr/**SolrRelevancyFAQ<http://wiki.apache.org/solr/SolrRelevancyFAQ>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
>>>>>
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>
>


Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry
Hi Oussama,

This is explained very nicely on Solr Wiki..
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the example below:

<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>


What is not clear from your message is whether you need better scoring or
better sorting. So, additionally, you can consider adding a secondary sort
parameter for the docs having the same score.
http://wiki.apache.org/solr/CommonQueryParameters#sort
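
For example (the secondary field name is only an illustration):

  ...&sort=score desc,last_modified desc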


HTH,
Sandeep


On 22 May 2013 09:21, Oussama Jilal  wrote:

> Thank you for your reply bbarani,
>
> I can't do that because I want to boost some documents over others,
> independing of the query.
>
>
> On 05/21/2013 05:41 PM, bbarani wrote:
>
>>  Why don't you boost during query time?
>>
>> Something like q=superman&qf=title^2 subject
>>
>> You can refer: 
>> http://wiki.apache.org/solr/SolrRelevancyFAQ
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Thanks Steve,

I could find solr-core.jar in the repo but could not find
apache-solr-core.jar.
I think my issue got misunderstood - which is totally my fault.

Anyway, I took into account Shawn's comment and will use solr-core.jar only
for compiling the project - not for deploying.
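
For reference, this is roughly what the compile-only dependency looks like (coordinates as per the Maven Central link below; the provided scope keeps the jar out of the deployed war):

  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>4.0.0</version>
    <scope>provided</scope>
  </dependency>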

Thanks,
Sandeep


On 21 May 2013 16:46, Steve Rowe  wrote:

> The 4.0 solr-core jar is available in Maven Central: <
> http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
> >
>
> Steve
>
> On May 21, 2013, at 11:26 AM, Sandeep Mestry  wrote:
>
> > Hi Steve,
> >
> > Solr 4.0 - mentioned in the subject.. :-)
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 21 May 2013 14:58, Steve Rowe  wrote:
> >
> >> Sandeep,
> >>
> >> What version of Solr are you using?
> >>
> >> Steve
> >>
> >> On May 21, 2013, at 6:55 AM, Sandeep Mestry 
> wrote:
> >>
> >>> Hi Shawn,
> >>>
> >>> Thanks for your reply.
> >>>
> >>> I'm not mixing versions.
> >>> The problem I faced is I want to override Highlighter from solr-core
> jar
> >>> and if I add that as a dependency in my project then there was a clash
> >>> between solr-core.jar and the apache-solr-core.jar that comes bundled
> >>> within the solr distribution. It was complaining about
> >> MorfologikFilterFactory
> >>> classcastexception.
> >>> I can't use apache-solr-core.jar as a dependency as no such jar exists
> in
> >>> any maven repo.
> >>>
> >>> The only thing I could do is to remove apache-solr-core.jar from
> solr.war
> >>> and then use solr-core.jar as a dependency - however I do not think
> this
> >> is
> >>> the ideal solution.
> >>>
> >>> Thanks,
> >>> Sandeep
> >>>
> >>>
> >>> On 20 May 2013 15:18, Shawn Heisey  wrote:
> >>>
> >>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >>>>> And I do remember the discussion on the forum about dropping the name
> >>>>> *apache* from solr jars. If that's what caused this issue, then can
> you
> >>>>> tell me if the mirrors need updating with solr-core.jar instead of
> >>>>> apache-solr-core.jar?
> >>>>
> >>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If
> it's
> >>>> named solr-core, then it's from 4.1 or later.  That might mean that
> you
> >>>> are mixing versions - don't do that.  Make sure that you have jars
> from
> >>>> the exact same version as your server.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>>>
> >>
> >>
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Hi Steve,

Solr 4.0 - mentioned in the subject.. :-)

Thanks,
Sandeep


On 21 May 2013 14:58, Steve Rowe  wrote:

> Sandeep,
>
> What version of Solr are you using?
>
> Steve
>
> On May 21, 2013, at 6:55 AM, Sandeep Mestry  wrote:
>
> > Hi Shawn,
> >
> > Thanks for your reply.
> >
> > I'm not mixing versions.
> > The problem I faced is I want to override Highlighter from solr-core jar
> > and if I add that as a dependency in my project then there was a clash
> > between solr-core.jar and the apache-solr-core.jar that comes bundled
> > within the solr distribution. It was complaining about
> MorfologikFilterFactory
> > classcastexception.
> > I can't use apache-solr-core.jar as a dependency as no such jar exists in
> > any maven repo.
> >
> > The only thing I could do is to remove apache-solr-core.jar from solr.war
> > and then use solr-core.jar as a dependency - however I do not think this
> is
> > the ideal solution.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 15:18, Shawn Heisey  wrote:
> >
> >> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >>> And I do remember the discussion on the forum about dropping the name
> >>> *apache* from solr jars. If that's what caused this issue, then can you
> >>> tell me if the mirrors need updating with solr-core.jar instead of
> >>> apache-solr-core.jar?
> >>
> >> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
> >> named solr-core, then it's from 4.1 or later.  That might mean that you
> >> are mixing versions - don't do that.  Make sure that you have jars from
> >> the exact same version as your server.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Hi Shawn,

Thanks for your reply.

I'm not mixing versions.
The problem I faced is that I want to override the Highlighter from the solr-core
jar, and if I add that as a dependency in my project there is a clash
between solr-core.jar and the apache-solr-core.jar that comes bundled
within the Solr distribution. It complains with a MorfologikFilterFactory
ClassCastException.
I can't use apache-solr-core.jar as a dependency, as no such jar exists in
any Maven repo.

The only thing I could do is to remove apache-solr-core.jar from solr.war
and then use solr-core.jar as a dependency - however I do not think this is
the ideal solution.

Thanks,
Sandeep


On 20 May 2013 15:18, Shawn Heisey  wrote:

> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> > And I do remember the discussion on the forum about dropping the name
> > *apache* from solr jars. If that's what caused this issue, then can you
> > tell me if the mirrors need updating with solr-core.jar instead of
> > apache-solr-core.jar?
>
> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
> named solr-core, then it's from 4.1 or later.  That might mean that you
> are mixing versions - don't do that.  Make sure that you have jars from
> the exact same version as your server.
>
> Thanks,
> Shawn
>
>


Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Sandeep Mestry
Hi All,

I want to override a component from solr-core and for that I need solr-core
jar.

I am using the solr.war that comes from Apache mirror and if I open the
war, I see the solr-core jar is actually named as apache-solr-core.jar.
This is also true about solrj jar.

If I now declare a dependency in my module on apache-solr-core.jar, it's
not found in the mirror. And if I use solr-core.jar, I get a strange
ClassCastException during Solr startup for MorfologikFilterFactory.

(I'm not using this factory at all in my project.)

at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.ClassCastException: class
org.apache.lucene.analysis.morfologik.MorfologikFilterFactory
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:126)
at
org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:73)
at
org.apache.lucene.analysis.util.AnalysisSPILoader.(AnalysisSPILoader.java:55)

I tried manually removing the apache-solr-core.jar from the solr
distribution war and then providing the dependency and everything worked
fine.

And I do remember the discussion on the forum about dropping the name
*apache* from solr jars. If that's what caused this issue, then can you
tell me if the mirrors need updating with solr-core.jar instead of
apache-solr-core.jar?

Many Thanks,
Sandeep


Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Thanks Upayavira for that valuable suggestion.

I believe overriding the highlight component should be the way forward.
Could you tell me if there is an existing example, or which methods I
should particularly override?
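
A bare-bones sketch of the subclassing route (the class name is invented and the all-terms filtering itself is only indicated by a comment, not implemented):

  package com.example.search;

  import java.io.IOException;
  import org.apache.solr.handler.component.HighlightComponent;
  import org.apache.solr.handler.component.ResponseBuilder;

  public class AllTermsHighlightComponent extends HighlightComponent {
      @Override
      public void process(ResponseBuilder rb) throws IOException {
          // let the standard component build the highlighting response first
          super.process(rb);
          // then post-process the highlighting section of the response here,
          // dropping snippets for fields that do not contain every query term
      }
  }

Registering it in solrconfig.xml with <searchComponent name="highlight" class="com.example.search.AllTermsHighlightComponent"/> would replace the default highlight component.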

Thanks,
Sandeep


On 20 May 2013 12:47, Upayavira  wrote:

> If you are saying that you want to change highlighting behaviour, not
> query behaviour, then I suspect you are going to have to interact with
> the java HighlightComponent. If you can work out how to update that
> component to behave as you wish, you could either subclass it, or create
> your own implementation that you can include in your Solr setup. Or, if
> you make it generic enough, offer it back as a contribution that can be
> included in future Solr releases.
>
> Upayavira
>
> On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote:
> > I doubt if that will be the correct approach as it will be hard to
> > generate
> > the query grammar considering we have support for phrase, operator,
> > wildcard and group queries.
> > That's why I have kept it simple and only passing the query text with
> > minimal parsing (escaping lucene special characters) to configured
> > edismax.
> > The number of fields I have mentioned above are a lot lesser than the
> > actual number of fields - around 50 in number :-). So forming such a long
> > query will both be time and resource consuming. Further, it's not going
> > to
> > fulfill my requirement anyway because I do not want to change my search
> > results, the requirement is only to provide a highlight if a field is
> > matched for all the query terms.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 12:02, Jaideep Dhok  wrote:
> >
> > > If you know all fields that need to be queried, you can rewrite it as -
> > > (assuming, f1, f2 are the fields that you have to search)
> > > (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
> > >
> > > -
> > > Jaideep
> > >
> > >
> > > On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > Hi Jaideep,
> > > >
> > > > The edismax config I have posted mentioned that the default operator
> is
> > > > AND. I am sorry if I was not clear in my previous mail, what I need
> > > really
> > > > is highlight a field when all search query terms present. The current
> > > > highlighter works for *any* of the terms match and not for *all*
> terms
> > > > match.
> > > >
> > > > Thanks,
> > > > Sandeep
> > > >
> > > >
> > > > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> > > >
> > > > > Sandeep,
> > > > > If you AND all keywords, that should be OK?
> > > > >
> > > > > Thanks
> > > > > Jaideep
> > > > >
> > > > >
> > > > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry <
> sanmes...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > I have a requirement to highlight a field only when all keywords
> > > > entered
> > > > > > match. This also needs to support phrase, operator or wildcard
> > > queries.
> > > > > > I'm using Solr 4.0 with edismax because the search needs to be
> > > carried
> > > > > out
> > > > > > on multiple fields.
> > > > > > I know with highlighting feature I can configure a field to
> indicate
> > > a
> > > > > > match, however I do not find a setting to highlight only if all
> > > > keywords
> > > > > > match. That makes me think is that the right approach to take?
> Can
> > > you
> > > > > > please guide me in right direction?
> > > > > >
> > > > > > The edsimax config looks like below:
> > > > > >
> > > > > > 
> > > > > > 
> > > > > > edismax
> > > > > > explicit
> > > > > > 0.01
> > > > > > title^10 description^5 annotations^3 notes^2
> > > > > > categories
> > > > > > title
> > > > > > 0
> > > > > > *:*
> > > > > > *,score
> > > > > > 100%
> > > > > > AND
> > > > > > score desc
> > > > > > true

Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
I doubt that will be the correct approach, as it will be hard to generate
the query grammar considering we support phrase, operator,
wildcard and group queries.
That's why I have kept it simple and only pass the query text, with
minimal parsing (escaping Lucene special characters), to the configured edismax.
The number of fields I mentioned above is a lot smaller than the
actual number of fields - around 50 in total :-). So forming such a long
query will be both time and resource consuming. Further, it's not going to
fulfil my requirement anyway, because I do not want to change my search
results; the requirement is only to provide a highlight when a field
matches all the query terms.

Thanks,
Sandeep


On 20 May 2013 12:02, Jaideep Dhok  wrote:

> If you know all fields that need to be queried, you can rewrite it as -
> (assuming, f1, f2 are the fields that you have to search)
> (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
>
> -
> Jaideep
>
>
> On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry 
> wrote:
>
> > Hi Jaideep,
> >
> > The edismax config I have posted mentioned that the default operator is
> > AND. I am sorry if I was not clear in my previous mail, what I need
> really
> > is highlight a field when all search query terms present. The current
> > highlighter works for *any* of the terms match and not for *all* terms
> > match.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 11:40, Jaideep Dhok  wrote:
> >
> > > Sandeep,
> > > If you AND all keywords, that should be OK?
> > >
> > > Thanks
> > > Jaideep
> > >
> > >
> > > On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I have a requirement to highlight a field only when all keywords
> > entered
> > > > match. This also needs to support phrase, operator or wildcard
> queries.
> > > > I'm using Solr 4.0 with edismax because the search needs to be
> carried
> > > out
> > > > on multiple fields.
> > > > I know with highlighting feature I can configure a field to indicate
> a
> > > > match, however I do not find a setting to highlight only if all
> > keywords
> > > > match. That makes me think is that the right approach to take? Can
> you
> > > > please guide me in right direction?
> > > >
> > > > The edsimax config looks like below:
> > > >
> > > > 
> > > > 
> > > > edismax
> > > > explicit
> > > > 0.01
> > > > title^10 description^5 annotations^3 notes^2
> > > > categories
> > > > title
> > > > 0
> > > > *:*
> > > > *,score
> > > > 100%
> > > > AND
> > > > score desc
> > > > true
> > > > -1
> > > > 1
> > > > uniq_subtype_id
> > > > component_type
> > > > genre_type
> > > > 
> > > > 
> > > > collection:assets
> > > > 
> > > > 
> > > >
> > > > If I search for 'countryside number 10' as the keyword then highlight
> > > only
> > > > if the 'annotations' contain all these entered search terms. Any
> > document
> > > > containing just one or two terms is not a match.
> > > >
> > > > Thanks,
> > > > Sandeep
> > > > (p.s: I haven't enabled the highlighting feature yet on this config
> and
> > > > will be doing so only if that will fulfil the requirement I have
> > > mentioned
> > > > above.)
> > > >
> > >
> > >
> >
>
>


Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Hi Jaideep,

The edismax config I posted mentions that the default operator is
AND. I am sorry if I was not clear in my previous mail; what I really need
is to highlight a field only when all the search query terms are present.
The current highlighter highlights when *any* of the terms match, not only
when *all* terms match.

Thanks,
Sandeep


On 20 May 2013 11:40, Jaideep Dhok  wrote:

> Sandeep,
> If you AND all keywords, that should be OK?
>
> Thanks
> Jaideep
>
>
> On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
> wrote:
>
> > Dear All,
> >
> > I have a requirement to highlight a field only when all keywords entered
> > match. This also needs to support phrase, operator or wildcard queries.
> > I'm using Solr 4.0 with edismax because the search needs to be carried
> out
> > on multiple fields.
> > I know with highlighting feature I can configure a field to indicate a
> > match, however I do not find a setting to highlight only if all keywords
> > match. That makes me think is that the right approach to take? Can you
> > please guide me in right direction?
> >
> > The edismax config looks like below:
> >
> > 
> > 
> > edismax
> > explicit
> > 0.01
> > title^10 description^5 annotations^3 notes^2
> > categories
> > title
> > 0
> > *:*
> > *,score
> > 100%
> > AND
> > score desc
> > true
> > -1
> > 1
> > uniq_subtype_id
> > component_type
> > genre_type
> > 
> > 
> > collection:assets
> > 
> > 
> >
> > If I search for 'countryside number 10' as the keyword then highlight
> only
> > if the 'annotations' contain all these entered search terms. Any document
> > containing just one or two terms is not a match.
> >
> > Thanks,
> > Sandeep
> > (p.s: I haven't enabled the highlighting feature yet on this config and
> > will be doing so only if that will fulfil the requirement I have
> mentioned
> > above.)
> >
>
>


Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry
Dear All,

I have a requirement to highlight a field only when all keywords entered
match. This also needs to support phrase, operator or wildcard queries.
I'm using Solr 4.0 with edismax because the search needs to be carried out
on multiple fields.
I know that with the highlighting feature I can configure a field to indicate a match; however, I cannot find a setting to highlight only if all keywords match. That makes me wonder whether this is the right approach to take. Can you please guide me in the right direction?

The edismax config looks like below:



edismax
explicit
0.01
title^10 description^5 annotations^3 notes^2 categories
title
0
*:*
*,score
100%
AND
score desc
true
-1
1
uniq_subtype_id
component_type
genre_type


collection:assets



If I search for 'countryside number 10' as the keyword, then the highlight should appear only if the 'annotations' field contains all of the entered search terms. Any document containing just one or two of the terms is not a match.

Thanks,
Sandeep
(p.s: I haven't enabled the highlighting feature yet on this config and
will be doing so only if that will fulfil the requirement I have mentioned
above.)


Re: Question about Edismax - Solr 4.0

2013-05-17 Thread Sandeep Mestry
Hello Jack,

Thanks for pointing the issues out and for your valuable suggestion. My
preliminary tests were okay on search but I will be doing more testing to
see if this has impacted any other searches.

Thanks once again and have a nice sunny weekend,
Sandeep


On 17 May 2013 05:35, Jack Krupansky  wrote:

> Ah... I think your issue is the preserveOriginal=1 on the query analyzer
> as well as the fact that you have all of these catenatexx="1" options on
> the query analyzer - I indicated that you should remove them all.
>
> The problem is that the whitespace analyzer leaves the leading comma in
> place, and the preserveOriginal="1" also generates an extra token for the
> term, with the comma in place. But, with the space, the comma and "10" are
> separate terms and get analyzed independently.
>
> The query results probably indicate that you don't have that exact
> combination of the term and leading punctuation - or that there is no
> standalone comma in your input data.
>
> Try the following replacement for the query-time WDF:
>
>
> <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="0" />
>
>
> -- Jack Krupansky
>
> -Original Message- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 5:50 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Edismax - Solr 4.0
>
> Hi Jack,
>
> Thanks for your response again and for helping me out to get through this.
>
> The URL is definitely encoded for spaces and it looks like below. As I
> mentioned in my previous mail, I can't add it to query parameter as that
> searches on multiple fields.
>
> The title field is defined as below:
>  multiValued="true"/>
>
> q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets
>
> 
> 
> edismax
> explicit
> 0.01
> title^10 description^5 annotations^3 notes^2
> categories
> title
> 0
> *:*
> *,score
> 100%
> AND
> score desc
> true
> -1
> 1
> uniq_subtype_id
> component_type
> genre_type
> 
> 
> collection:assets
> 
> 
>
> The term 'countryside' needs to be searched against multiple fields
> including titles, descriptions, annotations, categories, notes but the UI
> also has a feature to limit results by providing a title field.
>
>
> I can see that the filter queries are always parsed by LuceneQueryParser
> however I'd expect it to generate the parsed_filter_queries debug output in
> every situation.
>
> I have tried it as the main query with both edismax and lucene defType and
> it gives me correct output and correct results.
> But, there is some problem when this is used as a filter query as the
> parser is not able to parse a comma with a space.
>
> Thanks again Jack, please let me know in case you need more inputs from my
> side.
>
> Best Regards,
> Sandeep
>
> On 16 May 2013 18:03, Jack Krupansky  wrote:
>
>  Could you show us the full query URL - spaces must be encoded in URL query
>> parameters.
>>
>> Also show the actual field XML - you omitted that.
>>
>> Try the same query as a main query, using both defType=edismax and
>> defType=lucene.
>>
>> Note that the filter query is parsed using the Lucene query parser, not
>> edismax, independent of the defType parameter. But you don't have any
>> edismax features in your fq anyway.
>>
>> But you can stick {!edismax} in front of the query to force edismax to be
>> used for the fq, although it really shouldn't change anything:
>>
>> Also, catenate is fine for indexing, but will mess up your queries at
>> query time, so set them to "0" in the query analyzer
>>
>> Also, make sure you have autoGeneratePhraseQueries="true" on the
>> field
>>
>> type, but that's not the issue here.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Sandeep Mestry
>> Sent: Thursday, May 16, 2013 12:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Question about Edismax - Solr 4.0
>>
>>
>> Thanks Jack for your reply..
>>
>> The problem is, I'm finding results for fq=title:(,10) but not for
>> fq=title:(, 10) - apologies if that was not clear from my first mail.
>> I have already mentioned the debug analysis in my previous mail.
>>
>> Additionally,

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Hi Jack,

Thanks for your response again and for helping me out to get through this.

The URL is definitely encoded for spaces and it looks like below. As I
mentioned in my previous mail, I can't add it to query parameter as that
searches on multiple fields.

The title field is defined as below:


q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets



edismax
explicit
0.01
title^10 description^5 annotations^3 notes^2 categories
title
0
*:*
*,score
100%
AND
score desc
true
-1
1
uniq_subtype_id
component_type
genre_type


collection:assets



The term 'countryside' needs to be searched against multiple fields
including titles, descriptions, annotations, categories, notes but the UI
also has a feature to limit results by providing a title field.


I can see that the filter queries are always parsed by LuceneQueryParser
however I'd expect it to generate the parsed_filter_queries debug output in
every situation.

I have tried it as the main query with both edismax and lucene defType and
it gives me correct output and correct results.
But, there is some problem when this is used as a filter query as the
parser is not able to parse a comma with a space.

Thanks again Jack, please let me know in case you need more inputs from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky  wrote:

> Could you show us the full query URL - spaces must be encoded in URL query
> parameters.
>
> Also show the actual field XML - you omitted that.
>
> Try the same query as a main query, using both defType=edismax and
> defType=lucene.
>
> Note that the filter query is parsed using the Lucene query parser, not
> edismax, independent of the defType parameter. But you don't have any
> edismax features in your fq anyway.
>
> But you can stick {!edismax} in front of the query to force edismax to be
> used for the fq, although it really shouldn't change anything:
>
> Also, catenate is fine for indexing, but will mess up your queries at
> query time, so set them to "0" in the query analyzer
>
> Also, make sure you have autoGeneratePhraseQueries="true" on the field
> type, but that's not the issue here.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 12:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Edismax - Solr 4.0
>
>
> Thanks Jack for your reply..
>
> The problem is, I'm finding results for fq=title:(,10) but not for
> fq=title:(, 10) - apologies if that was not clear from my first mail.
> I have already mentioned the debug analysis in my previous mail.
>
> Additionally, the title field is defined as below:
> 
>>
>>  
>
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>
>
>
>
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>
>
>
>
> I have set the catenate options to 1 for all types.
> I can understand ',' getting ignored when it is on its own (title:(, 10)), but:
> - Why is Solr not searching for 10 in that case, just like it did when the query was (title:(,10))?
> - And why did the other filter queries (collection:assets) not show up in the debug section?
>
>
> Thanks,
> Sandeep
>
>
> On 16 May 2013 13:57, Jack Krupansky  wrote:
>
>  You haven't indicated any problem here! What is the symptom that you
>> actually think is a problem.
>>
>> There is no comma operator in any of the Solr query parsers. Comma is just
>> another character that may or may not be included or discarded depending
>> on
>> the specific field type and analyzer. For example, a white space analyzer
>> will keep commas, but the standard analyzer or the word delimiter filter
>> will discard them. If "title" were a "string" type, all punctuation would
>> be preserved, including commas and spaces (but spaces would need to be
>> escaped or the term text enclosed in parentheses.)
>>
>> Let us know what your symptom is though, first.
>>
>> I mean, the filter query looks perfectly reasonable from an abstr

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:

 











I have set the catenate options to 1 for all types.
I can understand ',' getting ignored when it is on its own (title:(, 10)), but:
- Why is Solr not searching for 10 in that case, just like it did when the query was (title:(,10))?
- And why did the other filter queries (collection:assets) not show up in the debug section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky  wrote:

> You haven't indicated any problem here! What is the symptom that you
> actually think is a problem.
>
> There is no comma operator in any of the Solr query parsers. Comma is just
> another character that may or may not be included or discarded depending on
> the specific field type and analyzer. For example, a white space analyzer
> will keep commas, but the standard analyzer or the word delimiter filter
> will discard them. If "title" were a "string" type, all punctuation would
> be preserved, including commas and spaces (but spaces would need to be
> escaped or the term text enclosed in parentheses.)
>
> Let us know what your symptom is though, first.
>
> I mean, the filter query looks perfectly reasonable from an abstract
> perspective.
>
> -- Jack Krupansky
>
> -Original Message- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 6:51 AM
> To: solr-user@lucene.apache.org
> Subject: Question about Edismax - Solr 4.0
>
> -- *Edismax and Filter Queries with Commas and spaces* --
>
>
> Dear Experts,
>
> This appears to be a bug, please suggest if I'm wrong.
>
> If I search with the following filter query,
>
> 1) fq=title:(, 10)
>
> - I get no results.
> - The debug output does NOT show the section containing
> parsed_filter_queries
>
> if I carry a search with the filter query,
>
> 2) fq=title:(,10) - (No space between , and 10)
>
> - I get results and the debug output shows the parsed filter queries
> section as,
> 
> (titles:(,10))
> (collection:assets)
>
> As you can see above, I'm also passing in other filter queries
> (collection:assets) which appear correctly but they do not appear in case 1
> above.
>
> I can't make this as part of the query parameter as that needs to be
> searched against multiple fields.
>
> Can someone suggest a fix in this case please. I'm using Solr 4.0.
>
> Many Thanks,
> Sandeep
>


Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,

(titles:(,10))
(collection:assets)

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.
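
For reference, the filter query is built from client code roughly as in the SolrJ sketch below (simplified, with example values from this thread; it is not my production code). The escaping line is only an idea I am considering: ClientUtils.escapeQueryChars escapes whitespace and the Lucene special characters, but not the comma itself.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class TitleFilterSketch {
    public static void main(String[] args) {
        String titleFragment = ", 10";   // raw value typed into the title filter box

        SolrQuery q = new SolrQuery("countryside");
        q.setRows(20);
        q.addFilterQuery("title:(" + titleFragment + ")");
        q.addFilterQuery("collection:assets");

        // idea to try: escape whitespace/special characters before embedding the fragment
        String escaped = ClientUtils.escapeQueryChars(titleFragment);   // gives ",\ 10"
        System.out.println(escaped);

        System.out.println(q);           // inspect the parameters that would be sent
    }
}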

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep


Solr Sorting Algorithm

2013-05-13 Thread Sandeep Mestry
Good Morning All,

The alphabetical sorting is causing slight issues as below:

I have 3 documents with title value as below:

1) "Acer Palmatum (Tree)"
2) "Aceraceae (Tree Family)"
3) "Acer Pseudoplatanus (Tree)"

I have created a title_sort field whose field type is alphaNumericalSort (the one that comes with the Solr example schema).

When I apply the sort order (sort=title_sort asc), I get the results as:

"Aceraceae (Tree Family)"
"Acer Palmatum (Tree)"
"Acer Pseudoplatanus (Tree)"

But, the expected order is (spaces first),

"Acer Palmatum (Tree)"
"Acer Pseudoplatanus (Tree)"
"Aceraceae (Tree Family)"

My unit test uses the Collections.sort method and I get the expected order, but I'm not sure why Solr sorts differently.
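
For reference, the unit test boils down to something like the plain-Java sketch below; natural String ordering compares the space (0x20) before any letter, which is why the two 'Acer ...' titles come out ahead of 'Aceraceae ...'.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TitleSortSketch {
    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Aceraceae (Tree Family)",
                "Acer Palmatum (Tree)",
                "Acer Pseudoplatanus (Tree)");

        // natural ordering of java.lang.String: the space sorts before letters
        Collections.sort(titles);

        for (String title : titles) {
            System.out.println(title);
        }
        // prints:
        // Acer Palmatum (Tree)
        // Acer Pseudoplatanus (Tree)
        // Aceraceae (Tree Family)
    }
}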

From the Collections.sort API, I can see that it uses a modified merge sort. Could you tell me which algorithm Solr follows for its sorting logic, and whether there is any other approach I can take?

Many Thanks,
Sandeep


Re: commit in solr4 takes a longer time

2013-05-03 Thread Sandeep Mestry
That's not ideal.
Can you post solrconfig.xml?
On 3 May 2013 07:41, "vicky desai"  wrote:

> Hi sandeep,
>
> I made the changes u mentioned and tested again for the same set of docs
> but
> unfortunately the commit time increased.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060622.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit in solr4 takes a longer time

2013-05-02 Thread Sandeep Mestry
Hi Vicky,

I faced this issue as well, and after some experimenting I found the autowarmCount on the caches to be the problem. I changed it from a fixed count (3072) to a percentage (10%) and commit times were stable from then on.





HTH,
Sandeep


On 2 May 2013 16:31, Alexandre Rafalovitch  wrote:

> If you don't re-open the searcher, you will not see new changes. So,
> if you only have hard commit, you never see those changes (until
> restart). But if you also have soft commit enabled, that will re-open
> your searcher for you.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI 
> wrote:
> > What happens exactly when you don't open searcher at commit?
> >
> > 2013/5/2 Gopal Patwa 
> >
> >> you might want to added openSearcher=false for hard commit, so hard
> commit
> >> also act like soft commit
> >>
> >>
> >> 5
> >> 30
> >>false
> >> 
> >>
> >>
> >>
> >> On Thu, May 2, 2013 at 12:16 AM, vicky desai  >> >wrote:
> >>
> >> > Hi,
> >> >
> >> > I am using 1 shard and two replicas. Document size is around 6 lakhs
> >> >
> >> >
> >> > My solrconfig.xml is as follows
> >> > 
> >> > 
> >> > LUCENE_40
> >> > 
> >> >
> >> >
> >> > 2147483647
> >> > simple
> >> > true
> >> > 
> >> > 
> >> > 
> >> > 500
> >> > 1000
> >> > 
> >> > 
> >> > 5
> >> > 30
> >> > 
> >> > 
> >> >
> >> > 
> >> >  >> > multipartUploadLimitInKB="204800" />
> >> > 
> >> >
> >> >  >> class="solr.StandardRequestHandler"
> >> > default="true" />
> >> >  class="solr.UpdateRequestHandler"
> >> />
> >> >  >> > class="org.apache.solr.handler.admin.AdminHandlers" />
> >> >  >> > class="solr.ReplicationHandler" />
> >> >  >> > class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
> >> > true
> >> > 
> >> > *:*
> >> > 
> >> > 
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >> >
> >>
> http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060402.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >>
>


Re: Exact and Partial Matches

2013-04-30 Thread Sandeep Mestry
Thanks Erick,

I tried grouping and it appears to work okay. However, I will need to
change the client to parse the grouped output.

&fq=title:(tree)&group=true&group.query=title:(trees) NOT
title_ci:trees&group.query=title_ci:blair&group.sort=title_sort
desc&sort=score desc,title_sort asc

I used the actual query as the filter query so my scores will all be 1, and then used 2 group queries - one that gives me the exact matches and the other that gives me the partial matches minus the exact ones.
I have tried this with operators too and it seems to be doing the job I
want, do you see any issue in this?
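
For completeness, the same request expressed through SolrJ looks roughly like the sketch below (parameter values copied from the URL above; the main query is not shown in that URL, so *:* is used here only as a placeholder).

import org.apache.solr.client.solrj.SolrQuery;

public class ExactThenPartialGroupingSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");          // placeholder main query
        q.addFilterQuery("title:(tree)");            // actual query moved into a filter so scores stay flat

        q.set("group", true);
        q.add("group.query", "title:(trees) NOT title_ci:trees");
        q.add("group.query", "title_ci:blair");      // copied as-is from the URL above
        q.set("group.sort", "title_sort desc");
        q.set("sort", "score desc,title_sort asc");

        System.out.println(q);
    }
}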

Thanks again for your reply and by the way thanks for SOLR-4662.

-S


On 30 April 2013 15:06, Erick Erickson  wrote:

> I don't think you can do that. You're essentially
> trying to mix ordering of the result set. You
> _might_ be able to kludge some of this with
> grouping, but I doubt it.
>
> You'll need two queries I'd guess.
>
> Best
> Erick
>
> On Mon, Apr 29, 2013 at 9:44 AM, Sandeep Mestry 
> wrote:
> > Dear Experts,
> >
> > I have a requirement for the exact matches and applying alphabetical
> > sorting thereafter.
> >
> > To illustrate, the results should be sorted in exact matches and all
> later
> > alphabetical.
> >
> > So, if there are 5 documents as below
> >
> > Doc1
> > title: trees
> >
> > Doc 2
> > title: plum trees
> >
> > Doc 3
> > title: Money Trees (Legendary Trees)
> >
> > Doc 4
> > title: Cork Trees
> >
> > Doc 5
> > title: Old Trees
> >
> > Then, if user searches with query term as 'trees', the results should be
> in
> > following order:
> >
> > Doc 1 trees - Highest Rank
> > Doc 4 Cork Trees - Alphabetical afterwards..
> > Doc 3 Money Trees (Legendary Trees)
> > Doc 5 Old Trees
> > Doc 2 plum trees
> >
> > I can achieve the alphabetical sorting by adding the title sort
> > parameter, However,
> > Solr relevancy is higher for Doc 3 (due to matches in 2 terms and so
> > it arranges
> > Doc 3 above Doc 4, 5 and 2).
> > So, it looks like:
> >
> > Doc 1 trees - Highest Rank
> > Doc 3 Money Trees (Legendary Trees)
> > Doc 4 Cork Trees - Alphabetical afterwards..
> > Doc 5 Old Trees
> > Doc 2 plum trees
> >
> > Can you tell me an easy way to achieve this requirement please?
> >
> > I'm using Solr 4.0 and the *title *field is defined as follows:
> >
> >  positionIncrementGap="100"
> >>
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> > 
> >
> >
> >
> > Many Thanks in advance,
> > Sandeep
>


Custom sorting of Solr Results

2013-04-30 Thread Sandeep Mestry
Dear Experts,

>
> I have a requirement for the exact matches and applying alphabetical
> sorting thereafter.
>
> To illustrate, the results should be sorted in exact matches and all later
> alphabetical.
>
> So, if there are 5 documents as below
>
> Doc1
> title: trees
>
> Doc 2
> title: plum trees
>
> Doc 3
> title: Money Trees (Legendary Trees)
>
> Doc 4
> title: Cork Trees
>
> Doc 5
> title: Old Trees
>
> Then, if user searches with query term as 'trees', the results should be
> in following order:
>
> Doc 1 trees - Highest Rank
> Doc 4 Cork Trees - Alphabetical afterwards..
> Doc 3 Money Trees (Legendary Trees)
> Doc 5 Old Trees
> Doc 2 plum trees
>
> I can achieve the alphabetical sorting by adding the title sort parameter, 
> However,
> Solr relevancy is higher for Doc 3 (due to matches in 2 terms and so it 
> arranges
> Doc 3 above Doc 4, 5 and 2).
> So, it looks like:
>
> Doc 1 trees - Highest Rank
> Doc 3 Money Trees (Legendary Trees)
> Doc 4 Cork Trees - Alphabetical afterwards..
> Doc 5 Old Trees
> Doc 2 plum trees
>
> Can you tell me an easy way to achieve this requirement please?
>
> I'm using Solr 4.0 and the *title *field is defined as follows:
>
>  positionIncrementGap="100" >
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
> 
>
>
>
> Many Thanks in advance,
> Sandeep
>


Exact and Partial Matches

2013-04-29 Thread Sandeep Mestry
Dear Experts,

I have a requirement to rank exact matches first and apply alphabetical sorting thereafter.

To illustrate, the results should be sorted with exact matches first and everything after that in alphabetical order.

So, if there are 5 documents as below

Doc1
title: trees

Doc 2
title: plum trees

Doc 3
title: Money Trees (Legendary Trees)

Doc 4
title: Cork Trees

Doc 5
title: Old Trees

Then, if user searches with query term as 'trees', the results should be in
following order:

Doc 1 trees - Highest Rank
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 3 Money Trees (Legendary Trees)
Doc 5 Old Trees
Doc 2 plum trees

I can achieve the alphabetical sorting by adding the title sort parameter. However, Solr relevancy is higher for Doc 3 (due to matches on 2 terms), so it arranges Doc 3 above Docs 4, 5 and 2.
So, it looks like:

Doc 1 trees - Highest Rank
Doc 3 Money Trees (Legendary Trees)
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 5 Old Trees
Doc 2 plum trees

Can you tell me an easy way to achieve this requirement please?

I'm using Solr 4.0 and the *title *field is defined as follows:
















Many Thanks in advance,
Sandeep


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Agree with Jack.

The current field type text_general is designed to match the query tokens
instead of exact matches - so it's not able to fulfill your requirements.

Can you use a flat file as the spell check dictionary instead? That way you can search on the exact-match field while generating spell check suggestions from the file instead of from the index.

-S


On 25 April 2013 16:25, Jack Krupansky  wrote:

> Well then just do an exact match ONLY!
>
> It sounds like you haven't worked out the inconsistencies in your
> requirements.
>
> To be clear: We're not offering you "solutions" - that's your job. We're
> only pointing out tools that you can use. It is up to you to utilize the
> tools wisely to implement your solution.
>
> I suspect that you simply haven't experimented enough with various boosts
> to assure that the unstemmed result is consistently higher.
>
> Maybe you need a custom stemmer or stemmer override so that "passengers"
> does get stemmed to "passenger", but "cats" does not (but "dogs" does.)
> That can be a choice that you can make, but I would urge caution. Still, it
> is a decision that you can make - it's not a matter of Solr forcing or
> preventing you. I still think boosting of an unstemmed field should be
> sufficient.
>
> But until you clarify the inconsistencies in your requirements, we won't
> be able to make much progress.
>
>
> -- Jack Krupansky
>
> -Original Message- From: vsl
> Sent: Thursday, April 25, 2013 10:45 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Exact matching in Solr 3.6.1
>
> Thanks for your reply but this solution does not fullfil my requirment
> because other documents (not exact matched) will be returned as well.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
I think in that case, making the field a String type is your option; however, remember that it'd be case sensitive. Another approach is to create a case-insensitive field type and do searches on those fields only.


   





Can you provide your fields and dismax config, and if possible examples of the records you would like returned and the records you do not want?

-S


On 25 April 2013 11:50, vsl  wrote:

> Thanks for your reply. I am using edismax as well. What I want to get is
> the
> exact match without other results that could be close to the given term.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Hi Pawel,

I am not sure which parser you are using; I am using edismax and have tried using the bq parameter to boost the results having exact matches to the top. You may try something like:
q="cats" AND London NOT Leeds&bq="cats"^50

In edismax, the pf and pf2 parameters also need some tuning to get such results to the top.
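
Expressed through SolrJ, that suggestion is roughly the sketch below (the field name in the commented pf2 line is hypothetical - substitute your own).

import org.apache.solr.client.solrj.SolrQuery;

public class ExactBoostSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("\"cats\" AND London NOT Leeds");
        q.set("defType", "edismax");
        // boost documents that contain the exact phrase so they move to the top
        q.set("bq", "\"cats\"^50");
        // pf/pf2 tuning would go here as well, e.g. q.set("pf2", "title^20");

        System.out.println(q);
    }
}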

HTH,
Sandeep


On 25 April 2013 10:33, vsl  wrote:

> Hi,
>  is it possible to get exact matched result if the search term is combined
> e.g. "cats" AND London NOT Leeds
>
>
> In the previus threads I have read that it is possible to create new field
> of String type and perform phrase search on it but nowhere the above
> mentioned combined search term had been taken into consideration.
>
> BR
> Pawel
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry
Another problem that I see in Solr analysis is that a query term which matches the tokenized field does not match the case-insensitive field. So, if I'm searching for 'coast to coast', I see that the tokenized series title (pg_series_title) is matched but not the ci field, which is pg_series_title_ci.

The definition of both fields is as below:

























Can this copyField directive be an issue? Should it be the other way round, or does it matter?

Thanks,
Sandeep





On 4 April 2013 10:38, Sandeep Mestry  wrote:

> Hi Jan,
>
> Thanks for your reply. I have defined string_ci like below:
>
>  omitNorms="true" compressThreshold="10">
> 
> 
> 
> 
> 
>
> When I analyse the query in solr, I saw that document containing
> pg_series_title_ci:"funny"  matches when I do a search for
> pg_series_title_ci:"funny games" and is ranked higher than the document
> containing the exact matches. I can use the default string data type but
> then the match will be on exact casing.
>
> Thanks,
> Sandeep
>
>
> On 3 April 2013 22:20, Jan Høydahl  wrote:
>
>> Can you show us your *_ci field type? Solr does not really have a way to
>> tell whether a match is "exact" or only partial, but you could hack around
>> it with the fieldType. See https://github.com/cominvent/exactmatch for a
>> possible solution.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> 3. apr. 2013 kl. 15:55 skrev Sandeep Mestry :
>>
>> > Hi All,
>> >
>> > I have a requirement where in exact matches for 2 fields (Series Title,
>> > Title) should be ranked higher than the partial matches. The
>> configuration
>> > looks like below:
>> >
>> > 
>> >
>> >edismax
>> >explicit
>> >0.01
>> >*pg_series_title_ci*^500 *title_ci*^300 *
>> > pg_series_title*^200 *title*^25 classifications^15
>> classifications_texts^15
>> > parent_classifications^10 synonym_classifications^5 pg_brand_title^5
>> > pg_series_working_title^5 p_programme_title^5 p_item_title^5
>> > p_interstitial_title^5 description^15 pg_series_description
>> annotations^0.1
>> > classification_notes^0.05 pv_program_version_number^2
>> > pv_program_version_number_ci^2 pv_program_number^2
>> pv_program_number_ci^2
>> > p_program_number^2 ma_version_number^2 ma_recording_location
>> > ma_contributions^0.001 rel_pg_series_title rel_programme_title
>> > rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5
>> > pv_uuid^0.5 ma_uuid^0.5
>> >pg_series_title_ci^500 title_ci^500
>> >0
>> >*:*
>> >100%
>> >AND
>> >true
>> >-1
>> >1
>> >
>> >
>> >
>> > As you can see above, the search is against many fields. What I'd want
>> is
>> > the documents that have exact matches for series title and title fields
>> > should rank higher than the rest.
>> >
>> > I have added 2 case insensitive (*pg_series_title_ci, title_ci*) fields
>> for
>> > series title and title and have boosted them higher over the tokenized
>> and
>> > rest of the fields. I have also implemented a similarity class to
>> override
>> > idf however I still get documents having partial matches in title and
>> other
>> > fields ranking higher than exact match in pg_series_title_ci.
>> >
>> > Many Thanks,
>> > Sandeep
>>
>>
>


Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry
Hi Jan,

Thanks for your reply. I have defined string_ci like below:








When I analysed the query in Solr, I saw that a document containing pg_series_title_ci:"funny" matches when I do a search for pg_series_title_ci:"funny games", and is ranked higher than the document containing the exact match. I could use the default string data type, but then the match would be on exact casing.

Thanks,
Sandeep


On 3 April 2013 22:20, Jan Høydahl  wrote:

> Can you show us your *_ci field type? Solr does not really have a way to
> tell whether a match is "exact" or only partial, but you could hack around
> it with the fieldType. See https://github.com/cominvent/exactmatch for a
> possible solution.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 3. apr. 2013 kl. 15:55 skrev Sandeep Mestry :
>
> > Hi All,
> >
> > I have a requirement where in exact matches for 2 fields (Series Title,
> > Title) should be ranked higher than the partial matches. The
> configuration
> > looks like below:
> >
> > 
> >
> >edismax
> >explicit
> >0.01
> >*pg_series_title_ci*^500 *title_ci*^300 *
> > pg_series_title*^200 *title*^25 classifications^15
> classifications_texts^15
> > parent_classifications^10 synonym_classifications^5 pg_brand_title^5
> > pg_series_working_title^5 p_programme_title^5 p_item_title^5
> > p_interstitial_title^5 description^15 pg_series_description
> annotations^0.1
> > classification_notes^0.05 pv_program_version_number^2
> > pv_program_version_number_ci^2 pv_program_number^2 pv_program_number_ci^2
> > p_program_number^2 ma_version_number^2 ma_recording_location
> > ma_contributions^0.001 rel_pg_series_title rel_programme_title
> > rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5
> > pv_uuid^0.5 ma_uuid^0.5
> >pg_series_title_ci^500 title_ci^500
> >0
> >*:*
> >100%
> >AND
> >true
> >-1
> >1
> >
> >
> >
> > As you can see above, the search is against many fields. What I'd want is
> > the documents that have exact matches for series title and title fields
> > should rank higher than the rest.
> >
> > I have added 2 case insensitive (*pg_series_title_ci, title_ci*) fields
> for
> > series title and title and have boosted them higher over the tokenized
> and
> > rest of the fields. I have also implemented a similarity class to
> override
> > idf however I still get documents having partial matches in title and
> other
> > fields ranking higher than exact match in pg_series_title_ci.
> >
> > Many Thanks,
> > Sandeep
>
>


Question on Exact Matches - edismax

2013-04-03 Thread Sandeep Mestry
Hi All,

I have a requirement where in exact matches for 2 fields (Series Title,
Title) should be ranked higher than the partial matches. The configuration
looks like below:



edismax
explicit
0.01
*pg_series_title_ci*^500 *title_ci*^300 *
pg_series_title*^200 *title*^25 classifications^15 classifications_texts^15
parent_classifications^10 synonym_classifications^5 pg_brand_title^5
pg_series_working_title^5 p_programme_title^5 p_item_title^5
p_interstitial_title^5 description^15 pg_series_description annotations^0.1
classification_notes^0.05 pv_program_version_number^2
pv_program_version_number_ci^2 pv_program_number^2 pv_program_number_ci^2
p_program_number^2 ma_version_number^2 ma_recording_location
ma_contributions^0.001 rel_pg_series_title rel_programme_title
rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5
pv_uuid^0.5 ma_uuid^0.5
pg_series_title_ci^500 title_ci^500
0
*:*
100%
AND
true
-1
1



As you can see above, the search is against many fields. What I'd want is
the documents that have exact matches for series title and title fields
should rank higher than the rest.

I have added 2 case insensitive (*pg_series_title_ci, title_ci*) fields for
series title and title and have boosted them higher over the tokenized and
rest of the fields. I have also implemented a similarity class to override
idf however I still get documents having partial matches in title and other
fields ranking higher than exact match in pg_series_title_ci.

Many Thanks,
Sandeep


Re: Problem when I search something that contains a forward slash?

2013-02-19 Thread Sandeep Mestry
Hi Bruno,

I have never used 3.6 so I am sorry I might not be of much help.
But, I have a similar requirement for 2 fields and I use string & case
insensitive string fields and by escaping the forward slash, I get the
result correctly.

The field definitions are as below:







 
 

The debug output for string field is as below:

*Case insensitive string field:*

pv_program_version_number_ci:HNAD002D\/01
pv_program_version_number_ci:HNAD002D\/01
pv_program_version_number_ci:hnad002d/01
pv_program_version_number_ci:hnad002d/01


*String field:*
pv_program_version_number:HNAD002D\/01
pv_program_version_number:HNAD002D\/01
pv_program_version_number:HNAD002D/01
pv_program_version_number:HNAD002D/01


HTH,
Sandeep


On 19 February 2013 12:24, Bruno Mannina  wrote:

> Hi,
>
> Even I use backslash, the problem is the same:
> ic:A01H2\/023 returns the same problem.
>
> May be I must disable an option ? or something 
>
> On 19/02/2013 13:11, Bruno Mannina wrote:
>
>  Hi Sandeep,
>>
>> First thanks for your answer but I use Solr 3.6 and not 4.0.
>> I can't actually update my solr to 4.0 version.
>>
>> And using the " " is not the solution because Solr 3.6 has an issue when
>> I use truncation like * inside the request:
>> "A01H2/0*" doesn't work.
>>
>> Do you have an other solution for Solr 3.6 ?
>>
>> thanks a lot,
>> Bruno
>>
>> On 19/02/2013 13:05, Sandeep Mestry wrote:
>>
>>> Hi Bruno,
>>>
>>> Solr 4.0 added regular expression support, which means that
>>> '/' is now a special character and must be escaped if searching for
>>> literal
>>> forward slash.
>>>
>>> http://wiki.apache.org/solr/SolrQuerySyntax
>>>
>>> So, you can either escape it or use quotes like "A01H2/001"
>>>
>>> Cheers,
>>> Sandeep
>>>
>>>
>>>
>>> On 19 February 2013 11:40, Bruno Mannina  wrote:
>>>
>>>  Dear Solr Users,
>>>>
>>>> I use Solr 3.6
>>>>
>>>> I have a field name IC which contains IPC codes with a forward slash
>>>> inside like:
>>>> A01H2/001
>>>> G06F1/023
>>>> C01C3/147
>>>> G06F3/023
>>>> etc...
>>>>
>>>> My definition for this field is:
>>>> >>> multiValued="true"/>
>>>>
>>>> If i try to search:
>>>> ic:G06F3/023
>>>> http://:/solr/select/?q=ic%3AG06F3%2F023&version=2.2&start=0&rows=10&indent=on
>>>>
>>>> the result is wrong.
>>>>
>>>> When I use debugQuery, I see that the forward slash split the request
>>>> as:
>>>> ic:g06f3 ic:023
>>>>
>>>> How can I search a term that contains a / (forward slash)?
>>>>
>>>> Thanks a lot for your help,
>>>> Bruno
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>


Re: Problem when I search something that contains a forward slash?

2013-02-19 Thread Sandeep Mestry
Hi Bruno,

Solr 4.0 added regular expression support, which means that '/' is now a special character and must be escaped when searching for a literal forward slash.

http://wiki.apache.org/solr/SolrQuerySyntax

So, you can either escape it or use quotes like "A01H2/001"
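
A tiny Java sketch of those two options, in case the query string is being built in client code (the code value is just the example from your mail):

public class SlashQuerySketch {
    public static void main(String[] args) {
        String ipc = "A01H2/001";

        // option 1: escape the literal forward slash
        String escaped = "ic:" + ipc.replace("/", "\\/");

        // option 2: wrap the value in quotes so it is treated as a phrase
        String quoted = "ic:\"" + ipc + "\"";

        System.out.println(escaped);   // ic:A01H2\/001
        System.out.println(quoted);    // ic:"A01H2/001"
    }
}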

Cheers,
Sandeep



On 19 February 2013 11:40, Bruno Mannina  wrote:

> Dear Solr Users,
>
> I use Solr 3.6
>
> I have a field name IC which contains IPC codes with a forward slash
> inside like:
> A01H2/001
> G06F1/023
> C01C3/147
> G06F3/023
> etc...
>
> My definition for this field is:
>  multiValued="true"/>
>
> If i try to search:
> ic:G06F3/023
> http://:/solr/select/?q=ic%3AG06F3%2F023&version=2.2&start=0&rows=10&indent=on
>
> the result is wrong.
>
> When I use debugQuery, I see that the forward slash split the request as:
> ic:g06f3 ic:023
>
> How can I search a term that contains a / (forward slash)?
>
> Thanks a lot for your help,
> Bruno
>
>
>
>


Re: How to give more more importance to a document if term match is more

2013-02-19 Thread Sandeep Mestry
Hi Pragyanshis,

I faced a similar problem a few days ago and was advised on this forum to override Solr's DefaultSimilarity calculation to always return a constant value for idf. I think in your case you'd also want to suppress the length norm, which will require re-indexing, as the length norm is calculated during indexing.
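
If you go down that route, a minimal sketch (untested, written against the Lucene 4.x DefaultSimilarity API and registered in schema.xml through the similarity element as in my earlier thread) would combine both overrides like this:

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class FlatScoringSimilarity extends DefaultSimilarity {

    // treat every term as equally rare
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }

    // ignore field length; this only takes effect after re-indexing,
    // because norms are written at index time
    @Override
    public float lengthNorm(FieldInvertState state) {
        return 1.0f;
    }
}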

The link of my issue is as below:
http://lucene.472066.n3.nabble.com/Possible-issue-in-edismax-td4037397.html

Cheers,
Sandeep


On 14 February 2013 19:20, Pragyanshis Pattanaik wrote:

> Hi,
> My schema is like below.
>  indexed="true" stored="true"/> name="Subject-Mark-*" type="int" indexed="true" stored="true"/>
> My need is to search only three subject fields and boost those subjects
> which has a higher Mark(Mark can be in between 1 - 10).
> Again Top subjects will get a higher boost than preceding one's.
> Like if a search term is present in Subject-Name-1,Then it will get a
> higher boost than Subject-Name-2 and Subject-Name-3.
> Similarly Subject-Mark-1 will get higher boost than Subject-Mark-2 and
> Subject-Mark-3.
> To achieve this i am querying over subject fields and my query looks like
> below.
>
> q=+Economics+Geography&wt=xml&deftype=edismax&qf=Subject-Name-1+Subject-Name-2+Subject-Name-3&bq=Subject-Name-1%3AEconomics%3BGeography^50.0+Subject-Mark-1%3A20^90.0+Subject-Mark-1%3A9^80.0+Subject-Mark-1%3A8^70.0+Subject-Mark-1%3A7^60.0+Subject-Name-2%3AEconomics%3BGeography^45.0+Subject-Mark-2%3A20^90.0+Subject-Mark-2%3A9^80.0+Subject-Mark-2%3A8^70.0+Subject-Mark-2%3A7^60.0+Subject-Name-3%3AEconomics%3BGeography^40.0+Subject-Mark-3%3A20^90.0+Subject-Mark-3%3A9^80.0+Subject-Mark-3%3A8^70.0+Subject-Mark-3%3A7^60.0
> If i am having four documents like below
> Economics name="Subject-Name-1">Geography name="Subject-Name-1">History7
>  76
>  Economics name="Subject-Name-1">History name="Subject-Name-1">Geography8
>85
>Economics
>  History name="Subject-Name-1">Geography9
>67
>Economics
>  Mathematics name="Subject-Name-1">History7
>  76
>  
>
> then i am getting a higher score for last document which has only one of
> the search term !!!
> But in my situation it is not applicable. My requirement is,if a document
> has only one term then they should get a lower score than the documents
> which are having both of the terms.
> Is it happening because of idf(rarer terms give higher contribution to the
> total score) ?
> Or there is something wrong with my query ?
> Can anybody help me to achieve the desired output.
> Thanks in advance


Re: Possible issue in edismax?

2013-02-12 Thread Sandeep Mestry
Hi Felipe, just a short note to say thanks for your valuable suggestion. I have implemented it and could see the expected results. The length norm still spoils it for a few fields, but I balanced that with the boost factors accordingly.

Once again, Many Thanks!
Sandeep


On 1 February 2013 22:53, Sandeep Mestry  wrote:

> Brilliant! Thanks very much for your response.
> On 1 Feb 2013 20:37, "Felipe Lahti"  wrote:
>
>> It's not necessary. It's only query time.
>>
>>
>> On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry 
>> wrote:
>>
>> > Hi..
>> >
>> > Could you tell me if changing default similarity to custom
>> implementation
>> > will require me to rebuild the index? Or will it be used only query
>> time?
>> >
>> > thanks,
>> > Sandeep
>> >  On 31 Jan 2013 13:55, "Felipe Lahti"  wrote:
>> >
>> > > So, it depends of your business requirement, right? If a document has
>> > > matches in more searchable fields, at least for me, this document is
>> more
>> > > important than other document that has less matches.
>> > >
>> > > Example:
>> > > Put this in your schema:
>> > > 
>> > >
>> > > And create a class in your classpath of your Solr:
>> > >
>> > > package com.your.namespace;
>> > >
>> > > import org.apache.lucene.search.similarities.DefaultSimilarity;
>> > >
>> > > public class NoIDFSimilarity extends DefaultSimilarity {
>> > >
>> > > @Override
>> > >
>> > > public float idf(long docFreq, long numDocs) {
>> > >
>> > > return 1;
>> > >
>> > > }
>> > >
>> > > }
>> > >
>> > >
>> > > It will "neutralize" the idf (which is the rarity of term).
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry 
>> > > wrote:
>> > >
>> > > > Thanks Felipe..
>> > > > Can you point me an example please?
>> > > >
>> > > > Also forgive me but if a document has matches in more searchable
>> fields
>> > > > then should it not rank higher?
>> > > >
>> > > > Thanks,
>> > > > Sandeep
>> > > > On 30 Jan 2013 19:30, "Felipe Lahti" 
>> wrote:
>> > > >
>> > > > > If you compare the first and last document scores you will see
>> that
>> > the
>> > > > > last one matches more fields than first one. So, you maybe
>> thinking
>> > > why?
>> > > > > The first doc only matches "contributions" field and the last
>> > matches a
>> > > > > bunch of fields so if you want to  have behave more like (> > > > > name="qf">series_title^500 title^100 description^15
>> > contribution)
>> > > > you
>> > > > > have to override the method of DefaultSimilarity.
>> > > > >
>> > > > >
>> > > > > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry <
>> sanmes...@gmail.com
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > I have pasted it below and it is slightly variant from the
>> dismax
>> > > > > > configuration I have mentioned above as I was playing with all
>> > sorts
>> > > of
>> > > > > > boost values, however it looks more lie below:
>> > > > > >
>> > > > > > 
>> > > > > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01
>> times
>> > > > > others
>> > > > > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
>> > > > > > [DefaultSimilarity], result of: 2675.7844 =
>> > score(doc=63298,freq=1.0
>> > > =
>> > > > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product
>> of:
>> > > > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 =
>> > queryNorm
>> > > > > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0),
>> > with
>> > > > > freq
>> > > > > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,

Re: Possible issue in edismax?

2013-02-01 Thread Sandeep Mestry
Brilliant! Thanks very much for your response.
On 1 Feb 2013 20:37, "Felipe Lahti"  wrote:

> It's not necessary. It's only query time.
>
>
> On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry 
> wrote:
>
> > Hi..
> >
> > Could you tell me if changing default similarity to custom implementation
> > will require me to rebuild the index? Or will it be used only query time?
> >
> > thanks,
> > Sandeep
> >  On 31 Jan 2013 13:55, "Felipe Lahti"  wrote:
> >
> > > So, it depends of your business requirement, right? If a document has
> > > matches in more searchable fields, at least for me, this document is
> more
> > > important than other document that has less matches.
> > >
> > > Example:
> > > Put this in your schema:
> > > 
> > >
> > > And create a class in your classpath of your Solr:
> > >
> > > package com.your.namespace;
> > >
> > > import org.apache.lucene.search.similarities.DefaultSimilarity;
> > >
> > > public class NoIDFSimilarity extends DefaultSimilarity {
> > >
> > > @Override
> > >
> > > public float idf(long docFreq, long numDocs) {
> > >
> > > return 1;
> > >
> > > }
> > >
> > > }
> > >
> > >
> > > It will "neutralize" the idf (which is the rarity of term).
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry 
> > > wrote:
> > >
> > > > Thanks Felipe..
> > > > Can you point me an example please?
> > > >
> > > > Also forgive me but if a document has matches in more searchable
> fields
> > > > then should it not rank higher?
> > > >
> > > > Thanks,
> > > > Sandeep
> > > > On 30 Jan 2013 19:30, "Felipe Lahti" 
> wrote:
> > > >
> > > > > If you compare the first and last document scores you will see that
> > the
> > > > > last one matches more fields than first one. So, you maybe thinking
> > > why?
> > > > > The first doc only matches "contributions" field and the last
> > matches a
> > > > > bunch of fields so if you want to  have behave more like ( > > > > name="qf">series_title^500 title^100 description^15
> > contribution)
> > > > you
> > > > > have to override the method of DefaultSimilarity.
> > > > >
> > > > >
> > > > > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry <
> sanmes...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > I have pasted it below and it is slightly variant from the dismax
> > > > > > configuration I have mentioned above as I was playing with all
> > sorts
> > > of
> > > > > > boost values, however it looks more lie below:
> > > > > >
> > > > > > 
> > > > > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01
> times
> > > > > others
> > > > > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > > > > > [DefaultSimilarity], result of: 2675.7844 =
> > score(doc=63298,freq=1.0
> > > =
> > > > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product
> of:
> > > > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 =
> > queryNorm
> > > > > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0),
> > with
> > > > > freq
> > > > > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > > > 40960.0 = fieldNorm(doc=63298)
> > > > > > 
> > > > > > 
> > > > > > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times
> > > > others
> > > > > > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > > > > > [DefaultSimilarity], result of: 2317.297 =
> > > score(doc=9826415,freq=3.0 =
> > > > > > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product
> of:
> > > > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 =
> > queryNorm
> > > > > > 515439.0 = fieldWeight in 9826415, product of: 1.732050

Re: Possible issue in edismax?

2013-02-01 Thread Sandeep Mestry
Hi..

Could you tell me if changing the default similarity to a custom implementation will require me to rebuild the index? Or will it be used only at query time?

thanks,
Sandeep
 On 31 Jan 2013 13:55, "Felipe Lahti"  wrote:

> So, it depends of your business requirement, right? If a document has
> matches in more searchable fields, at least for me, this document is more
> important than other document that has less matches.
>
> Example:
> Put this in your schema:
> 
>
> And create a class in your classpath of your Solr:
>
> package com.your.namespace;
>
> import org.apache.lucene.search.similarities.DefaultSimilarity;
>
> public class NoIDFSimilarity extends DefaultSimilarity {
>
> @Override
>
> public float idf(long docFreq, long numDocs) {
>
> return 1;
>
> }
>
> }
>
>
> It will "neutralize" the idf (which is the rarity of term).
>
>
>
>
>
>
> On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry 
> wrote:
>
> > Thanks Felipe..
> > Can you point me an example please?
> >
> > Also forgive me but if a document has matches in more searchable fields
> > then should it not rank higher?
> >
> > Thanks,
> > Sandeep
> > On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:
> >
> > > If you compare the first and last document scores you will see that the
> > > last one matches more fields than first one. So, you maybe thinking
> why?
> > > The first doc only matches "contributions" field and the last matches a
> > > bunch of fields so if you want to  have behave more like ( > > name="qf">series_title^500 title^100 description^15 contribution)
> > you
> > > have to override the method of DefaultSimilarity.
> > >
> > >
> > > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > I have pasted it below and it is slightly variant from the dismax
> > > > configuration I have mentioned above as I was playing with all sorts
> of
> > > > boost values, however it looks more lie below:
> > > >
> > > > 
> > > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > > > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0
> =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> > > freq
> > > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > > > 40960.0 = fieldNorm(doc=63298)
> > > > 
> > > > 
> > > > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times
> > others
> > > > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > > > [DefaultSimilarity], result of: 2317.297 =
> score(doc=9826415,freq=3.0 =
> > > > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 =
> > tf(freq=3.0),
> > > > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > > > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > > > 
> > > > 
> > > > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > > > [DefaultSimilarity], result of: 2140.6274 =
> score(doc=9882325,freq=1.0
> > =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0),
> > with
> > > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > 32768.0 = fieldNorm(doc=9882325)
> > > > 
> > > > 
> > > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > > > [DefaultSimilarity], result of: 1605.4707 =
> score(doc=220007,freq=1.0 =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
&g

Re: Possible issue in edismax?

2013-01-31 Thread Sandeep Mestry
Fantastic! Thanks very much. I will make the changes accordingly and will let you know the results.

Thanks again,
Sandeep


On 31 January 2013 13:54, Felipe Lahti  wrote:

> So, it depends of your business requirement, right? If a document has
> matches in more searchable fields, at least for me, this document is more
> important than other document that has less matches.
>
> Example:
> Put this in your schema:
> 
>
> And create a class in your classpath of your Solr:
>
> package com.your.namespace;
>
> import org.apache.lucene.search.similarities.DefaultSimilarity;
>
> public class NoIDFSimilarity extends DefaultSimilarity {
>
> @Override
>
> public float idf(long docFreq, long numDocs) {
>
> return 1;
>
> }
>
> }
>
>
> It will "neutralize" the idf (which is the rarity of term).
>
>
>
>
>
>
> On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry 
> wrote:
>
> > Thanks Felipe..
> > Can you point me an example please?
> >
> > Also forgive me but if a document has matches in more searchable fields
> > then should it not rank higher?
> >
> > Thanks,
> > Sandeep
> > On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:
> >
> > > If you compare the first and last document scores you will see that the
> > > last one matches more fields than first one. So, you maybe thinking
> why?
> > > The first doc only matches "contributions" field and the last matches a
> > > bunch of fields so if you want to  have behave more like ( > > name="qf">series_title^500 title^100 description^15 contribution)
> > you
> > > have to override the method of DefaultSimilarity.
> > >
> > >
> > > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > I have pasted it below and it is slightly variant from the dismax
> > > > configuration I have mentioned above as I was playing with all sorts
> of
> > > > boost values, however it looks more lie below:
> > > >
> > > > 
> > > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > > > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0
> =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> > > freq
> > > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > > > 40960.0 = fieldNorm(doc=63298)
> > > > 
> > > > 
> > > > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times
> > others
> > > > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > > > [DefaultSimilarity], result of: 2317.297 =
> score(doc=9826415,freq=3.0 =
> > > > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 =
> > tf(freq=3.0),
> > > > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > > > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > > > 
> > > > 
> > > > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > > > [DefaultSimilarity], result of: 2140.6274 =
> score(doc=9882325,freq=1.0
> > =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0),
> > with
> > > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > 32768.0 = fieldNorm(doc=9882325)
> > > > 
> > > > 
> > > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > > > [DefaultSimilarity], result of: 1605.4707 =
> score(doc=220007,freq=1.0 =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks Felipe..
Can you point me to an example please?

Also, forgive me, but if a document has matches in more searchable fields,
then should it not rank higher?

Thanks,
Sandeep
On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:

> If you compare the first and last document scores you will see that the
> last one matches more fields than the first one. So, you may be thinking why?
> The first doc only matches the "contributions" field and the last matches a
> bunch of fields, so if you want to have it behave more like
> (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you
> have to override the method of DefaultSimilarity.
>
>
> On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> wrote:
>
> > I have pasted it below and it is a slight variant of the dismax
> > configuration I have mentioned above, as I was playing with all sorts of
> > boost values; however, it looks more like below:
> >
> > 
> > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> others
> > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> freq
> > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 40960.0 = fieldNorm(doc=63298)
> > 
> > 
> > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
> > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
> > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > 
> > 
> > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> others
> > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 32768.0 = fieldNorm(doc=9882325)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=220007)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=241151)
> > 
> > 
> > id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> > 
> >  
> > 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times
> others
> > of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> > [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> > termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
> > boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
> > queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 =
> tf(freq=1.0),
> > with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
> > maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
> > weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result
> of:
> > 5.913381 = score(doc=967895,f

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
 7.2153096 =
idf(docFreq=22548, maxDocs=11282414) 3.093982E-4 = queryNorm 7.2153096 =
fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 7.2153096 = idf(docFreq=22548, maxDocs=11282414) 1.0 =
fieldNorm(doc=967895)



On 30 January 2013 17:55, Felipe Lahti  wrote:

> Let me see if I understood your problem:
>
> By your first e-mail I think you are worried about the returned order of
> documents from Solr. Is that correct? If yes, as I said before it's not
> only the boosting that influences the order of returned documents. There's
> term frequency, IDF (inverse document frequency)... If I understood
> correctly from your first e-mail, you are interested in getting rid of IDF. So
> for that, you can create a NoIDFSimilarity class to override the default
> similarity.
>
> Can you paste here the score calculation for one document?
>
>
> On Wed, Jan 30, 2013 at 2:06 PM, Sandeep Mestry wrote:
>
>> (Sorry for the incomplete reply in my previous mail, didn't know Ctrl+F sends
>> an email in Gmail.. ;-))
>>
>> Thanks Felipe, yes I have seen that and my requirement falls for
>>
>> How can I make exact-case matches score higher
>>
>> Example: a query of "Penguin" should score documents containing "Penguin"
>> higher than docs containing "penguin".
>>
>> The general strategy is to index the content twice, using different fields
>> with different fieldTypes (and different analyzers associated with those
>> fieldTypes). One analyzer will contain a lowercase filter for
>> case-insensitive matches, and one will preserve case for exact-case
>> matches.
>>
>> Use copyField <http://wiki.apache.org/solr/SchemaXml#copyField> commands
>> in
>>
>> the schema to index a single input field multiple times.
>>
>> Once the content is indexed into multiple fields that are analyzed
>> differently, query across both
>> fields<http://wiki.apache.org/solr/SolrRelevancyFAQ#multiFieldQuery>
>>
>> .
>>
>> I have added a case insensitive field too to match the exact matches
>> higher, however the result is not even considering the matches in field -
>> forget the exact matching part.
>>
>> And I have tried the debugQuery option as mentioned in my previous mail,
>> and I have also posted the parsed queries. From the debug query, I see
>> that
>> field boosted with lesser factor (contribution) is still resulting higher
>> than the one with higher boost factor (series_title).
>>
>>
>> Thanks,
>>
>> Sandeep
>>
>>
>>
>>
>> On 30 January 2013 16:02, Sandeep Mestry  wrote:
>>
>> > Thanks Felipe, yes I have seen that and my requirement somewhere falls
>> for
>> >
>> >
>> > On 30 January 2013 15:53, Felipe Lahti  wrote:
>> >
>> >> Hi Sandeep,
>> >>
>> >> Quick answer is that not only the boost that you define in your
>> >> requestHandler is taken to calculate the score of each document. There
>> are
>> >> other factors that contribute to score calculation. You can take a
>> look
>> >> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
>> >> see
>> >> using debugQuery=true the score calculation for each document returned.
>> >>
>> >> Let me know you need something else.
>> >>
>> >>
>> >>
>> >> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
>> >> wrote:
>> >>
>> >> > Hi All,
>> >> >
>> >> > I'm facing an issue in relevancy calculation by dismax query parser.
>> >> > The boost factor applied does not work as expected in certain cases
>> when
>> >> > the keyword is generic and by generic I mean, if the keyword is
>> >> appearing
>> >> > many times in the document as well as in the index.
>> >> >
>> >> > I have parser configuration as below:
>> >> >
>> >> > 
>> >> > 
>> >> > edismax
>> >> > explicit
>> >> > 0.01
>> >> > series_title^500 title^100 description^15
>> >> > contribution
>> >> > series_title^200
>> >> > 0
>> >> > *:*
>> >> > 
>> >> > 
>> >> >
>> >> > As you can see above, I&

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
(Sorry for the incomplete reply in my previous mail, didn't know Ctrl+F sends
an email in Gmail.. ;-))

Thanks Felipe, yes I have seen that, and my requirement falls under:

How can I make exact-case matches score higher

Example: a query of "Penguin" should score documents containing "Penguin"
higher than docs containing "penguin".

The general strategy is to index the content twice, using different fields
with different fieldTypes (and different analyzers associated with those
fieldTypes). One analyzer will contain a lowercase filter for
case-insensitive matches, and one will preserve case for exact-case matches.

Use copyField <http://wiki.apache.org/solr/SchemaXml#copyField> commands in
the schema to index a single input field multiple times.

Once the content is indexed into multiple fields that are analyzed
differently, query across both
fields<http://wiki.apache.org/solr/SolrRelevancyFAQ#multiFieldQuery>
.
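
As a minimal sketch of that approach (the fieldType, field and copyField names
below are illustrative assumptions, not the ones from my actual schema):

    <fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- lowercasing makes this copy case-insensitive -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- no LowerCaseFilterFactory, so the original case is preserved -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="title" type="text_lc" indexed="true" stored="true"/>
    <field name="title_exact" type="text_exact" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

Querying both fields, e.g. qf=title title_exact^10, then lets the case-preserving
copy push exact-case matches higher.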

I have added a case insensitive field too, to score the exact matches
higher; however, the results are not even considering the matches in that field -
forget the exact matching part.

And I have tried the debugQuery option as mentioned in my previous mail,
and I have also posted the parsed queries. From the debug output, I see that
the field boosted with a lower factor (contribution) is still ranking higher
than the one with a higher boost factor (series_title).


Thanks,

Sandeep




On 30 January 2013 16:02, Sandeep Mestry  wrote:

> Thanks Felipe, yes I have seen that and my requirement somewhere falls for
>
>
> On 30 January 2013 15:53, Felipe Lahti  wrote:
>
>> Hi Sandeep,
>>
>> Quick answer is that not only the boost that you define in your
>> requestHandler is taken to calculate the score of each document. There are
>> other factors that contribute to score calculation. You can take a look
>> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
>> see
>> using debugQuery=true the score calculation for each document returned.
>>
>> Let me know you need something else.
>>
>>
>>
>> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
>> wrote:
>>
>> > Hi All,
>> >
>> > I'm facing an issue in relevancy calculation by dismax query parser.
>> > The boost factor applied does not work as expected in certain cases when
>> > the keyword is generic and by generic I mean, if the keyword is
>> appearing
>> > many times in the document as well as in the index.
>> >
>> > I have parser configuration as below:
>> >
>> > 
>> > 
>> > edismax
>> > explicit
>> > 0.01
>> > series_title^500 title^100 description^15
>> > contribution
>> > series_title^200
>> > 0
>> > *:*
>> > 
>> > 
>> >
>> > As you can see above, I'd expect the documents containing the matches
>> for
>> > series title should rank higher than the ones in contribution.
>> >
>> > This works well, if I type in a query like 'wonderworld' which is a less
>> > occurring term and the series titles rank higher. But, if I type in a
>> > keyword like 'news' which is the most common term in the index, I get
>> hits
>> > in contributions even though I have lots of documents having word news
>> in
>> > series title.
>> >
>> > The field definition is as below:
>> >
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="true" />
>> >
>> > > > compressThreshold="10">
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> >
>> > > positionIncrementGap="100"
>> > >
>> > 
>> >   

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks Felipe, yes I have seen that and my requirement somewhere falls for


On 30 January 2013 15:53, Felipe Lahti  wrote:

> Hi Sandeep,
>
> Quick answer is that not only the boost that you define in your
> requestHandler is taken to calculate the score of each document. There are
> other factors that contribute to score calculation. You can take a look
> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can see
> using debugQuery=true the score calculation for each document returned.
>
> Let me know you need something else.
>
>
>
> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
> wrote:
>
> > Hi All,
> >
> > I'm facing an issue in relevancy calculation by dismax query parser.
> > The boost factor applied does not work as expected in certain cases when
> > the keyword is generic and by generic I mean, if the keyword is appearing
> > many times in the document as well as in the index.
> >
> > I have parser configuration as below:
> >
> > 
> > 
> > edismax
> > explicit
> > 0.01
> > series_title^500 title^100 description^15
> > contribution
> > series_title^200
> > 0
> > *:*
> > 
> > 
> >
> > As you can see above, I'd expect the documents containing the matches for
> > series title should rank higher than the ones in contribution.
> >
> > This works well, if I type in a query like 'wonderworld' which is a less
> > occurring term and the series titles rank higher. But, if I type in a
> > keyword like 'news' which is the most common term in the index, I get
> hits
> > in contributions even though I have lots of documents having word news in
> > series title.
> >
> > The field definition is as below:
> >
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="true" />
> >
> >  > compressThreshold="10">
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> >
> >  positionIncrementGap="100"
> > >
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> >  
> >
> > I have tried debugging and when I use query term news, I see that matches
> > for contributions are ranked higher than series title. The parsed queries
> > look like below:
> > (Note that I have edited the query as in reality I have lot of fields
> that
> > are searchable and I have only mentioned the fields containing text data
> -
> > rest all contain uuids)
> >
> > 
> > (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> > contributions:news | series_title:news^500.0)~0.01) () () () () () () ()
> ()
> > () () () () () () () () () () () () () () () () () () () ())/no_coord
> > 
> > 
> > +(description:news^15 | title:news^100.0 | contributions:news |
> > series_title:news^500.0)~0.01 () () () () () () () () () () () () () ()
> ()
> > () () () () () () () () () () () () ()
> >
> >
> > Could you guide me in right direction please?
> >
> > Many Thanks,
> > Sandeep
> >
>
>
>
> --
> Felipe Lahti
> Consultant Developer - ThoughtWorks Porto Alegre
>


Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Hi All,

I'm facing an issue in the relevancy calculation by the dismax query parser.
The boost factor applied does not work as expected in certain cases when
the keyword is generic, and by generic I mean a keyword that appears
many times in the document as well as in the index.

I have parser configuration as below:



edismax
explicit
0.01
series_title^500 title^100 description^15
contribution
series_title^200
0
*:*
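
(Spelled out with parameter names, the handler above is roughly the following
sketch - the qf list and the 0.01 tie are the values shown; the handler name and
the mapping of the remaining values to echoParams, pf, ps and q.alt are
assumptions:)

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">series_title^500 title^100 description^15 contribution</str>
        <str name="pf">series_title^200</str>
        <int name="ps">0</int>
        <str name="q.alt">*:*</str>
      </lst>
    </requestHandler>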



As you can see above, I'd expect the documents containing matches for
series title to rank higher than the ones with matches in contribution.

This works well if I type in a query like 'wonderworld', which is a rarely
occurring term, and the series titles rank higher. But if I type in a
keyword like 'news', which is the most common term in the index, I get hits
in contributions even though I have lots of documents with the word news in
the series title.

The field definition is as below:


I have tried debugging, and when I use the query term news, I see that matches
for contributions are ranked higher than series title. The parsed queries
look like below:
(Note that I have edited the query, as in reality I have a lot of fields that
are searchable and I have only mentioned the fields containing text data -
the rest all contain UUIDs.)


(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord


+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()


Could you guide me in the right direction please?

Many Thanks,
Sandeep


Re: ConcurrentModificationException in Solr 3.6.1

2013-01-18 Thread Sandeep Mestry
Hi there, I think André has already guided you in your earlier mail:


This should be fixed in 3.6.2 which is available since Dec 25.

From the release notes:

"Fixed ConcurrentModificationException during highlighting, if all fields
were requested."

André




Von: mechravi25 [mechrav...@yahoo.co.in]
Gesendet: Freitag, 18. Januar 2013 11:10
An: solr-user@lucene.apache.org
Betreff: ConcurrentModificationException in Solr 3.6.1


On 18 January 2013 12:01, mechravi25  wrote:

> Hi all,
>
>
> I am using Solr 3.6.1 version. I am giving a set of requests to solr
> simultaneously. When I check the log file, I noticed the below exception
> stack trace
>
>
> SEVERE: java.util.ConcurrentModificationException
>  at
> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
>  at java.util.LinkedList$ListItr.next(LinkedList.java:696)
>  at
>
> org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
>  at
>
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
>  at
>
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
>  at
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
>  at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>  at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>  at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>  at
>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>  at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>  at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>  at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>  at
>
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>  at
>
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>  at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>  at org.mortbay.jetty.Server.handle(Server.java:326)
>  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>  at
>
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>  at
>
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>  at
>
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> When I searched through the solr issues, I got the following two url's,
>
> https://issues.apache.org/jira/browse/SOLR-2684
> https://issues.apache.org/jira/browse/SOLR-3790
>
> The stack trace given in the second url coincides with the one given above
> so I have applied the code change as given in the below link
>
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h
>
> The first url's stack trace seems to be different.
> I have two questions here. 1.) Please tell me why this exception stack
> trace
> occurs 2.) IS there any other patch/solution available to overcome this
> exception.
> Please guide me.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ConcurrentModificationException-in-Solr-3-6-1-tp4034520.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 4 : Optimize very slow

2012-12-06 Thread Sandeep Mestry
Hi All,

I followed Michael's advice and the timings have come down to a couple of hours
now from 6-8 hours :-)
I have attached the solrconfig.xml we're using; can you let me know if I'm
missing something?

Thanks,
Sandeep




	LUCENE_40
  
  ${solr.abortOnConfigurationError:true}

  
  
  
  
  
  
  
  
  
   
  

  
  
  ${solr.data.dir:./solr/data}

   


  
  
   
30
	
	
		15
		15
	
	
	
32



  
  1
  
  0
  



 false 

  

  
  



 
  5
	  false


  

  

1024








   


  



true

   
20


200




  
  




  
  



false


2

  

  
  






   

  


  
  

 
   explicit
   
 
  

  

   
  

textSpell


  default
  name
  ./spellchecker


  

  
  


  
  default
  
  org.carrot2.clustering.lingo.LingoClusteringAlgorithm
  
  20


  stc
  org.carrot2.clustering.stc.STCClusteringAlgorithm

  
  
 
   true
   default
   true
   
   name
   id
   
   features
   
   true
   
   
   
   false
 

  clusteringComponent

  
  
  
  

  
  text
  true
  ignored_

  
  true
  links
  ignored_

  


  
  

  
 
  true
 

  termsComponent

  


  
  

string
elevate.xml
  

  
  

  explicit


  elevator

  


  
  

  

  

  
  

  
  


  
  
  

  
  

  standard
  solrpingquery
  all

  

  
  

 explicit 
 true

  

  
   
   
   

 100

   

   
   

  
  70
  
  0.5
  
  [-\w ,/\n\"']{20,200}

   

   
   

 
 

   
  

  
  
5
  

  
  

  



Re: Incremental Update of index

2012-12-05 Thread Sandeep Mestry
Hi Amit/Shanu,

You can create the Solr document for only the updated record and index just
that document, so that only the updated record gets re-indexed.
You need not rebuild the index from scratch for every record update.
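
As a rough illustration (the field names and the id value below are assumptions,
not from your schema), re-indexing one record is just posting that single
document to /update again; since it carries the same uniqueKey as the old
version, it replaces it:

    <add>
      <doc>
        <field name="id">record-123</field>
        <field name="title">the updated title</field>
      </doc>
    </add>

followed by a normal <commit/> (or a commitWithin attribute on the add).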

Thanks,
Sandeep


Re: Solr 4 : Optimize very slow

2012-12-05 Thread Sandeep Mestry
@Walter, the daily optimization was introduced as we saw a decrease in
performance for searches that happen during the peak hours - when loads of
updates take place on the index. The load testing proved slightly
successful on optimized indexes. As a matter of fact, the merge factor was
increased from 10 to 30 to make it acceptable.

@Upayavira, thanks for the inputs. I will try to avoid the daily
optimizations; however, it's sort of the workplace policy not to alter
anything except the essential configs for this release of the project. I take
your point that the daily optimizations are unnecessary, but even then it's hard
to imagine why they take 6-8 hours a day when previously they were finished
within half an hour.

@Michael, thanks for pointing that out, I will try using
solr.NIOFSDirectoryFactory
as currently I'm using the default one. Regarding your questions,
- Nothing has changed between Solr 1.4 and Solr 4 except the Solr config. I
have built 2 separate environments using Solr 1.4 and Solr 4 with the same
application code, db config etc. and can see the difference in the
optimization timings.
- I will check the Solr stats for GC and also during optimization. I see
that the index size reaches 17 GB from 8.5 GB during the optimize, and the CPU
utilization is then at its highest.
And I meant WAS as in WebSphere Application Server.
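
As a minimal sketch, the directoryFactory change and the merge factor mentioned
above sit in solrconfig.xml along these lines (Solr 4.x syntax):

    <directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>

    <indexConfig>
      <!-- merge factor raised from 10 to 30, as mentioned earlier in the thread -->
      <mergeFactor>30</mergeFactor>
    </indexConfig>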

@Otis, a quick Google for "optimize wunder Erick Otis" results in this mail
chain (ha ha!), but I will dig through the mail archives; thank you for your
suggestion.

Have a good day all, I will come back with my findings..

Best,
Sandeep


On 5 December 2012 06:07, Walter Underwood  wrote:

> It was not necessary under 1.4. It has never been necessary.
>
> It was not necessary in Ultraseek Server in 1996, using the same merging
> model.
>
> In some cases, it can be a good idea. Since you are continuously updating,
> this is not one of those cases.
>
> wunder
>
> On Dec 4, 2012, at 9:29 PM, Upayavira wrote:
>
> > I tried that search, without success :-(
> >
> > I suspect what Otis was trying to say was to question why you are
> > optimising. Optimise was necessary under 1.4, but with newer Solr, the
> > new TieredMergePolicy does a much better job of handling background
> > merging, reducing the need for optimize. Try just not doing it at all
> > and see if your index actually reaches a point where it is needed.
> >
> > Upayavira
> >
> > On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote:
> >> Hi,
> >>
> >> You should search the ML archives for : optimize wunder Erick Otis :)
> >>
> >> Is WAS really AWS? If so, if these are new EC2 instances you are
> >> unfortunately unable to do a fair apples to apples comparison. Have you
> >> tried a different set of instances?
> >>
> >> Otis
> >> --
> >> Performance Monitoring - http://sematext.com/spm
> >> On Dec 4, 2012 6:29 PM, "Sandeep Mestry"  wrote:
> >>
> >>> Hi All,
> >>>
> >>> I have recently migrated from solr 1.4 to solr 4 and have done the
> basic
> >>> changes required for solr 4 in solrconfig.xml and schema.xml. I have
> also
> >>> rebuilt the index set for solr 4.
> >>> We run optimize every morning at 4 am and we keep the index updates off
> >>> during this process.
> >>> Previously, with 1.4 - the optimization used to take around 20-30 mins
> per
> >>> shard but now with solr 4, its taking 6-8 hours or even more..
> >>> I have also tested the optimize from solr UI and that takes 6-8 hours
> too..
> >>> The hardware is saeme and, we have deployed solr under WAS.
> >>> There ar 4 shards and every shard contains around 8 - 9 Gig of data
> and we
> >>> are using master-slave configuration with rsync. I have not enabled
> soft
> >>> commit. Also, commiter process is scheduled to run every minute.
> >>>
> >>> I am not sure which part I'm missing, do let me know your inputs
> please.
> >>>
> >>> Many Thanks in advance,
> >>> Sandeep
> >>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Solr 4 : Optimize very slow

2012-12-04 Thread Sandeep Mestry
Hi All,

I have recently migrated from solr 1.4 to solr 4 and have done the basic
changes required for solr 4 in solrconfig.xml and schema.xml. I have also
rebuilt the index set for solr 4.
We run optimize every morning at 4 am and we keep the index updates off
during this process.
Previously, with 1.4, the optimization used to take around 20-30 mins per
shard, but now with Solr 4 it's taking 6-8 hours or even more.
I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
The hardware is the same and we have deployed Solr under WAS.
There are 4 shards and every shard contains around 8 - 9 GB of data, and we
are using a master-slave configuration with rsync. I have not enabled soft
commit. Also, the committer process is scheduled to run every minute.
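
(For reference, an optimize like this can be triggered by posting the standard
XML update command to each shard's /update handler, e.g.:

    <optimize waitSearcher="false"/>

the waitSearcher attribute is optional.)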

I am not sure which part I'm missing, do let me know your inputs please.

Many Thanks in advance,
Sandeep


Re: Does SolrCloud support distributed IDFs?

2012-11-28 Thread Sandeep Mestry
Dear All, can anyone suggest how long it will take to get the SOLR-1632 patch
into Solr 4?

Also, it'd be good to hear if someone has used any alternate method, like the
Ultraseek XPA Java library, to calculate the distributed ranking.

Many Thanks,
Sandeep


On 22 October 2012 13:23, Sascha SZOTT  wrote:

> Hi Mark,
>
>
> Mark Miller wrote:
>
>> Still waiting on that issue. I think Andrzej should just update it to
>> trunk and commit - it's option and defaults to off. Go vote :)
>>
> Sounds like the problem is already solved and the remaining work consists
> of code integration? Can somebody estimate how much work that would be?
>
> -Sascha
>


Re: Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry
Thanks Ahmet, however as I have mentioned in my e-mail, we're using Solr
1.4 here and edismax is supported from Solr 3.1.

:-)
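
(For reference, once we do upgrade, the edismax suggestion would translate into
something like the sketch below - the handler name is an assumption and the field
list is the one from my question, without boosts:)

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title description annotations comments text</str>
      </lst>
    </requestHandler>

A query such as (day AND world) NOT night would then be expanded across all of
the qf fields by the parser itself.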

On 23 October 2012 13:42, Ahmet Arslan  wrote:

>
>
> --- On Tue, 10/23/12, Sandeep Mestry  wrote:
>
> > From: Sandeep Mestry 
> > Subject: Forming Solr Query for multiple operators against multiple
> fields
> > To: solr-user@lucene.apache.org
> > Date: Tuesday, October 23, 2012, 2:51 PM
> > Dear All,
> >
> > I have a requirement to search against multiple fields like
> > title,
> > description, annotations, comments, text and the query can
> > contain multiple
> > boolean operators.
> > So, can someone point me out in right direction.
> >
> > If the user enters a query like ,
> >
> > - (day AND world) NOT night
>
> Probably you can make use of (e)dismax query parser.
> http://wiki.apache.org/solr/DisMax
> http://wiki.apache.org/solr/ExtendedDisMax
>
>


Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry
Dear All,

I have a requirement to search against multiple fields like title,
description, annotations, comments and text, and the query can contain multiple
boolean operators.
So, can someone point me in the right direction?

If the user enters a query like ,

- (day AND world) NOT night

I want to form a query:

*(title:day AND title:world NOT title:night) OR (description:day
AND description:world NOT description:night) OR (annotations:day
AND annotations:world NOT annotations:night) OR (comments:day
AND comments:world NOT comments:night) OR (text:day AND text:world
NOT text:night) *

I've tried Lucene's MultiFieldQueryParser to form the query and, after some
string manipulation, tried producing a query as below; however, it does not
give me the correct relevancy.

*(title:day OR description:day OR annotations:day OR comments:day OR
text:day) AND (title:world OR description:world OR annotations:world OR
comments:world OR text:world) NOT (title:night OR description:night
OR annotations:night OR comments:night OR text:night)*

For the record, the project is still on Solr 1.4 and hence I'm using the
standard query parser (the upgrade is due in the coming months). But for now, I
need to make it work for the above requirement.

Please suggest if there is any straightforward approach or should I take
the route of writing the QueryGrammar myself?

Many Thanks,
Sandeep