date:20120413

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Especially the bit about adding debugQuery=on
and showing the results. You're asking people
to guess at solutions without providing much
in the way of context.

You might try looking at your index with Luke to
see what's actually in your index, or perhaps
TermsComponent


Best
Erick

On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski rajinima...@gmail.com wrote:
 Hi All,

   I tried to index with UTF-8  encode but the issue is still not fixed.
 Please see my inputs below.

 *Indexed XML:*
 ?xml version=1.0 encoding=UTF-8 ?
 add
  doc
    field name=ID0.100/field
    field name=BODYµ/field
  /doc
 /add

 *Search Query - * BODY:µ

 numfound : 0 results obtained.

 *What can be the reason for this? How do i need to make search query so
 that the above document is found.*


 Thanks  Regards

 Regards
 Rajani



 2012/4/2 Rajani Maski rajinima...@gmail.com

 Thank you for the reply.



 On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 : We have data having such symbols like :  ต
 : Indexed data has  -    Dose:0 ตL
 : Now , when  it is searched as  - Dose:0 ตL
        ...
 : Query Q value observed  : str name=qS257:0 ยตL/injection/str

 First off: your when searched as example does not match up to your
 Query Q observed value (ie: field queries, extra /injection text at
 the end) suggesting that you maybe cut/paste something you didn't mean to
 -- so take the rest of this advice with a grain of salt.

 If i ignore your when it is searched as exampleand focus entirely on
 what you say you've indexed the data as, and the Q value you are sing (in
 what looks like the echoParams output) then the first thing that jumps out
 at me is that it looks like your servlet container (or perhaps your web
 browser if that's where you tested this) is not dealing with the unicode
 correctly -- because allthough i see a ต in the first three lines i
 quoted above (UTF8: 0xC2 0xB5) in your value observed i'm seeing it
 preceeded by a ย (UTF8: 0xC3 0x82) ... suggesting that perhaps the ต
 did not get URL encoded properly when the request was made to your servlet
 container?

 In particular, you might want to take a look at...


 https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
 http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
 The example/exampledocs/test_utf8.sh script included with solr




 -Hoss

Re: two structures in solr

bq: Is that right?

I don't know, does it work G? You'll probably want an
additional field for unique id (just named id in the example)
that should be disjoint between your types.

Best
Erick

On Fri, Apr 13, 2012 at 3:41 AM, tkoomzaaskz tomasz.du...@gmail.com wrote:
 Thank you very much Erick for your reply!

 So should it go something like the following:

 http://lucene.472066.n3.nabble.com/file/n3907393/solr_index.png
 sorry for an ugly drawing ;)

 In this example, the index will have 13 columns: 6 for project, 6 for
 contractor and one to define the type. Is that right?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/two-structures-in-solr-tp3905143p3907393.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boost differences in two environments for same query and config

Well, next thing I'd do is just copy your entire solr home
directory to the remote machine and try that. If that gives
identical results on both, then try moving just your
solr home/data directory to the remote machine.

I suspect that you've done something different between the two
machines that's leading to this, but haven't a clue what.

If you copy your entire Solr installation over and _still_ get
this kind of thing, we're into whether the JVM or op system
are somehow changing things, which would surprise me a lot.

Best
Erick

On Fri, Apr 13, 2012 at 4:24 AM, Kerwin kerwin...@gmail.com wrote:
 Hi Erick,

 Thanks for your suggestions.
 I did an optimize on the remote installation and this time with the
 same number of documents but still face the same issue as seen from
 the debug output below:

 9.950362E-4 = (MATCH) sum of:
        9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35916), product of:
                9.950362E-4 = queryWeight(RECORD_TYPE:info), product of:
                        1.0 = idf(docFreq=58891, maxDocs=8181811)
                        9.950362E-4 = queryNorm
                1.0 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), product 
 of:
                        1.0 = tf(termFreq(RECORD_TYPE:info)=1)
                        1.0 = idf(docFreq=58891, maxDocs=8181811)
                        1.0 = fieldNorm(field=RECORD_TYPE, doc=35916)
        0.0 = (MATCH) product of:
                1.0945399 = (MATCH) sum of:
                        0.99503624 = (MATCH) weight(CD:ee123^1000.0 in 35916), 
 product of:
                                0.99503624 = queryWeight(CD:ee123^1000.0), 
 product of:
                                        1000.0 = boost
                                        1.0 = idf(docFreq=1, maxDocs=8181811)
                                        9.950362E-4 = queryNorm
                                1.0 = (MATCH) fieldWeight(CD:ee123 in 35916), 
 product of:
                                        1.0 = tf(termFreq(CD:ee123)=1)
                                        1.0 = idf(docFreq=1, maxDocs=8181811)
                                        1.0 = fieldNorm(field=CD, doc=35916)
                                0.09950362 = (MATCH)
 ConstantScoreQuery(QueryWrapperFilter(CD:ee123 CD:ee123c CD:ee123c.
 CD:ee123dc CD:ee123e CD:ee123e. CD:ee123en CD:ee123fx CD:ee123g
 CD:ee123g.1 CD:ee123g1 CD:ee123ee123 CD:ee123l.1 CD:ee123l1 CD:ee123ll
 CD:ee123lr CD:ee123m.z CD:ee123mg CD:ee123mz CD:ee123na CD:ee123nx
 CD:ee123ol CD:ee123op CD:ee123p CD:ee123p.1 CD:ee123p1 CD:ee123pn
 CD:ee123r.1 CD:ee123r1 CD:ee123s CD:ee123s.z CD:ee123sm CD:ee123sn
 CD:ee123sp CD:ee123ss CD:ee123sz)), product of:
                                        100.0 = boost
                                        9.950362E-4 = queryNorm
                0.0 = coord(2/3)


 So I got the conf folder from the remote server location and replaced
 my local conf folder with this one to see if the indexes were formed
 differently but my local installation continues to work.I would expect
 to see the same behaviour as on the remote installation but it did not
 happen. (The only difference on the remote installation is that there
 are cores while my local installation has no cores).
 Anything else I could try?
 Thanks for your help.

 On 4/11/12, Erick Erickson erickerick...@gmail.com wrote:
 Well, you're matching a different number of records, so I have to assume
 your indexes are different on the two machines.

 Here is one case where doing an optimize might make sense, that'll purge
 the data associated with any deleted records from the index which should
 make comparisons better

 Additionally, you have to insure that your request handler is identical
 on both, have you made any changes to solrconfig.xml?

 About the coord (2/3), I'm pretty clueless. But also insure that your
 parsed query is identical on both, which is an additional check on
 whether you've changed something on one server and not the
 other.

 Best
 Erick

 On Wed, Apr 11, 2012 at 8:19 AM, Kerwin kerwin...@gmail.com wrote:
 Hi All,

 I am firing the following Solr query against installations on two
 environments one on my local Windows machine and the other on Unix
 (Remote).

 RECORD_TYPE:info AND (NAME:ee123* OR CD:ee123^1000 OR CD:ee123*^100)

 There are no differences in the DataImportHandler configuration ,
 Schema and Solrconfig for both these installations.
 The correct expected result is given by the local installation of Solr
 which also gives scores as expected for the boosts.

 CORRECT/Expected:
 Debug query output for local installation:

 10.822258 = (MATCH) sum of:
        0.002170282 = (MATCH) weight(RECORD_TYPE:info in 35916), product
 of:
                3.65739E-4 = queryWeight(RECORD_TYPE:info), product of:
                        5.933964 = idf(docFreq=58891, maxDocs=8181811)
                        6.1634855E-5 = queryNorm
                5.933964 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916),
 product of:

Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski

Fine. Thank you. I will look at it.


On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson erickerick...@gmail.comwrote:

 Please review:
 http://wiki.apache.org/solr/UsingMailingLists

 Especially the bit about adding debugQuery=on
 and showing the results. You're asking people
 to guess at solutions without providing much
 in the way of context.

 You might try looking at your index with Luke to
 see what's actually in your index, or perhaps
 TermsComponent


 Best
 Erick

 On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski rajinima...@gmail.com
 wrote:
  Hi All,
 
I tried to index with UTF-8  encode but the issue is still not fixed.
  Please see my inputs below.
 
  *Indexed XML:*
  ?xml version=1.0 encoding=UTF-8 ?
  add
   doc
 field name=ID0.100/field
 field name=BODYµ/field
   /doc
  /add
 
  *Search Query - * BODY:µ
 
  numfound : 0 results obtained.
 
  *What can be the reason for this? How do i need to make search query so
  that the above document is found.*
 
 
  Thanks  Regards
 
  Regards
  Rajani
 
 
 
  2012/4/2 Rajani Maski rajinima...@gmail.com
 
  Thank you for the reply.
 
 
 
  On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter 
 hossman_luc...@fucit.org
   wrote:
 
 
  : We have data having such symbols like :  ต
  : Indexed data has  -Dose:0 ตL
  : Now , when  it is searched as  - Dose:0 ตL
 ...
  : Query Q value observed  : str name=qS257:0 ยตL/injection/str
 
  First off: your when searched as example does not match up to your
  Query Q observed value (ie: field queries, extra /injection text at
  the end) suggesting that you maybe cut/paste something you didn't mean
 to
  -- so take the rest of this advice with a grain of salt.
 
  If i ignore your when it is searched as exampleand focus entirely on
  what you say you've indexed the data as, and the Q value you are sing
 (in
  what looks like the echoParams output) then the first thing that jumps
 out
  at me is that it looks like your servlet container (or perhaps your web
  browser if that's where you tested this) is not dealing with the
 unicode
  correctly -- because allthough i see a ต in the first three lines i
  quoted above (UTF8: 0xC2 0xB5) in your value observed i'm seeing it
  preceeded by a ย (UTF8: 0xC3 0x82) ... suggesting that perhaps the
 ต
  did not get URL encoded properly when the request was made to your
 servlet
  container?
 
  In particular, you might want to take a look at...
 
 
 
 https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
  http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
  The example/exampledocs/test_utf8.sh script included with solr
 
 
 
 
  -Hoss

Re: Facets involving multiple fields

Nope. Information about your higher level use-case
would probably be a good thing, this is starting to
smell like an XY problem.

Best
Erick

On Fri, Apr 13, 2012 at 5:48 AM, Marc SCHNEIDER
marc.schneide...@gmail.com wrote:
 Hi,

 Thanks for your answer.
 Yes it works in this case when I know the facet name (Computer). What
 if I want to automatically compute all facets?
 facet.query=keyword:* short_title:* doesn't work, right?

 Marc.

 On Thu, Apr 12, 2012 at 2:08 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 facet.query=keywords:computer short_title:computer
 seems like what you're asking for.

 On Thu, Apr 12, 2012 at 3:19 AM, Marc SCHNEIDER
 marc.schneide...@gmail.com wrote:
 Hi,

 Thanks for your answer.
 Let's say I have to fields : 'keywords' and 'short_title'.
 For these fields I'd like to make a faceted search : if 'Computer' is
 stored in at least one of these fields for a document I'd like to get
 it added in my results.
 doc1 = keywords : 'Computer' / short_title : 'Computer'
 doc2 = keywords : 'Computer'
 doc3 = short_title : 'Computer'

 In this case I'd like to have : Computer (3)

 I don't see how to solve this with facet.query.

 Thanks,
 Marc.

 On Wed, Apr 11, 2012 at 5:13 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Have you considered facet.query? You can specify an arbitrary query
 to facet on which might do what you want. Otherwise, I'm not sure what
 you mean by faceted search using two fields. How should these fields
 be combined into a single facet? What that means practically is not at
 all obvious from your problem statement.

 Best
 Erick

 On Tue, Apr 10, 2012 at 8:55 AM, Marc SCHNEIDER
 marc.schneide...@gmail.com wrote:
 Hi,

 I'd like to make a faceted search using two fields. I want to have a
 single result and not a result by field (like when using
 facet.field=f1,facet.field=f2).
 I don't want to use a copy field either because I want it to be
 dynamic at search time.
 As far as I know this is not possible for Solr 3.x...
 But I saw a new parameter named group.facet for Solr4. Could that
 solve my problem? If yes could somebody give me an example?

 Thanks,
 Marc.

Solr data export to CSV File

2012-04-13 Thread Pavnesh

Hi Team,

 

A very-very thanks to you guy who had developed such a nice product. 

I have one query regarding solr that I have app 36 Million data in my solr
and I wants to export all the data to a csv file but I have found nothing on
the same  so please help me on this topic .

 

 

Regards

Pavnesh

Re: How to read SOLR cache statistics?

Well, the place to start is here:
*stats*: lookups : 98
*hits *: 59
*hitratio *: 0.60
*inserts *: 41
*evictions *: 0
*size *: 41

the important bits are hitratio and evictions.
Caches only really start to show their stuff
when the hit ratio is quite high. That's
the percentage of requests that are satisfied
by entries already in the cache. You want
this number to be as high as possible, +0.90.

evictions are the number of entries that have been
removed from the cache. The pre-configured
number is usually 512, so when the 513th entry
is inserted in the cache, some are removed
to make room and tallied in the evictions
section.

Do note that some of the caches (documentCache
in particular) will rarely have a huge hit ratio due
to its nature, ditto with queryResultCache so you
can temporarily ignore those.

Best
Erick

On Fri, Apr 13, 2012 at 6:28 AM, Kashif Khan uplink2...@gmail.com wrote:
Hi Li Li,

I have been through that WIKI before but that does not explain what is
*evictions*, *inserts*, *cumulative_inserts*, *cumulative_evictions*,
*hitratio *and all. These terms are foreign to me. What does the following
line mean?

*item_ABC :
{field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4}
*

I want that kind of explanation. I have read the wiki and the comments in
the solrconfig.xml file about all these things but does say how to read the
stats which is very *important!!!*.

--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907633.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: performance impact using string or float when querying ranges

Well, I guess my first question is whether using stirngs
is fast enough, in which case there's little reason to
make your life more complex.

But yes, range queries will be significantly faster with
any of the Trie types than with strings. Trie types are
all numeric types.


Best
Erick

On Fri, Apr 13, 2012 at 3:49 AM, crive marco.cr...@gmail.com wrote:
 Hi All,
 is there a big difference in terms of performances when querying a range
 like [50.0 TO *] on a string field compared to a float field?

 At the moment I am using a dynamic field of type string to map some values
 coming from our database and their type can vary depending on the context
 (float/integer/string); it easier to use a dynamic field other than having
 to create a bespoke field for each type of value.

 Marco

Re: Issues with language based indexing

Please review:
http://wiki.apache.org/solr/UsingMailingLists

there's so little information to go on here that I
really can't say anything that isn't a guess.

At a minimum we need the raw input, the
fieldType definitions from your schema,
the results of adding debugQuery=on
to your URL

Best
Erick

On Fri, Apr 13, 2012 at 6:04 AM, JGar jyothi.garladi...@citi.com wrote:
 Hello,

 I am new to Solr. it is resulting some docs in my search for Acciones y
 Valores string. When i go and search for the same word in the given doc
 manually, i could not find those word. Pls help on what basis the doc is
 found in the search .

 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Issues-with-language-based-indexing-tp3907601p3907601.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr data export to CSV File

Does this help?

http://wiki.apache.org/solr/CSVResponseWriter

Best
Erick

On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh
pavnesh.ku...@altruistindia.com wrote:
 Hi Team,



 A very-very thanks to you guy who had developed such a nice product.

 I have one query regarding solr that I have app 36 Million data in my solr
 and I wants to export all the data to a csv file but I have found nothing on
 the same  so please help me on this topic .





 Regards

 Pavnesh

RE: Realtime /get versus SearchHandler

2012-04-13 Thread Darren Govoni


Yes

brbrbr--- Original Message ---
On 4/13/2012  06:25 AM Benson Margulies wrote:brA discussion over on the dev 
list led me to expect that the by-if
brfield retrievals in a SolrCloud query would come through the get
brhandler. In fact, I've seen them turn up in my search component in the
brsearch handler that is configured with my custom QT. (I have a
br'prepare' method that sets ShardParams.QT to my QT to get my
brprocessing involved in the first of the two queries.) Did I overthink
brthis?
br
br

RE: Solr data export to CSV File

2012-04-13 Thread Ben McCarthy

A combination of the CSV response writer and SOLRJ to page through all of the 
results sending it to something like apache commons fileutils:

  FileUtils.writeStringToFile(new File(output.csv), outputLine 
(line.separator), true);

Would be quiet quick to knock up in Java.

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 April 2012 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr data export to CSV File

Does this help?

http://wiki.apache.org/solr/CSVResponseWriter

Best
Erick

On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com 
wrote:
 Hi Team,



 A very-very thanks to you guy who had developed such a nice product.

 I have one query regarding solr that I have app 36 Million data in my
 solr and I wants to export all the data to a csv file but I have found
 nothing on the same  so please help me on this topic .





 Regards

 Pavnesh







This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: 
Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, 
Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and 
any files transmitted with it are confidential and may be legally privileged, 
and intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error please notify the sender. 
This email message has been swept for the presence of computer viruses.

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread geeky2

thank you for the response.

it seems to be working well ;)

1) i tried your suggestion about removing the qt parameter - 

*somecore/partItemNoSearch*q=dishwasherdebugQuery=onrows=10

but this results in a 404 error message - is there some configuration i am
missing to support this short-hand syntax for specifying the requestHandler
in the url ?



2) ok - good suggestion.



3) yes it looks like it IS searching across all three (3) fields.

i noticed that for the itemNo field, it reduced the search string from
dishwasher to dishwash - it this because of stemming on the field type, used
for the itemNo field?

lst name=debugstr name=rawquerystringdishwasher/strstr
name=querystringdishwasher/strstr
name=parsedquery+DisjunctionMaxQuery((brand:dishwasher^0.5 |
*itemNo:dishwash* | productType:dishwasher^0.8))/strstr
name=parsedquery_toString+(brand:dishwasher^0.5 | itemNo:dishwash |
productType:dishwasher^0.8)/str





--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3907875.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: searching across multiple fields using edismax - am i setting this up right?

as to 1) you have to define your request handler with
a leading /, as in name= /partItemNoSearch. Don't
forget to restart your server.

3) Of course. The input terms MUST be run through
the associated analysis chain to have any hope of
matching correctly.

Best
Erick

On Fri, Apr 13, 2012 at 8:36 AM, geeky2 gee...@hotmail.com wrote:
 thank you for the response.

 it seems to be working well ;)

 1) i tried your suggestion about removing the qt parameter -

 *somecore/partItemNoSearch*q=dishwasherdebugQuery=onrows=10

 but this results in a 404 error message - is there some configuration i am
 missing to support this short-hand syntax for specifying the requestHandler
 in the url ?



 2) ok - good suggestion.



 3) yes it looks like it IS searching across all three (3) fields.

 i noticed that for the itemNo field, it reduced the search string from
 dishwasher to dishwash - it this because of stemming on the field type, used
 for the itemNo field?

 lst name=debugstr name=rawquerystringdishwasher/strstr
 name=querystringdishwasher/strstr
 name=parsedquery+DisjunctionMaxQuery((brand:dishwasher^0.5 |
 *itemNo:dishwash* | productType:dishwasher^0.8))/strstr
 name=parsedquery_toString+(brand:dishwasher^0.5 | itemNo:dishwash |
 productType:dishwasher^0.8)/str





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3907875.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Errors during indexing

2012-04-13 Thread Ben McCarthy

Hello

We have just switched to Solr4 as we needed the ability to return geodist() 
along with our results.

I use a simple multithreaded java app and solr to ingest the data.  We keep 
seeing the following:

13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error handling 'status' 
action
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:546)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /usr/solr4/data/index/_2jb.fnm (No 
such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.init(RandomAccaessFile.java:216)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:219)
at 
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
at 
org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:201)
at 
org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:227)
at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:415)
at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:756)
at 
org.apache.lucene.index.StandardDirectoryReader$ReaderCommit.init(StandardDirectoryReader.java:369)
at 
org.apache.lucene.index.StandardDirectoryReader.getIndexCommit(StandardDirectoryReader.java:354)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:558)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:816)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:537)
... 16 more


This seems to happen when were using the new admin tool.  Im checking on the 
autocommit handler.

Has anyone seen anything similar?

Thanks
Ben




This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: 
Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, 
Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and 
any files transmitted with it are confidential and may be legally privileged, 
and intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error please notify the sender. 
This email message has been swept for the presence of computer viruses.

RE: solr 3.5 taking long to index

2012-04-13 Thread Rohit

Hi Shawn,

Thanks for the information, let me give this a try, since this is a live box I 
will try it during the weekend and update you.

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 13 April 2012 11:01
To: solr-user@lucene.apache.org
Subject: Re: solr 3.5 taking long to index

On 4/12/2012 8:42 PM, Rohit wrote:
 The machine has a total ram of around 46GB. My Biggest concern is Solr index 
 time gradually increasing and then the commit stops because of timeouts, out 
 commit rate is very high, but I am not able to find the root cause of the 
 issue.

For good performance, Solr relies on the OS having enough free RAM to keep 
critical portions of the index in the disk cache.  Some numbers that I have 
collected from your information so far are listed below.  
Please let me know if I've got any of this wrong:

46GB total RAM
36GB RAM allocated to Solr
300GB total index size

This leaves only 10GB of RAM free to cache 300GB of index, assuming that this 
server is dedicated to Solr.  The critical portions of your index are very 
likely considerably larger than 10GB, which causes constant reading from the 
disk for queries and updates.  With a high commit rate and a relatively low 
mergeFactor of 10, your index will be doing a lot of merging during updates, 
and some of those merges are likely to be quite large, further complicating the 
I/O situation.

Another thing that can lead to increasing index update times is cache warming, 
also greatly affected by high I/O levels.  If you visit the 
/solr/corename/admin/stats.jsp#cache URL, you can see the warmupTime for each 
cache in milliseconds.

Adding more memory to the server would probably help things.  You'll want to 
carefully check all the server and Solr statistics you can to make sure that 
memory is the root of problem, before you actually spend the money.  At the 
server level, look for things like a high iowait CPU percentage.  For Solr, you 
can turn the logging level up to INFO in the admin interface as well as turn on 
the infostream in solrconfig.xml for extensive debugging.

I hope this is helpful.  If not, I can try to come up with more specific things 
you can look at.

Thanks,
Shawn

Solr is not extracting the CDATA part of xml

I am trying to use method that is suggested in solr forum to remove CDATA
part of xml. but it is not working. result show whole xml content instead of
CDATA part.

schema.xml
fieldType name=text_ws2 class=solr.TextField
positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
charFilter class=solr.HTMLStripCharFilterFactory/
 charFilter class=solr.MappingCharFilterFactory
mapping=mappings.txt/ 
  /analyzer
/fieldType

mappings.txt
 = 

my xml content
body

/body 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908317.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is not extracting the CDATA part of xml

not sure why CDATA part did not get interpreted. this is how xml content
looks like. I added quotes just to present the exact content xml content.

body/body

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: performance impact using string or float when querying ranges

2012-04-13 Thread Yonik Seeley

On Fri, Apr 13, 2012 at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote:
 Well, I guess my first question is whether using stirngs
 is fast enough, in which case there's little reason to
 make your life more complex.

 But yes, range queries will be significantly faster with
 any of the Trie types than with strings.

To elaborate on this point a bit... range queries on strings will be
the same speed as a numeric field with precisionStep=0.
You need a precisionStep  0 (so the number will be indexed in
multiple parts) to speed up range queries on numeric fields.  (See
int vs tint in the solr schema).

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10




 Trie types are
 all numeric types.


 Best
 Erick

 On Fri, Apr 13, 2012 at 3:49 AM, crive marco.cr...@gmail.com wrote:
 Hi All,
 is there a big difference in terms of performances when querying a range
 like [50.0 TO *] on a string field compared to a float field?

 At the moment I am using a dynamic field of type string to map some values
 coming from our database and their type can vary depending on the context
 (float/integer/string); it easier to use a dynamic field other than having
 to create a bespoke field for each type of value.

 Marco

mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin

Trying to maintain the Drupal integration module across multiple versions
of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this
change to solrconfig:

-
 mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy/mergePolicy
+mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy /


I don't see this mentioned in the release notes - is the second format
useable with 3.5, 3.4, etc?

-- 
Peter M. Wolanin, Ph.D.  : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 781-313-8322

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com;

RE: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Michael Ryan

It looks like the first format was removed in 3.6 as part of 
https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 
3.x versions.

-Michael

-Original Message-
From: Peter Wolanin [mailto:peter.wola...@acquia.com] 
Sent: Friday, April 13, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: mergePolicy element format change in 3.6 vs 3.5?

Trying to maintain the Drupal integration module across multiple versions
of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this
change to solrconfig:

-
 mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy/mergePolicy
+mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy /

I don't see this mentioned in the release notes - is the second format
useable with 3.5, 3.4, etc?

Re: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin

Ok, thanks for the info.  As long as the second one works, we can just use
that.

I just verified that it works for 3.5 at least.

-Peter

On Fri, Apr 13, 2012 at 1:12 PM, Michael Ryan mr...@moreover.com wrote:

 It looks like the first format was removed in 3.6 as part of
 https://issues.apache.org/jira/browse/SOLR-1052. The second format works
 in all 3.x versions.

 -Michael

 -Original Message-
 From: Peter Wolanin [mailto:peter.wola...@acquia.com]
 Sent: Friday, April 13, 2012 12:32 PM
 To: solr-user@lucene.apache.org
 Subject: mergePolicy element format change in 3.6 vs 3.5?

 Trying to maintain the Drupal integration module across multiple versions
 of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this
 change to solrconfig:

 -
  mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy/mergePolicy
 +mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy /


 I don't see this mentioned in the release notes - is the second format
 useable with 3.5, 3.4, etc?




-- 
Peter M. Wolanin, Ph.D.  : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 781-313-8322

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com;

Re: Solr is not extracting the CDATA part of xml

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

On Fri, Apr 13, 2012 at 11:50 AM, srini softtec...@gmail.com wrote:
 not sure why CDATA part did not get interpreted. this is how xml content
 looks like. I added quotes just to present the exact content xml content.

 body/body

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is not extracting the CDATA part of xml

Erick,

Thanks for your reply. when you say Solr does not index arbitery xml
document, then below is the way my xml document looks like which is sitting
in oracle. Could you suggest the best of indexing it ? which method should I
follow? Should I use XPathEntityProcessor?

?xml version=1.0 encoding=UTF-8 ?
message xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance;
xmlns=someurl xmlns:csp=someurl.xsd xsi:schemaLocation=somelocation
jar: id=002 message-type=create
content
dsp:row
dsp:channel100/dsp:channel
dsp:role115/dsp:role
/dsp:row

/body/content/message

Thanks in Advance
Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

On Fri, Apr 13, 2012 at 11:50 AM, srini lt;softtech88@gt; wrote:
not sure why CDATA part did not get interpreted. this is how xml content
looks like. I added quotes just to present the exact content xml content.

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is not extracting the CDATA part of xml

Right, that will not work at all for direct transmission to
Solr.

You could write a Java program that parses this and sends
it to Solr via SolrJ.

Personally I haven't connected a database to Solr with
XPathEntityProcessor in the mix, but I believe I've seen
messages go by with this configuration. You might want
to search the mail archive...

Best
Erick

On Fri, Apr 13, 2012 at 3:13 PM, srini softtec...@gmail.com wrote:
Erick,

/body/content/message

Thanks in Advance
Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Alexander Aristov

This is not solr format. You must re-format your XML into solr XML. you may
find examples on solr wiki or in solr examples dir.

Best Regards
Alexander Aristov

On 13 April 2012 23:13, srini softtec...@gmail.com wrote:

Erick,

/body/content/message

Thanks in Advance
Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:

http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:

http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Erick Erickson wrote

Solr does not index arbitrary XML content. There is and XML
form of a solr document that can be sent to Solr, but it is
a specific form of XML.

An example of the XML you're trying to index and what you mean
by not working would be helpful.

Best
Erick

body/body

--
View this message in context:

http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is not extracting the CDATA part of xml

Thanks Again for quick reply. Little curious about the procedure you
suggested. I thought of using same procedure as you suggested. Like writing
a java program to fetch xml record from db and parse the content hand it to
Solr for indexing.

but what if my database content get changed? should I re run my java program
to fetch xml and add to solr for re indexing?

the content of xml format does not match to solr example xml formats. Any
suggestions here?

when I import xml records from oracle and add it to solr and search for a
word, solr is displaying whole xml doc which has that word. what is wrong
with this procedure( I do see my search word in the content of xml, only bad
part is it is displaying whole doc instead CDATA part of it). Please suggest
if there is better of doing this task other than SolrJ

Thanks in Advance
Srini





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908825.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boosting StandardQuery scores with a subquery?


: I'm having some trouble wrapping my head around boosting StandardQueries.
: It looks like the function: query(subquery, default)
: http://wiki.apache.org/solr/FunctionQuery#query is what I want, but the
: examples seem to focus on just returning a score (e.g. product of popularity
: and the score of the subquery). I assume my difficulty stems from the fact
: that I'd like to retrieve highlighting from one query, but impact score and
: 'relevance' by a different (sub)query.

if your primary concern is just having highlighting on some words, while 
lots of otherwords contribute to the score, then you should take a look at 
the hl.q param introduced in Solr 3.5...

http://wiki.apache.org/solr/HighlightingParameters#hl.q

That lets you completley seperate the two if you'd like.

you cna even use local param syntax to reduce duplication...

  q={!v=$qq}
  qq=content:(roi return on investment return investment~5)
  hl.q={!v=$qq}
  fq=extension:(pdf doc)
  boost=keywords:(financial investment profit loss) 
title:(financial investment profit loss) 
url:(investment investor relations phoenix)

...should work i think.

-Hoss

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Jan Høydahl

Hi,

For a web crawl+search like this you will probably need a lot of additional Big 
Data crunching, so a Hadoop based solution is wise.

In addition to those products mentioned we also now have Amazon's own 
CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr 
(not even Lucene based), but gives you the elasticity you request I guess. If 
you run your Hadoop cluster in EC2 already it would be quite efficient to 
batch-load the crawled and processed data into a SearchDomain in the same 
availability zone. However, both cost and features may prohibit this as a 
realistic choice for you.

It would be cool to explore a Hadoop/HDFS + SolrCloud integration. SolrCloud 
would not build the indexes, but be pulling pre-built indexes from HDFS down to 
local disk every time it's told to. Or perhaps the SolrCloud nodes could be 
part of the hadoop cluster, being responsible for the Reduce part building the 
indexes?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. apr. 2012, at 04:23, Otis Gospodnetic wrote:

 Hello Ali,
 
 I'm trying to setup a large scale *Crawl + Index + Search *infrastructure
 
 using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*,
 crawled + indexed every *4 weeks, *with a search latency of less than 0.5
 seconds.
 
 
 That's fine.  Whether it's doable with any tech will depend on how much 
 hardware you give it, among other things.
 
 Needless to mention, the search index needs to scale to 5Billion pages. It
 is also possible that I might need to store multiple indexes -- one for
 crawled content, and one for ancillary data that is also very large. Each
 of these indices would likely require a logically distributed and
 replicated index.
 
 
 Yup, OK.
 
 However, I would like for such a system to be homogenous with the Hadoop
 infrastructure that is already installed on the cluster (for the crawl). In
 other words, I would much prefer if the replication and distribution of the
 Solr/Lucene index be done automagically on top of Hadoop/HDFS, instead of
 using another scalability framework (such as SolrCloud). In addition, it
 would be ideal if this environment was flexible enough to be dynamically
 scaled based on the size requirements of the index and the search traffic
 at the time (i.e. if it is deployed on an Amazon cluster, it should be easy
 enough to automatically provision additional processing power into the
 cluster without requiring server re-starts).
 
 
 There is no such thing just yet.
 There is no Search+Hadoop/HDFS in a box just yet.  There was an attempt to 
 automatically index HBase content, but that was either not completed or not 
 committed into HBase.
 
 However, I'm not sure which Solr-based tool in the Hadoop ecosystem would
 be ideal for this scenario. I've heard mention of Solr-on-HBase, Solandra,
 Lily, ElasticSearch, IndexTank etc, but I'm really unsure which of these is
 mature enough and would be the right architectural choice to go along with
 a Nutch crawler setup, and to also satisfy the dynamic/auto-scaling aspects
 above.
 
 
 Here is a summary on all of them:
 * Search on HBase - I assume you are referring to the same thing I mentioned 
 above.  Not ready.
 * Solandra - uses Cassandra+Solr, plus DataStax now has a different 
 (commercial) offering that combines search and Cassandra.  Looks good.
 * Lily - data stored in HBase cluster gets indexed to a separate Solr 
 instance(s)  on the side.  Not really integrated the way you want it to be.
 * ElasticSearch - solid at this point, the most dynamic solution today, can 
 scale well (we are working on a mny-B documents index and hundreds of 
 nodes with ElasticSearch right now), etc.  But again, not integrated with 
 Hadoop the way you want it.
 * IndexTank - has some technical weaknesses, not integrated with Hadoop, not 
 sure about its future considering LinkedIn uses Zoie and Sensei already.
 * And there is SolrCloud, which is coming soon and will be solid, but is 
 again not integrated.
 
 If I were you and I had to pick today - I'd pick ElasticSearch if I were 
 completely open.  If I had Solr bias I'd give SolrCloud a try first.
 
 Lastly, how much hardware (assuming a medium sized EC2 instance) would you
 estimate my needing with this setup, for regular web-data (HTML text) at
 this scale?
 
 I don't know off the topic of my head, but I'm guessing several hundred for 
 serving search requests.
 
 HTH,
 
 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 
 Scalable Performance Monitoring - http://sematext.com/spm/index.html
 
 
 Any architectural guidance would be greatly appreciated. The more details
 provided, the wider my grin :).
 
 Many many thanks in advance.
 
 Thanks,
 Safdar

Re: Post Sorting hook before the doc slicing.


: Basically, I need to find item X in the result set and return say N items
: before and N items after.
: 
:  - N items -- Item X --- N items 
...
: So I might be wrong, but it looks like the only way would be to create a
: custom SolrIndexSearcher which will find the offset and create the related
: docslice. That slicing part doesn't seem to be well factored that I can
: see, so it seems to imply copy/pasting a significant chunk off the code. Am
: I looking at the wrong place ?

trying to do this as a hook into the SolrIndexSearcher would definitley be 
complicated ... laregley because of how matches are collected.

the most straight forward way i can think of to get the data you want is 
to consider what you are sorting on, and use that as a range filter, ie...

1) do your search, and filter on id:X
2) look at the values X has in the fields you are sorting on
3) search again, this time filter on those fields, asking for the first N 
docs with values greater then whatever id:X has
4) search again, this time reverse your sort, and reverse your filters 
(docs with values less hten whatever id:X has) and get the first N docs.


...even if your sort is score you can use the frange parser to filter 
(not usually recommended for score, but possible)



-Hoss

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread John Chee

On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote:
 Given a query including a subquery, is there any way for me to learn
 that subquery's contribution to the overall document score?

 I can provide 'why on earth would anyone ...' if someone wants to know.

Have you tried debugQuery=true?
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery The
'explain' field of the result explains the scoring of each document.

Re: two structures in solr


: I need to store *two big structures* in SOLR: projects and contractors.
: Contractors will search for available projects and project owners will
: search for contractors who would do it for them.

http://wiki.apache.org/solr/MultipleIndexes

: that *I want to have two structures*. I guess running two parallel solr
: instances is not the idea. I took a look at

there's nothing wrong with it, the real question is wether you ever need 
to do things with both sets of documents at once.

if contractors only ever search for projects, and project owners only ever 
serach for contractors, and no one ever searches for a mix of projects and 
contractors at the same time, then i would just suggest using multiple 
SolrCores...

http://wiki.apache.org/solr/MultipleIndexes#MultiCore
http://wiki.apache.org/solr/CoreAdmin


-Hoss

Re: term frequency outweighs exact phrase match

2012-04-13 Thread alxsss

Hello Hoss,

Here are the explain tags for two doc

str name=a0127d8e70a6d523
0.021646015 = (MATCH) sum of:
  0.021646015 = (MATCH) sum of:
0.02141003 = (MATCH) max plus 0.01 times others of:
  2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.09510804 = (MATCH) fieldWeight(content:apache in 3578), product of:
  2.236068 = tf(termFreq(content:apache)=5)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)
  0.021407187 = (MATCH) weight(title:apache^1.2 in 3578), product of:
0.01371095 = queryWeight(title:apache^1.2), product of:
  1.2 = boost
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.0013721307 = queryNorm
1.5613205 = (MATCH) fieldWeight(title:apache in 3578), product of:
  1.0 = tf(termFreq(title:apache)=1)
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.1875 = fieldNorm(field=title, doc=3578)
2.359865E-4 = (MATCH) max plus 0.01 times others of:
  2.359865E-4 = (MATCH) weight(content:solr^0.5 in 3578), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.05795766 = (MATCH) fieldWeight(content:solr in 3578), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)
/strstr name=d89380e313c64aa5
0.021465056 = (MATCH) sum of:
  1.8154096E-4 = (MATCH) sum of:
6.354771E-5 = (MATCH) max plus 0.01 times others of:
  6.354771E-5 = (MATCH) weight(content:apache^0.5 in 638040), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.021266805 = (MATCH) fieldWeight(content:apache in 638040), product of:
  1.0 = tf(termFreq(content:apache)=1)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
1.1799325E-4 = (MATCH) max plus 0.01 times others of:
  1.1799325E-4 = (MATCH) weight(content:solr^0.5 in 638040), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.02897883 = (MATCH) fieldWeight(content:solr in 638040), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
  0.021283515 = (MATCH) weight(content:apache solr~1^30.0 in 638040), product 
of:
0.42358932 = queryWeight(content:apache solr~1^30.0), product of:
  30.0 = boost
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0013721307 = queryNorm
0.050245635 = fieldWeight(content:apache solr in 638040), product of:
  1.0 = tf(phraseFreq=1.0)
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0048828125 = fieldNorm(field=content, doc=638040)
/str

 

 

 Although the second doc has exact match it is placed after the first one which 
does not have exact match.

I use the following request handler

requestHandler name=search class=solr.SearchHandler 
lst name=defaults
str name=defTypeedismax/str
str name=echoParamsexplicit/str
float name=tie0.01/float
str name=qfhost^30  content^0.5 title^1.2 anchor^1.2/str
str name=pfcontent^30/str
str name=flurl,id, site ,title/str
str name=mm2lt;-1 5lt;-2 6lt;90%/str
int name=ps1/int
bool name=hltrue/bool
str name=q.alt*:*/str
str name=hl.flcontent/str
str name=f.title.hl.fragsize0/str
str name=hl.fragsize165/str
str name=f.title.hl.alternateFieldtitle/str
str name=f.url.hl.fragsize0/str
str name=f.url.hl.alternateFieldurl/str
str name=f.content.hl.fragmenterregex/str
str name=spellchecktrue/str
str name=spellcheck.collatetrue/str
str name=spellcheck.count5/str
str name=grouptrue/str
str name=group.fieldsite/str
str name=group.ngroupstrue/str
/lst
arr name=last-components
 strspellcheck/str
/arr
/requestHandler


and the query is as follows 

http://localhost:8983/solr/select/?q=apache 
solrversion=2.2start=0rows=10indent=onqt=searchdebugQuery=true

Thanks.
Alex.


-Original Message-
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Apr 12, 2012 7:43 pm
Subject: Re: term frequency outweighs exact phrase match



: I use solr 3.5 with edismax. I have the following issue with phrase 
: search. For example if I have three documents with content like
: 
: 1.apache apache
: 2. solr solr
:

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Benson Margulies

On Fri, Apr 13, 2012 at 6:43 PM, John Chee johnc...@mylife.com wrote:
 On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 Given a query including a subquery, is there any way for me to learn
 that subquery's contribution to the overall document score?

I need this number to be available in a SearchComponent that runs
after QueryComponent.



 I can provide 'why on earth would anyone ...' if someone wants to know.

 Have you tried debugQuery=true?
 http://wiki.apache.org/solr/CommonQueryParameters#debugQuery The
 'explain' field of the result explains the scoring of each document.

Re: Can I discover what part of a score is attributable to a subquery?