Re: Phrase between quotes with dismax & edismax
Thanks Erick for your quick answer. I am using Solr 3.1.

1) I have set the mm parameter to 0 and removed the categories from the search, so the query is only for "chef de projet" and nothing else. But the problem remains, i.e. searching for "chef de projet" gives no results while searching for "chef projet" gives the right result. Here is an excerpt from the test I made:

DISMAX query (q) = ("chef de projet")

=The Parameters=

queryResponse=[{responseHeader={status=0, QTime=157, params={
  facet=true, f.createDate.facet.date.start=NOW/DAY-6DAYS, tie=0.1,
  facet.limit=4, f.location.facet.limit=3, q.alt=*:*, facet.date.other=all,
  hl=true, version=2,
  bq=[categoryPayloads:category1071^1, categoryPayloads:category10055078^1, categoryPayloads:category10055405^1],
  fl=*,score, debugQuery=true,
  facet.field=[soldProvisions, contractTypeText, nafCodeText, createDate, wage, keywords, labelLocation, jobCode, organizationName, requiredExperienceLevelText],
  qs=3, qt=edismax, facet.date.end=NOW/DAY, mm=0, facet.mincount=1, facet.date=createDate,
  qf=title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0 categoryPayloads^1.0,
  hl.fl=title, wt=javabin, rows=20, start=0, q=("chef de projet"),
  facet.date.gap=+1DAY, stopwords=false, ps=3}}

=The Solr Response=

response={numFound=0 ...}

=Debug Info=

rawquerystring=("chef de projet")
querystring=("chef de projet")
parsedquery=
  +DisjunctionMaxQuery((title:"chef de projet"~3^4.0 | keywords:"chef de projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1)
  DisjunctionMaxQuery((title:"(chef chef) de (projet) projet"~3^4.0)~0.1)
  categoryPayloads:category1071 categoryPayloads:category10055078 categoryPayloads:category10055405
parsedquery_toString=
  +(title:"chef de projet"~3^4.0 | keywords:"chef de projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1
  (title:"(chef chef) de (projet) projet"~3^4.0)~0.1
  categoryPayloads:category1071 categoryPayloads:category10055078 categoryPayloads:category10055405
explain={}
QParser=ExtendedDismaxQParser
altquerystring=null
boost_queries=[categoryPayloads:category1071^1, categoryPayloads:category10055078^1, categoryPayloads:category10055405^1]
parsed_boost_queries=[categoryPayloads:category1071, categoryPayloads:category10055078, categoryPayloads:category10055405]
boostfuncs=null

2) I tried to remove the bq values, but no change:

querystring=("chef de projet")
parsedquery=
  +DisjunctionMaxQuery((title:"chef de projet"~3^4.0 | keywords:"chef de projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1)
  DisjunctionMaxQuery((title:"(chef chef) de (projet) projet"~3^4.0)~0.1)
parsedquery_toString=
  +(title:"chef de projet"~3^4.0 | keywords:"chef de projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1
  (title:"(chef chef) de (projet) projet"~3^4.0)~0.1

3) and the query which works:

rawquerystring=("chef projet")
querystring=("chef projet")
parsedquery=
  +DisjunctionMaxQuery((title:"chef projet"~3^4.0 | keywords:"chef projet"^3.0 | organizationName:"chef projet" | location:"chef projet" | formattedDescription:"chef projet"~3^2.0 | nafCodeText:"chef projet"^2.0 | jobCodeText:"chef projet"^3.0 | categoryPayloads:"chef projet"~3 | labelLocation:"chef projet")~0.1)
  DisjunctionMaxQuery((title:"(chef chef) (projet) projet"~3^4.0)~0.1)
parsedquery_toString=
  +(title:"chef projet"~3^4.0 | keywords:"chef projet"^3.0 | organizationName:"chef projet" | location:"chef projet" | formattedDescription:"chef projet"~3^2.0 | nafCodeText:"chef projet"^2.0 | jobCodeText:"chef projet"^3.0 | categoryPayloads:"chef projet"~3 | labelLocation:"chef projet")~0.1
  (title:"(chef chef) (projet) projet"~3^4.0)~0.1
explain={23715081= 14.832518 = (MATCH) sum of: ...}

I really don't know how to solve this issue and would appreciate any help.

Best wishes,
Jean-Claude

On Tue, Nov 15, 2011 at 9:28 PM, Erick Erickson erickerick...@gmail.com wrote:
> The query re-writing is...er...interesting, and I'll skip that for now... As for why you're not getting results, ...
Re: Aggregated indexing of updating RSS feeds
All,

Can anyone advise how to stop the deleteAll event during a full import? As discussed above, using clean=false with Solr 3.4 still seems to trigger a delete of all previously imported data. I want to aggregate the results of multiple imports.

Thanks in advance.

S
Re: Can we have lucene regular and fastVectorHighlighter together in solr
(11/11/16 18:58), Shyam Bhaskaran wrote:
> Hi, can we use the Lucene regular highlighter along with the fastVectorHighlighter together in solrconfig.xml (Solr)?
> -Shyam

Yes, you can. See the <highlighting> section in solr/example/solr/conf/solrconfig.xml for an example.

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
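For instance, a minimal sketch of running both side by side (field names are invented): give the fields that should use the FastVectorHighlighter term vectors in schema.xml, and select it per field at request time. Fields without term vectors fall back to the regular highlighter.

  <!-- schema.xml: only "body" carries the term vectors the FVH needs -->
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="body" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  ...&hl=true&hl.fl=title,body&f.body.hl.useFastVectorHighlighter=true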
Rich document indexing
I am using Solr 3.4 and configured my DataImportHandler to get some data from MySQL as well as index some rich documents from disk. This is the part of the db-data-config file where I am indexing the rich text documents:

  <entity name="resume" dataSource="ds-db"
          query="select name, js_login_id div 25000 as dir from js_resumes where js_login_id='${js_logins.id}' and is_primary = 1 and deleted = 0 and mask_cv != 1"
          pk="resume_name"
          deltaQuery="select js_login_id from js_resumes where modified > '${dataimporter.last_index_time}' and is_primary = 1 and deleted = 0"
          parentDeltaQuery="select jsl.id as id from service_request_histories srh, service_requests sr, js_login_screenings jsls, js_logins jsl where jsl.status in (1,2) and srh.service_request_id = sr.id and jsl.id = jsls.js_login_id and srh.status in ('8','43') and jsls.id = srh.sid and date(srh.created) < date_sub(now(), interval 2 day) and jsl.id = '${js_resumes.js_login_id}'">
    <entity processor="TikaEntityProcessor" tikaConfig="tika-config.xml"
            url="http://localhost/resumes-new/resumes${resume.dir}/${js_logins.id}/${resume.name}"
            dataSource="ds-file" format="text">
      <field column="text" name="resume"/>
    </entity>
  </entity>

But after some time I get the following error in my error log. It looks like a missing class/method error. Can anyone tell me which POI jar version would work with Tika 0.6? Currently I have poi-3.7.jar. The error I am getting is this:

SEVERE: Exception while processing: js_logins document : SolrInputDocument[{id=id(1.0)={100984}, complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575}, emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat Ryali}}]:
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
  at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
  at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
  at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
  at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
  at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
  at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
  at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
  ... 7 more
Re: OutOfMemoryError when using query with sort
Hi Hamid,

I also encountered the same OOM issue on a Windows 2003 (32-bit) server, with only 3 million articles stored in Solr. I would like to know your configuration for handling so many records. Many thanks.

Best Regards,
Benson
Re: Different maxAnalyzedChars value in solrconfig.xml
(11/11/16 13:12), Shyam Bhaskaran wrote:
> Hi, I wanted to know whether we can set different maxAnalyzedChars values in solrconfig.xml based on different fields. Can someone tell me if this is possible at all? My requirement needs me to set different values for the maxAnalyzedChars parameter based on two different field values. For example, if the field type has the value xxx then maxAnalyzedChars needs to be set to 1MB, and if the value is yyy then maxAnalyzedChars needs to be set to 3MB. Let me know if this can be done and how to set it.

I don't think it is possible.

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
Re: Problems installing Solr PHP extension
Pecl installation is kinda buggy. I installed it ignoring pecl dependencies because I already had them.

Try: pecl install -n solr (-n ignores dependencies)

And when it prompts for curl and libxml, point the path to where you have installed them, probably in /usr/lib/.

Cheers,
Adolfo.

On Tue, Nov 15, 2011 at 7:27 PM, Travis Low t...@4centurion.com wrote:
> I know this isn't strictly Solr, but I've been at this for hours and I'm at my wits' end. I cannot install the Solr PECL extension (http://pecl.php.net/package/solr), either by command line "pecl install solr" or by downloading and using phpize. Always the same error, which I see here:
> http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html
>
> It boils down to this:
>
> PHP Warning: PHP Startup: Unable to load dynamic library '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so: undefined symbol: curl_easy_getinfo in Unknown on line 0
>
> I am using the current Solr PECL extension. PHP 5.3.8. Curl 7.21.3. Yes, libcurl and libcurl-dev are both installed, also 7.21.3. Fedora Core 15, patched to current levels. Please help!
>
> cheers,
> Travis
> --
> Travis Low, Director of Development
> Centurion Research Solutions, LLC
Join and faceting by children's attributes
Hello,

I currently have a demand for faceting on the children of a join query. My index is set up in a way that there are parent and child documents. The child documents have the facet information in some multivalued fields; the parent documents themselves do not have any of it.

As the join query support allows me to do a simple search within the child documents and return documents from the parent document space, I thought there would probably be a way to figure out the available facet values from the child document space and present both in the result set, but this seems more difficult than I thought it would be. The join query support would allow me to filter on specific child-document-space facet fields (see the sketch below), but I cannot really find a way to present *which faceting options are available* in the result set in the first place.

Denormalizing my index in a way that the parent documents would contain the faceting information is not an option at the moment, because I wanted to keep the index more generic: there is not one field per attribute, but two generic attribute fields (multivalued) that keep the key/value pairs. I need this setup because at index setup time I do not know which attributes will be available for the various products/items. If I now were to denormalize a bunch of shoe child items into the parent product, it would always contain all possible size/color combinations, even if some of the child products do not meet the initial search term's criteria. E.g. searching for (title:Sneakers AND desc:cool) should return just the facets size (2), color (2), red (1), blue (1), 40 (1) and 42 (1), which I postprocess in my client application, so that I know that red and blue are colors and 40 and 42 are sizes.

I thought that you experts might have an idea on how to continue from there.

Best,
Tobias
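A sketch of the join queries meant above (field names are illustrative; {!join} is the trunk/4.0 join query parser): searching child documents but returning their parents, and filtering parents by a child facet value.

  q={!join from=parent_id to=id}(+title:Sneakers +desc:cool)
  fq={!join from=parent_id to=id}attr_value:red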
Re: Solr Score Normalization
Perhaps you can solve your use case by playing with the new eDismax boost parameter, which multiplies the functions with the rest of the score instead of adding them.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5. nov. 2011, at 01:26, sangrish wrote:
> Hi,
>
> I have a (dismax) request handler which has the following 3 scoring components (1 qf & 2 bf):
>
> qf = field1^2 field2^3
> bf = func1(field3)^2 func2(field4)^3
>
> Both func1 & func2 return scores between 0 & 1. The score returned by the textual match (qf) ranges from 0 to NOT_A_FIXED_NUMBER. To allow better combination of text match & my functions, I want the text score to be normalized between 0 & 1. Is there any way I can achieve that here?
>
> Thanks
> Sid
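A sketch of what that looks like (reusing the field/function names from the mail above; product() combines the two functions into a single multiplicative boost):

  q=some query&defType=edismax&qf=field1^2 field2^3&boost=product(func1(field3),func2(field4))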
Re: Search in multivalued string field does not work
Hi,

Thanks for the suggestions. The index is the same on both servers. We index using JDBC drivers. We have not modified the request handler in solrconfig on either machine, and after the latest schema update we re-indexed the data.

*We even checked the analysis page and there is no difference between the two servers; after checking the "highlight matches" option, the field value was getting highlighted in the term text of the index analyzer. But we are still confused as to why we are not getting the result in the search page.*

Actually I forgot to post the dynamic field declarations in my schema file; this is how they are declared:

  <dynamicField name="idx_*" type="textgen" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*Facet" type="string" indexed="true" multiValued="true" stored="false"/>

The textgen fieldType definition is as follows:

  <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

We have implemented shards in the core "db", which in turn gets results from the shard cores (core1 and core2). The actual data is present in core2. We tried all the options on core2 directly as well, but with no success. The query is passed as follows:

QueryString: idx_ABCFacet:XXX... ABC DEF

INFO: [core2] webapp=/solr path=/select params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1} hits=0 status=0 QTime=2
Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1} hits=0 status=0 QTime=0
Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/select/ params={debugQuery=on&indent=on&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&version=2.2&rows=10} status=0 QTime=64

Also, can you please elaborate on the 3rd point:

*3) Try using Luke to examine the indexes on both servers to determine whether they're the same.*

Thanks.
Problems with AutoSuggest feature (Terms Component)
Hi,

When I search for data I noticed two things:

1) I noticed terms.regex=.* in the logs, which does a blank search on terms and drives the query time up. Is there any way to overcome this? My actual query should go like the first log line below (with the full regex ABC\+CCC\+lll\+data.*), but instead it ends up like the later core1/core4 lines where terms.regex=.*

2) I also noticed terms.limit=-1, which is very expensive as it asks Solr to return all the terms. It should be set to 10 or 20 at most. Please provide some suggestions on how to set this.

Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/terms params={terms.regex=ABC\+CCC\+lll\+data.*&terms.regex.flag=case_insensitive&terms.fl=nameFacet} status=0 QTime=935
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/terms params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=ABC\+CCC\+lll\+data.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1} status=0 QTime=842
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/terms params={terms.regex=ABC\+CCC\+lll\+data.*&terms.regex.flag=case_insensitive&terms.fl=nameFacet} status=0 QTime=927
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/terms params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1} status=0 QTime=115
Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/terms params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1} status=0 QTime=106767
Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/terms params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1} status=0 QTime=106766
Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute
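For comparison, a sketch of a cheaper suggest request (core and field names taken from the logs above; terms.prefix and an explicit terms.limit are standard TermsComponent parameters):

  http://localhost:8983/solr/db/terms?terms=true&terms.fl=nameFacet&terms.prefix=ABC&terms.limit=10&terms.sort=count

A prefix lookup avoids scanning every term the way terms.regex=.* does, and the explicit limit keeps the shard responses small.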
Re: Help! - ContentStreamUpdateRequest
Erick,

Autocommit is commented out in solrconfig.xml. I have avoided commits until after the indexing process is complete. As an experiment I tried committing every n records processed to see if varying n would make a difference; it really didn't change much.

My original use case had the client running from the Solr server and streaming the document content over from a web server, based on the URL gathered by a query from a backend database. The locking problem appeared there first, so I tried moving the client code to the web server to be closer to the documents' origin. That helped a little but still ended up locking, which is where I am now.

Solr should be able to index way more documents than the 35K I'm trying to index. It seems from others' accounts that they are able to do what I'm trying to do successfully. Therefore I believe I must be doing something extraordinarily dumb. I'll be happy to share any information about my environment or configuration if it will help find my error.

Thanks for all of your help.

- Tod

On 11/15/2011 8:08 PM, Erick Erickson wrote:
> That's odd. What are your autocommit parameters? And are you either committing or optimizing as part of your program? I'd bump the autocommit parameters up and NOT commit (or optimize) from your client if you are.
>
> Best
> Erick
>
> On Tue, Nov 15, 2011 at 2:17 PM, Tod listac...@gmail.com wrote:
>> Otis,
>> The files are only part of the payload. The supporting metadata exists in a database. I'm pulling that information, as well as the name and location of the file, from the database and then sending it to a remote Solr instance to be indexed.
>> I've heard Solr would prefer to get the documents it needs to index in chunks rather than one at a time as I'm doing now. The one-at-a-time approach is locking up the Solr server at around 700 entries. My thought was that if I could send them a batch at a time, the lockup would stop and indexing performance would improve.
>> Thanks - Tod
>>
>> On 11/15/2011 12:13 PM, Otis Gospodnetic wrote:
>>> Hi,
>>> How about just concatenating your files into one? Would that work for you?
>>> Otis
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>> From: Tod listac...@gmail.com
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 14, 2011 4:24 PM
>>> Subject: Help! - ContentStreamUpdateRequest
>>>
>>> Could someone take a look at this page:
>>> http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
>>> ... and tell me what code changes I would need to make to be able to stream a LOT of files at once rather than just one? It has to be something simple like a collection of some sort but I just can't get it figured out. Maybe I'm using the wrong class altogether?
>>> TIA
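The chunked-add pattern looks roughly like this (a SolrJ 3.x sketch with made-up field names and batch size; for binary files going through Tika you would still send one stream per file, so the alternative is extracting the text client-side and batching it like below):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 35000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("content", "...text pulled from the web server...");
        batch.add(doc);
        if (batch.size() == 500) {   // one HTTP request per 500 docs, not per doc
          server.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) server.add(batch);
      server.commit();               // single commit at the very end
    }
  }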
Re: How to mix solr query info into the apache httpd logging (reverseproxy)?
Thanks for the answer; mixing it up with the params will certainly be the easiest solution.

Alex
Re: Problems installing Solr PHP extension
Thanks so much for responding. I tried your suggestion and the pecl build *seems* to go okay, but after restarting Apache I get this again in the error_log:

PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib64/php/modules/solr.so' - /usr/lib64/php/modules/solr.so: undefined symbol: curl_easy_getinfo in Unknown on line 0

I'm baffled by this because the undefined symbol is in libcurl.so, and I've specified the path to that library. If I can't solve this problem then we'll basically have to write our own PHP Solr client, which would royally suck.

cheers,
Travis

On Wed, Nov 16, 2011 at 7:11 AM, Adolfo Castro Menna adolfo.castrome...@gmail.com wrote:
> Pecl installation is kinda buggy. I installed it ignoring pecl dependencies because I already had them.
> Try: pecl install -n solr (-n ignores dependencies)
> And when it prompts for curl and libxml, point the path to where you have installed them, probably in /usr/lib/
> Cheers,
> Adolfo.
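A couple of diagnostic one-liners that might narrow this down (paths taken from the error message above; exact library locations vary by system):

  # which libcurl, if any, the extension is linked against
  ldd /usr/lib64/php/modules/solr.so | grep -i curl

  # is the missing symbol actually exported by your libcurl?
  nm -D /usr/lib64/libcurl.so | grep curl_easy_getinfo

If ldd shows no libcurl line at all, the extension was built without linking against curl, which would explain the undefined symbol at load time.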
Re: Problems installing Solr PHP extension
On 16.11.2011 17:11, Travis Low wrote:
> If I can't solve this problem then we'll basically have to write our own PHP Solr client, which would royally suck.

Oh, if you really can't get the library to work, no problem - there are several PHP clients out there that don't need a PECL installation. Personally, I have used http://code.google.com/p/solr-php-client/, and it works well.

-Kuli
Re: Problems installing Solr PHP extension
Ah, ausgezeichnet, thank you Kuli! We'll just use that.

On Wed, Nov 16, 2011 at 11:35 AM, Michael Kuhlmann k...@solarier.de wrote:
> On 16.11.2011 17:11, Travis Low wrote:
>> If I can't solve this problem then we'll basically have to write our own PHP Solr client, which would royally suck.
>
> Oh, if you really can't get the library to work, no problem - there are several PHP clients out there that don't need a PECL installation. Personally, I have used http://code.google.com/p/solr-php-client/, it works well.
>
> -Kuli

--
Travis Low, Director of Development
Centurion Research Solutions, LLC
Re: Easy way to tell if there are pending documents
You can enable the stats handler (https://issues.apache.org/jira/browse/SOLR-1750) and inspect the JSON programmatically.

--
Justin

Latter, Antoine antoine.lat...@legis.wisconsin.gov writes:
> Thank you, that does help - but I am more looking for a way to get at this programmatically.
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, November 15, 2011 11:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Easy way to tell if there are pending documents
>
> Antoine,
> On the Solr Admin Stats page search for docsPending. I think this is what you are looking for.
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> From: Latter, Antoine antoine.lat...@legis.wisconsin.gov
> To: 'solr-user@lucene.apache.org' solr-user@lucene.apache.org
> Sent: Monday, November 14, 2011 11:39 AM
> Subject: Easy way to tell if there are pending documents
>
> Hi Solr,
> Does anyone know of an easy way to tell if there are pending documents waiting for commit?
> Our application performs operations that are never safe to perform while commits are pending. We make this work by making sure that all indexing operations end in a commit, and by stopping the unsafe operations from running while a commit is running.
> This works great most of the time, except when we have enough disk space to add documents to the pending area, but not enough disk space to do a commit - then the indexing operations only error out after they've done all of their adds.
> It would be nice if the unsafe operation could somehow detect that there are pending documents and abort. In the interim I'll have the unsafe operation perform a commit when it starts, but I've been weeding out useless commits from my app recently and I don't like them creeping back in.
> Thanks,
> Antoine
RE: Easy way to tell if there are pending documents
Excellent. It looks like I can drill down into exactly what I want without having to load up the rest of the statistics.

-----Original Message-----
From: Justin Caratzas [mailto:justin.carat...@gmail.com]
Sent: Wednesday, November 16, 2011 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Easy way to tell if there are pending documents

You can enable the stats handler (https://issues.apache.org/jira/browse/SOLR-1750) and inspect the JSON programmatically.

--
Justin
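For the drill-down, something along these lines should work (a sketch: the mbeans handler and its parameters come from SOLR-1750 and may differ slightly by version, and the host/port are examples):

  http://localhost:8983/solr/admin/mbeans?stats=true&cat=UPDATEHANDLER&wt=json

In the response, updateHandler -> stats -> docsPending is the count of uncommitted documents, so the "unsafe" operation can fetch just that value and abort when it is non-zero.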
strange behavior of scores and term proximity use
Hi,

For this term proximity query: ab_main_title_l0:"to be or not to be"~1000

http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true

the first three results are the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
  <result name="response" numFound="318" start="0" maxScore="3.0814114">
    <doc>
      <long name="id">2315190010001021</long>
      <arr name="ab_main_title_l0">
        <str>og54ct8n To be or not to be a Jew. 5w8ojsx2</str>
      </arr>
      <float name="score">3.0814114</float>
    </doc>
    <doc>
      <long name="id">2313006480001021</long>
      <arr name="ab_main_title_l0">
        <str>og54ct8n To be or not to be 5w8ojsx2</str>
      </arr>
      <float name="score">3.0814114</float>
    </doc>
    <doc>
      <long name="id">2356410250001021</long>
      <arr name="ab_main_title_l0">
        <str>og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2</str>
      </arr>
      <float name="score">3.0814114</float>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
    <str name="querystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
    <str name="parsedquery">PhraseQuery(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000)</str>
    <str name="parsedquery_toString">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
    <lst name="explain">
      <str name="2315190010001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)
      </str>
      <str name="2313006480001021">
9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)
      </str>
      <str name="2356410250001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)
      </str>
    </lst>
  </lst>
</response>

The version used is a 4.0 October snapshot. I have a few questions about the result:

- Why are the debug print and the scores in the result different?
- What is the expected behavior of this kind of term proximity query?
- The debug scores seem to be well ordered, but the result scores seem to be wrong.

Thanks,
Ariel
Re: Phrase between quotes with dismax & edismax
Ah, ok, I was mis-reading some things. So, let's ignore the category bits for now. Questions:

1) Can you refine the problem down? That is, demonstrate this with a single field and leave out the category stuff. Something like q=title:"chef de projet" getting no results and q=title:"chef projet" getting results? The idea is to cycle through all the fields to see if we can hone in on the problem. I'd get rid of any pf parameters in your edismax definition too; I'm after the simplest case that can demonstrate the issue. For that matter, it'd be even easier if you could make this happen with the default searcher (solr/select?q=title:"chef de projet").

2) If you can do 1, please post the field definitions from your schema.xml file. One possibility is that you are removing stopwords at index time but not query time or vice versa, but that's a wild guess (see the sketch after the quoted mail below).

3) Once you have a field, use the admin/analysis page to see the exact transformations that occur at index and query time, to see if anything jumps out.

All in all, I suspect you have a field that isn't being parsed as you expect at either index or query time, but as I said above, that's a guess.

Best
Erick

On Wed, Nov 16, 2011 at 5:02 AM, Jean-Claude Dauphin jc.daup...@gmail.com wrote:
> Thanks Erick for your quick answer. I am using Solr 3.1.
> 1) I have set the mm parameter to 0 and removed the categories from the search. Thus the query is only for "chef de projet" and nothing else. But the problem remains, i.e. searching for "chef de projet" gives no results while searching for "chef projet" gives the right result. ...
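To make 2) concrete, here is a sketch of the kind of index/query stopword asymmetry that produces exactly this symptom (the type and file names are invented, not taken from the actual schema):

  <fieldType name="text_fr" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- "de" never makes it into the index... -->
      <filter class="solr.StopFilterFactory" words="stopwords_fr.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <!-- ...but survives in the query, so the phrase "chef de projet"
           looks for a "de" that isn't in the index and never matches -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Note that phrase slop (qs=3) doesn't help here: slop allows gaps and reordering of the terms that are present, but it doesn't make a stopped-out term optional, which is why "chef projet" works and "chef de projet" doesn't.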
Re: Dismax and phrases
: I am starting to wonder whether the module giving finnish language support
: (lingsoft) might be the cause?

It's extremely possible -- the details really matter when debugging things like this. Since i don't have any access to these custom plugins, i don't know what they might be doing, or how they might be affecting the terms produced during analysis, to explain why you are getting the structure you are -- but one explanation might be if every term produced by them gets a positionIncrement of 0 ... that would tell the query parser to treat them as alternatives -- it's the same thing SynonymFilter does.

you'd have to look at the output from the analysis tool, feeding your example input into the query analyzer, to see what terms it produces (and what attributes those terms have).

if it is a position increment issue, then you should see the same OR style query structure (instead of a phrase query) even if you use the default lucene parser and give it a quoted phrase...

text_fi:"asuntojen hinnat"

-Hoss
Re: Search in multivalued string field does not work
Attach &debugQuery=true to the URL and look at the results; that'll show you what the query parsed as on the actual server.

Where did shards come from? I'd advise turning all the shard stuff off until you answer this question, and querying the server directly; shards may be confusing the issue. Let's get to the bottom of your query problems before introducing that complexity!

By Luke, I mean get a copy of the Luke program, see here: http://code.google.com/p/luke/

Run that program and point it at the index for your servers. It'll allow you to examine the contents of the indexes at a fairly low level. Look at the fields in question and see if the data you expect to match is, indeed, there.

From what you've said, I'd guess it's some difference between the two servers, because on the surface of it I don't see why you'd be seeing the differences you claim. So either what you think is on the servers isn't there, or I don't understand the problem, or...

Best
Erick

On Wed, Nov 16, 2011 at 9:11 AM, mechravi25 mechrav...@yahoo.co.in wrote:
> Hi,
> Thanks for the suggestions. The index is the same on both servers. We index using JDBC drivers. We have not modified the request handler in solrconfig on either machine, and after the latest schema update we re-indexed the data. ...
Re: Similar documents and advantages / disadvantages of MLT / Deduplication
: I index 1000 docs, 5 of them are 95% the same (for example: copy pasted
: blog articles from different sources, with slight changes (author name,
: etc..)).
: But they have differences.
: *Now i like to see 1 doc in my result set and the other 4 should be marked
: as similar.*

Do you actually want all 1000 docs in your index, or do you want to prevent 4 of the 5 copies of the doc from being indexed?

Either way, if the TextProfileSignature is doing a good job of identifying the 5 similar docs, then use that at index time.

If you want to keep 4/5 out of the index, then use the Deduplication features to prevent the duplicates from being indexed and you're done.

If you want all docs in the index, then you have to decide how you want to mark docs as similar ... do you want only one of those docs to appear in all of your results, or do you want all of them in the results but with an indication that there are other similar docs? If the former: take a look at Grouping and group on your signature field. If the latter: use the MLT component to find similar docs based on the signature field (ie: mlt.fl=signature_t).

https://wiki.apache.org/solr/FieldCollapsing

-Hoss
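For reference, a sketch of the index-time signature setup both options build on (chain name and field names are placeholders, following the Deduplication wiki pattern):

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature_t</str>
      <!-- true: only one copy survives in the index (the "keep 4/5 out" option);
           false: all docs are kept and you group/MLT on signature_t instead -->
      <bool name="overwriteDupes">false</bool>
      <str name="fields">title,body</str>
      <str name="signatureClass">solr.processor.TextProfileSignature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>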
Re: size of data replicated
: query response time. To get a clear picture, I would like to know how
: to get the size of data being replicated for each commit. Through the
: admin UI, you may read "x of y G data is being replicated"; however,
: y is the total index size, instead of the data being copied over. I
: couldn't find the info in the solr logs either. Any idea?

maybe i'm misunderstanding your question, but isn't "x" in your example the number that you are looking for? (ie: how much data was replicated?)

-Hoss
Re: maxFieldLength clarifications
: 1. is the maxFieldLength parameter deprecated?
: 2. what is maxFieldLength counting? I understood it's counting tokens
: per document (not per field)
: 3. what if I simply remove the maxFieldLength setting from the
: solrconfig?

1) it has been deprecated and will not be used in Solr 4.x, but it still exists in Solr 3.x.
2) it should be terms per field per document, not just per document.
3) if you don't specify it in solrconfig.xml it defaults to -1, which means no limit.

: From what I see if I remove it from the solrconfig the text values are
: still constrained to some bound since if I query the last term in a long
: document's text I don't get a match.

a) what version of solr are you using?
b) double check both the mainIndex and indexDefaults sections of your solrconfig.xml and make sure maxFieldLength isn't in either of them.

-Hoss
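for reference, this is where the setting lives in a 3.x solrconfig.xml (the value is just an example; a setting in mainIndex overrides the one in indexDefaults):

  <indexDefaults>
    <!-- -1 (the default when omitted) means no limit -->
    <maxFieldLength>10000</maxFieldLength>
  </indexDefaults>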
Re: Solr Score Normalization
: Perhaps you can solve your usecase by playing with the new eDismax
: boost parameter, which multiplies the functions with the other score
: instead of adding.

and FWIW: the boost param of the edismax parser is really just syntactic sugar for using the BoostQParser wrapped around an edismax query -- you can wrap it around any query produced by any QParser...

q={!edismax qf=foo}bar&boost=func(asdf)

...is the same as...

q={!boost b=func(asdf) v=$qq}&qq={!edismax qf=foo}bar

-Hoss
Re: to prevent number-of-matching-terms in contributing score
: 1. omitTermFreqAndPositions is very straightforward but if I avoid
: positions I'll refuse to serve phrase queries. I had searched for this in

but do you really need phrase queries on your "cat" field? i thought the point was to have simple matching on those terms?

: 2. Function query seemed nice (though strange because I never used it
: before) and I gave it a few hours but that too did not seem to solve my
: requirement. The artificial score we are generating is getting multiplied
: into rest of the score which includes score due to "cat" field as well. (I
: can not remove "cat" from "qf" as I have to search there). It is only that
: I don't want this field's score on the basis of matching tf.

I don't think i realized you were using dismax ... if you just want a match on "cat" to help determine if the document is a match, but not have *any* impact on score, you could just set the qf boost to 0 (ie: qf=title^10 cat^0), but i'm not sure if that's really what you want.

: After spending some hours on function queries I finally reached on
: following query

Honestly: i'm not really following what you tried there because of the formatting applied by your email client ... it seemed to be making tons of hyperlinks out of pieces of the URL. Looking at your query explanation however, the problem seems to be that you are still using the relevancy score of the matches on the "cat" field, instead of *just* using the function boost...

: But debugging the query showed that the boost value ($cat_boost) is being
: multiplied into a value which is generated with the help of "cat" field,
: thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
:
: 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
: (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:

...my point before was to take cat:chair out of the main part of your query, and *only* put it in the boost function. if you are using dismax, the qf=cat^0 suggestion mentioned above, *combined* with your boost function, will probably get you what you want (i think).

: I was thinking there should be some hook or plugin (or anything) which
: could just change the score calculation formula *for a particular field*.
: There is a function in DefaultSimilarity class - *public float tf(float
: freq)* but that does not mention the field name. Is there a possibility to
: look into this direction?

on trunk, there is a distinct Similarity object per fieldtype, so you could certainly look at that -- but you are correct that in 3.x there is no way to override the tf() function on a per field basis.

-Hoss
Re: Aggregated indexing of updating RSS feeds
: ..but the request I'm making is..
: /solr/myfeed?command=full-import&rows=5000&clean=false
:
: ..note the clean=false.

I see it, but i also see this in the logs you provided...

: INFO: [] webapp=/solr path=/myfeed params={command=full-import} status=0
: QTime=8

...which means someone somewhere is executing full-import w/o using clean=false. are you absolutely certain that you are executing the request you think you are? can you find a request in your logs that includes clean=false? if it's not you and your code -- it is coming from somewhere, and that's what's causing DIH to trigger a deleteAll...

: 10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.DataImporter doFullImport
: INFO: Starting Full Import
: 10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
: INFO: Read myfeed.properties
: 10-Nov-2011 05:40:01 org.apache.solr.update.DirectUpdateHandler2 deleteAll
: INFO: [] REMOVING ALL DOCUMENTS FROM INDEX

-Hoss
Re: Add copyTo Field without re-indexing?
Please advise how we can reindex Solr when fields have stored=false. We cannot reindex the data from the beginning; we just want to read and rewrite the indexes from SolrJ only. Please advise a solution. I know we can do it with the Lucene classes, using IndexReader and IndexWriter, but we want to index all fields.
Re: Add copyTo Field without re-indexing?
On 17.11.2011 08:46, Kashif Khan wrote:
> Please advise how we can reindex Solr when fields have stored=false. We cannot reindex the data from the beginning; we just want to read and rewrite the indexes from SolrJ only. Please advise a solution. I know we can do it with the Lucene classes, using IndexReader and IndexWriter, but we want to index all fields.

This is not possible. At least not when the indexed data is modified in any way (stemmed, lowercased, tokenized, etc.). The original data is not saved when stored is false.

You'll need your original source data to reindex then.

-Kuli
Explicitly tell Solr the analyzed value when indexing a document
Hi,

I have a couple of string fields. For some of them, I want my application to be able to index a lowercased string but store the original value. Is there some way to do this? Or would I have to come up with a new field type and implement an analyzer?

/Tim
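A sketch of the field-type route (the type and field names are invented): stored values are never analyzed in Solr, so a TextField with a KeywordTokenizer plus LowerCaseFilter indexes the lowercased value while returning the original, with no custom analyzer needed.

  <fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <!-- keep the whole value as a single token and lowercase it for the index;
           the stored value stays exactly as it was sent in -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="my_string" type="string_lc" indexed="true" stored="true"/>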