RE: DIH import and postImportDeleteQuery
Search the list for my post "DIH - deleting documents, high performance (delta) imports, and passing parameters", which shows my solution to a similar problem.

Ephraim Ofir

-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects in which I need to perform a cleanup to remove some documents after we perform an update via DIH. The big issue is that when we call the DIH with clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in the URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or only the updated records (for a delta-import)
- The procedure returns both valid and deleted records; hence the need to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import; I always run with clean=true, and the whole index is rebuilt. When I do an incremental update, the records are updated correctly, but the command to delete the other records is not executed. I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the ones that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the index and then only the records returned by the procedure remain, minus the deleted ones.

I don't see any way to achieve my goal without changing the process I use to obtain the data.
Since this is a very complex stored procedure, with tons of joins and custom processing, I am trying everything to avoid messing with it.

Below is a copy of my data-config.xml file. I simplified it by omitting most of the fields, since they are out of the scope of the issue:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;" />
  <document>
    <entity name="entity_one" pk="entityid" transformer="RegexTransformer"
      query="EXEC some_stored_procedure ${dataimporter.request.someid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1">
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>
    <entity name="entity_two" pk="entityid" transformer="RegexTransformer"
      query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1">
      <field column="field1" name="field1" />
      <field column="field2" name="field2" />
      <field column="field3" name="field3" />
    </entity>
  </document>
</dataConfig>

Any ideas or pointers that might help on this one?

Many thanks,
Alexandre
Re: MaxWarming Searcher
maxWarmingSearchers should be 2 as a best practice. Either your commit frequency is too high, or you are also autowarming a large number of queries on the master.

-
Thanx:
Grijesh
www.gettinhahead.co.in

--
View this message in context: http://lucene.472066.n3.nabble.com/MaxWarming-Searcher-tp2982658p2983622.html
Sent from the Solr - User mailing list archive at Nabble.com.
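For reference, the knob being discussed lives in solrconfig.xml; a sketch of the suggested setting:

```xml
<!-- solrconfig.xml: cap the number of searchers that may be warming
     concurrently; commits that would exceed the cap fail with the
     "maxWarmingSearchers exceeded" error. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```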
Returning documents using multi-valued field
Hi all,

I'm quite new to Solr and I'm supporting an existing Solr search engine written by someone else. I've been reading up on Solr for the last couple of weeks, so I'd consider myself just beyond the basics.

A particular field, let's say "name", is multi-valued. For example, a document has a name field with the values "Alice" and "Trudy". We want the document to be returned when "Alice" or "Trudy" is entered, but not when "Alice Trudy" is entered. Currently the document is returned even for "Alice Trudy". How could this be done?

Thanks a lot!
Kurt
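One common approach (a sketch, not from this thread; the type and field names are illustrative) is to give the field type a large positionIncrementGap and query with a phrase: the gap between values prevents the phrase "Alice Trudy" from matching across two separate values of the multiValued field.

```xml
<!-- schema.xml sketch: the large positionIncrementGap inserts a position
     gap between the values of a multiValued field, so a phrase query
     cannot match tokens that span two different values. -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="text_gap" indexed="true" stored="true" multiValued="true"/>
```

With this in place, a phrase query such as name:"Alice Trudy" no longer matches a document whose name values are "Alice" and "Trudy" separately, while name:Alice or name:Trudy still does.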
Re: Termscomponent sort question
No one has an idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2983776.html
Re: adding results external to index
Any help? It can be done outside of the Solr application, but I just wanted to know if Solr has some features for supporting this. -- View this message in context: http://lucene.472066.n3.nabble.com/adding-results-external-to-index-tp2946548p2983984.html
problem in setting field attribute in schema.xml
In my schema.xml file I gave a field the attributes indexed=false and stored=true, i.e. I am not indexing this field. But I am still getting values for this field in my search results. Why is that? Any idea?

-
Romi

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2984126.html
Re: problem in setting field attribute in schema.xml
Please reply; I am not getting replies to any of my problems on this forum. - Romi -- View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2984151.html
Escaping equals-sign in external file field
Hi,

It seems I cannot escape the equals sign in the source file for the external file field. Does anyone know another work-around? Except for not using values with that character, of course ;)

Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: correctlySpelled and onlyMorePopular in 3.1
Any thoughts on this one?

On Monday 23 May 2011 17:41:00 Markus Jelsma wrote:

Hi,

I know about the behaviour of the onlyMorePopular setting: it can return suggestions even when the actual query is correctly spelled. There is, in my opinion, some bad behaviour. Consider the following query, which is correctly spelled, yields results, and never yields suggestions:

q=test&spellcheck.onlyMorePopular=false
  <bool name="correctlySpelled">true</bool>

q=test&spellcheck.onlyMorePopular=true
  <bool name="correctlySpelled">false</bool>

Now also consider the following scenario with onlyMorePopular enabled. Both term_a and term_b are correctly spelled and in the index:

q=term_a
  <bool name="correctlySpelled">true</bool>
  <str name="collation">term_b</str>

q=term_b
  <bool name="correctlySpelled">false</bool>

The value of correctlySpelled can be very counter-intuitive when onlyMorePopular is enabled, can't it? File an issue or live with it?

Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: problem in setting field attribute in schema.xml
If you never want to see a field in results, set stored=false.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 2:37 PM, Romi romijain3...@gmail.com wrote:
In my schema.xml file I gave a field the attributes indexed=false and stored=true, i.e. I am not indexing this field. But I am still getting values for this field in my search results. Why is that? Any idea?

-
Romi
Re: problem in setting field attribute in schema.xml
If I do stored=false, then it indexes the data but does not show it in search results. But in my case I do not want to index the data for a field, and to my surprise, even with indexed=false on this field I can still get that data through the query *:*, yet I get nothing if I run the filter query field:value. It's really confusing what Solr is doing.

-
Romi

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2984239.html
Re: problem in setting field attribute in schema.xml
Surely it indexes the data if you set indexed=true. If you put some data in the field that is unique to that document and then search for it, do you get the document? If not, it is because the field is not indexed. If you search on another field in the same document but still see the non-indexed field in the result, it is because the non-indexed field is stored.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 3:11 PM, Romi romijain3...@gmail.com wrote:
If I do stored=false, then it indexes the data but does not show it in search results. But in my case I do not want to index the data for a field, and to my surprise, even with indexed=false on this field I can still get that data through the query *:*, yet I get nothing if I run the filter query field:value. It's really confusing what Solr is doing.

-
Romi
Re: problem in setting field attribute in schema.xml
If I set the uniqueKey field to indexed=false, it throws the exception org.apache.solr.common.SolrException: Schema Parsing Failed. And http://wiki.apache.org/solr/SchemaXml#Fields clearly states that a non-indexed field is not searchable, so why am I getting search results? Why should stored=true matter if indexed=false?

-
Romi

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2984306.html
Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances
Dear list,

I'm posting here after some unsuccessful investigations. In my setup I push documents to Solr using the StreamingUpdateSolrServer. I'm sending a comfortable initial amount of documents (~250M) and wished to overwrite duplicated documents at index time, during the update, taking advantage of the UpdateProcessorChain.

At the beginning of the indexing stage, everything is quite fast; documents arrive at a rate of about 1000 docs/s. The only extra processing during the import is the computation of a couple of hashes that are used to uniquely identify documents by their content, using both stock (MD5Signature) and custom (derived from Lookup3Signature) update processors. I send a commit command to the server every 500k documents sent.

During a first period the server is CPU-bound. After a short while (~10 minutes), the rate at which documents are received starts to fall dramatically and the server becomes IO-bound. At first I thought this was a normal slowdown during the commit, with my push client waiting for the flush to occur. What caught my attention was that, unexpectedly, the server was performing a lot of small reads, far more than the number of writes, which seem to be larger. The combination of the many small reads with the constant amount of bigger writes seems to create a lot of IO contention on my commodity SATA drive, and the ETA of my index build started to increase scarily =D

I then restarted the JVM with JMX enabled so I could investigate a bit more, and realized that the UpdateHandler was performing many reads while processing the update request. Are there any known limitations around the UpdateProcessorChain when overwriteDupes is set to true? I turned that off, which of course breaks the intent of my index, but it is useful for comparison. That did the trick: indexing is fast again, even with the periodic commits.
I therefore have two questions, an interesting first one and a boring second one:

1/ What is the workflow of the UpdateProcessorChain when one or more processors have overwriting of duplicates turned on? What happens under the hood? I tried to answer that myself by looking at DirectUpdateHandler2, and my understanding stopped at the following:
- The document is added to the Lucene IndexWriter
- The duplicates are deleted from the Lucene IndexWriter
The dark magic I couldn't understand seems to occur around the idTerm and updateTerm things in the addDoc method. The deletions seem to be buffered somewhere; I just didn't get it :-) I might be wrong since I didn't read the code further, but the point might be how Solr handles deletions, which is something still unclear to me. In any case, a lot of reads seem to occur for that precise task, and it tends to produce a lot of IO, killing indexing performance when overwriteDupes is on. I don't even understand why so many read operations occur at this stage, since my process had a comfortable amount of RAM (Xms=Xmx=8GB), of which only 4.5GB is used so far. Any help, recommendation or idea is welcome :-)

2/ In case there isn't a simple fix for this, I'll have to live with duplicates in my index. I don't mind, since Solr offers a great grouping feature, which I already use in some other applications. The only thing I don't know yet: if I rely on grouping at search time, in combination with the Stats component (which is the intent of this index), limiting the results to one document per group, will the computed statistics take those duplicates into account or not? In short, how well does the Stats component behave when combined with hits collapsing? I had first implemented my solution using overwriteDupes because it would have reduced both the target size of my index and the complexity of the queries used to obtain statistics on the search results.

Thank you very much in advance.
--
Tanguy
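For reference, a deduplication chain of the kind described above is configured in solrconfig.xml roughly as follows (a sketch; the chain name, signature field, and input field names are placeholders, and the custom Lookup3-derived processor from the setup above would slot in the same way):

```xml
<updateRequestProcessorChain name="dedupe">
  <!-- Computes a content hash into signatureField; with overwriteDupes=true,
       each add also deletes previously indexed documents carrying the same
       signature, which is where the extra index reads come from. -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">sig</str>
    <str name="fields">field_a,field_b</str>
    <str name="signatureClass">solr.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```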
Re: problem in setting field attribute in schema.xml
On 5/25/2011 9:29 AM, Romi wrote:
and in http://wiki.apache.org/solr/SchemaXml#Fields it is clearly mentioned that a non-indexed field is not searchable then why i am getting search result. why should stored=true matter if indexed=false

"indexed" controls whether you can find the document based on the content of this field. "stored" controls whether you will see the content of this field in the result.
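To illustrate the distinction, a few typical field declarations (a sketch; the field names are illustrative, not from the schema under discussion):

```xml
<!-- Searchable and returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- Searchable, but never returned (e.g. a catch-all search field) -->
<field name="all_text" type="text" indexed="true" stored="false"/>
<!-- Not searchable, but returned with matching documents (display-only data) -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```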
RE: problem in setting field attribute in schema.xml
It's very strange. I tried the same just now and am getting the same result. I have set both indexed=false and stored=false, but if I search for a keyword using my default search, I still get results from these fields as well. But if I specify field:value, it shows 0 results. Can anyone explain?

Regards
Vignesh

-Original Message-
From: Romi [mailto:romijain3...@gmail.com]
Sent: 25 May 2011 18:42
To: solr-user@lucene.apache.org
Subject: Re: problem in setting field attribute in schema.xml

If I do stored=false, then it indexes the data but does not show it in search results. But in my case I do not want to index the data for a field, and to my surprise, even with indexed=false on this field I can still get that data through the query *:*, yet I get nothing if I run the filter query field:value. It's really confusing what Solr is doing.

-
Romi
Re: Escaping equals-sign in external file field
Created an issue and added a simple patch: https://issues.apache.org/jira/browse/SOLR-2545

On Wednesday 25 May 2011 14:55:34 Markus Jelsma wrote:
Hi,

It seems I cannot escape the equals sign in the source file for the external file field. Does anyone know another work-around? Except for not using values with that character, of course ;)

Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: problem in setting field attribute in schema.xml
You probably got tricked by an old index which was created while you had stored=true. Delete your index, restart Solr, re-index your content and try again. Solr will happily serve whatever is in the Lucene index even if it does not match your current schema - that's why it's important to re-index everything if you make changes to the schema and want those changes to be visible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. mai 2011, at 15.47, Vignesh Raj wrote:
It's very strange. I tried the same just now and am getting the same result. I have set both indexed=false and stored=false, but if I search for a keyword using my default search, I still get results from these fields as well. But if I specify field:value, it shows 0 results. Can anyone explain?

Regards
Vignesh
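For completeness, one way to wipe the index before re-indexing (a sketch, assuming the standard XML update handler at /update) is to POST a delete-by-query followed by a commit:

```xml
<!-- POST to http://localhost:8983/solr/update -->
<delete><query>*:*</query></delete>
<!-- then, in a second request to the same handler -->
<commit/>
```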
Re: problem in setting field attribute in schema.xml
On 25.05.2011 15:47, Vignesh Raj wrote:
It's very strange. I tried the same just now and am getting the same result. I have set both indexed=false and stored=false, but if I search for a keyword using my default search, I still get results from these fields as well. But if I specify field:value, it shows 0 results. Can anyone explain?

I guess you copy the field to your default search field.

-Michael
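That would explain the symptom: with a pattern like the one below (a sketch; the field names are illustrative), the content of a non-indexed field still ends up in the indexed default search field, so keyword searches find the document even though field:value does not.

```xml
<!-- The source field itself is neither indexed nor stored ... -->
<field name="internal_notes" type="text" indexed="false" stored="false"/>
<!-- ... but copyField pushes its content into an indexed catch-all field,
     which is typically set as the default search field -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="internal_notes" dest="text"/>
```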
copyField generates multiple values encountered for non multiValued field
Dear list,

I hope somebody can help me understand/avoid this. I am sending an add request with allowDuplicates=false to a Solr 1.4.1 instance. This is for debugging purposes, so I am sending the exact same data that is already stored in Solr's index. I am using the PHP PECL libraries, which fail completely in giving me any hint on what goes wrong. Only sending the same add request again gives me a proper SolrClientException that hints:

ERROR: [288400] multiple values encountered for non multiValued field field2 [fieldvalue, fieldvalue]

The scenario:
- field1 is implicitly single-valued, type text, indexed and stored
- field2 is generated via a copyField directive in schema.xml, implicitly single-valued, type string, indexed and stored

What appears to happen:
- On the first add (SolrClient::addDocuments(array(SolrInputDocument theDocument))), regular fields like field1 get overwritten as intended
- field2, defined with a copyField but still single-valued, gets _appended_ instead
- When I retrieve the updated document in a query and try to add it again, it won't let me because of the inconsistent multi-value state
- The PECL library, in addition, appears to hit some internal exception (that it doesn't handle properly) when encountering multiple values for a single-valued field. That gives me zero results when querying a set that includes the document via PHP, while the document can be retrieved properly, though in its inconsistent state, any other way.

But: Solr appears to be generating the corrupted state itself via copyField? What's going wrong? I'm pretty confused...

Thank you,
Alex
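For what it's worth, the usual pitfall with this kind of setup (a sketch; only the field names match the description above, the copyField source is assumed) is re-adding a retrieved document without removing the copyField target: Solr applies the copyField directive again on every add, so the submitted field2 value plus the freshly copied one collide in the single-valued field.

```xml
<!-- schema.xml sketch: field2 is meant to be filled only via copyField -->
<field name="field1" type="text" indexed="true" stored="true"/>
<field name="field2" type="string" indexed="true" stored="true"/>
<copyField source="field1" dest="field2"/>
<!-- A client that re-submits a document fetched from a query should strip
     field2 before the add; otherwise the stored value it sends back and the
     newly copied one make two values for a single-valued field. -->
```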
Re: very slow commits and overlapping commits
I am taking a snapshot after every commit. From looking at the snapshots, it does not look like the delay is caused by segment merging, because I am not seeing any large new segments after a commit. I still can't figure out why there is a 2 minute gap between "start commit" and SolrDeletionPolicy.onCommit. Will changing the deletion policy make any difference? I am using the default deletion policy now.

Bill

2011/5/21 Erick Erickson erickerick...@gmail.com
Well, committing less often is a possibility <g>. Here's what's probably happening: when you pass certain thresholds, segments are merged, which can take quite some time. How are you triggering commits? If it's external, think about using autocommit instead.

Best
Erick

On May 20, 2011 6:04 PM, Bill Au bill.w...@gmail.com wrote:
On my Solr 1.4.1 master I am doing commits regularly at a fixed interval. I noticed that from time to time a commit will take longer than the commit interval, causing commits to overlap. Then things get worse, as commits take longer and longer.
Here is the logs for a long commit: [2011-05-18 23:47:30.071] start commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDeletes=false) [2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: commits:num=2 [2011-05-18 23:49:48.119] commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,version=1247782702272,generation=249742,filenames=[_4dqu_2g.del, _4e66.tis, _4e3r.tis, _4e59.nrm, _4e68_1.del, _4e4n.prx, _4e4n.fnm, _4e67.fnm, _4e3r.frq, _4e3r.tii, _4e6d.fnm, _4e6c.prx, _4e68.fdx, _4e68.nrm, _4e6a.frq, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt, _4e0e.nrm, _4e4n.tis, _4e6e.fnm, _4e3r.prx, _4e66.fnm, _4e3r.nrm, _4e0e.prx, _4e4c.fdx, _4dx1.prx, _4e5v.frq, _4e3r.fdt, _4e4c.tis, _4e41_6.del, _4e6b.tis, _4e6b_1.del, _4e4y_3.del, _4e6b.tii, _4e3r.fdx, _4dx1.nrm, _4e4y.frq, _4e4c.fdt, _4e4c.tii, _4e6d.fdt, _4e5k.fnm, _4e41.fnm, _4e69.fnm, _4e67.fdt, _4e0e.tii, _4dty_h.del, _4e6b.fnm, _4e0e_h.del, _4e6d.fdx, _4e67.fdx, _4e0e.tis, _4e5v.nrm, _4dx1.fnm, _4e5v.tii, _4dqu.fdt, segments_5cpa, _4e5v.prx, _4dqu.fdx, _4e59.fnm, _4e6d.prx, _4e59_5.del, _4e4c.prx, _4e4c.nrm, _4e5k.prx, _4e66.fdx, _4dty.frq, _4e6c.frq, _4e5v.tis, _4e6e.tii, _4e66.fdt, _4e6b.fdx, _4e68.prx, _4e59.fdx, _4e6e.fdt, _4e41.prx, _4dx1.tii, _4dx1.fdt, _4e6b.fdt, _4e5v_4.del, _4e4n.fdt, _4e6e.fdx, _4dx1.fdx, _4e41.nrm, _4e4n.fdx, _4e6e.tis, _4e66.tii, _4e4c.fnm, _4e6b.prx, _4e67.prx, _4e0e.fnm, _4e4n.nrm, _4e67.nrm, _4e5k.nrm, _4e6a.prx, _4e68.fnm, _4e4c_4.del, _4dx1.tis, _4e6e.nrm, _4e59.tii, _4e68.tis, _4e67.frq, _4e3r.fnm, _4dty.nrm, _4e4y.prx, _4e6e.prx, _4dty.tis, _4e4y.tis, _4e6b.nrm, _4e6a.fdt, _4e4n.frq, _4e6d.frq, _4e59.fdt, _4e6a.fdx, _4e6a.fnm, _4dqu.tii, _4e41.tii, _4e67_1.del, _4e41.tis, _4dty.fdt, _4e69.tis, _4dqu.frq, _4dty.fdx, _4dx1.frq, _4e6e.frq, _4e66_1.del, _4e69.prx, _4e6d.tii, _4e5k.tii, _4e0e.fdt, _4dqu.tis, _4e6d.tis, _4e69.nrm, _4dqu.prx, _4e4y.fnm, _4e67.tis, _4e69_1.del, _4e6d.nrm, _4e6c.tis, _4e0e.fdx, _4e6c.tii, _4dx1_n.del, _4e5v.fnm, _4e5k.tis, 
_4e59.tis, _4e67.tii, _4dqu.nrm, _4e5k_8.del, _4e6c.fdx, _4e6c.fdt, _4e41.frq, _4e4y.fdx, _4e69.frq, _4e6a.tis, _4dty.prx, _4e66.frq, _4e5k.frq, _4e6a.tii, _4e69.tii, _4e6c.nrm, _4dty.fnm, _4e59.prx, _4e59.frq, _4e66.prx, _4e68.frq, _4e5k.fdx, _4e4y.tii, _4e6c.fnm, _4e0e.frq, _4e6b.frq, _4e41.fdt, _4e4n_2.del, _4dty.tii, _4e4y.fdt, _4e66.nrm, _4e4c.frq, _4e6a.nrm, _4e5k.fdt, _4e3r_i.del, _4e5v.fdt, _4e4y.nrm, _4e68.tii, _4e5v.fdx, _4e41.fdx] [2011-05-18 23:49:48.119] commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpb,version=1247782702273,generation=249743,filenames=[_4dqu_2g.del, _4e66.tis, _4e59.nrm, _4e3r.tis, _4e4n.fnm, _4e67.fnm, _4e3r.tii, _4e6d.fnm, _4e68.fdx, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt, _4e4n.tis, _4e6e.fnm, _4e0e.prx, _4e4c.tis, _4e5v.frq, _4e4y_3.del, _4e6b_1.del, _4e4c.tii, _4e6f.fnm, _4e5k.fnm, _4e6c_1.del, _4e41.fnm, _4dx1.fnm, _4e5v.nrm, _4e5v.tii, _4e5v.prx, _4e5k.prx, _4e4c.nrm, _4dty.frq, _4e66.fdx, _4e5v.tis, _4e66.fdt, _4e6e.tii, _4e59.fdx, _4e6b.fdx, _4e41.prx, _4e6b.fdt, _4e41.nrm, _4e6e.tis, _4e4c.fnm, _4e66.tii, _4e6b.prx, _4e0e.fnm, _4e5k.nrm, _4e6a.prx, _4e6e.nrm, _4e59.tii, _4e67.frq, _4dty.nrm, _4e4y.tis, _4e6a.fdt, _4e6b.nrm, _4e59.fdt, _4e6a.fdx, _4e41.tii, _4e41.tis, _4e67_1.del, _4dty.fdt, _4dty.fdx, _4e69.tis, _4e66_1.del, _4e6e.frq, _4e5k.tii, _4dqu.prx, _4e67.tis, _4e69_1.del, _4e6c.tis, _4e6c.tii, _4e5v.fnm, _4e5k.tis, _4e59.tis, _4e67.tii, _4e6c.fdx, _4e4y.fdx, _4e41.frq, _4e6c.fdt, _4dty.prx, _4e66.frq, _4e69.tii, _4e6c.nrm, _4e59.frq, _4e66.prx, _4e5k.fdx, _4e68.frq, _4e4y.tii, _4e4n_2.del, _4e41.fdt, _4e6b.frq, _4e4y.fdt, _4e66.nrm, _4e4c.frq, _4e3r_i.del, _4e5k.fdt, _4e4y.nrm, _4e41.fdx, _4e4n.prx, _4e68_1.del, _4e3r.frq, _4e6f.fdt, _4e6f.fdx, _4e6c.prx, _4e68.nrm, _4e6a.frq,
RE: problem in setting field attribute in schema.xml
I tried deleting the index and re-indexing, but I still get the same result.

Regards
Vignesh

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: 25 May 2011 19:30
To: solr-user@lucene.apache.org
Subject: Re: problem in setting field attribute in schema.xml

You probably got tricked by an old index which was created while you had stored=true. Delete your index, restart Solr, re-index your content and try again. Solr will happily serve whatever is in the Lucene index even if it does not match your current schema - that's why it's important to re-index everything if you make changes to the schema and want those changes to be visible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Similarity per field
Hi all,

I sent a mail about this topic a week ago, but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more detail about what I'm doing, what I want to do, and how I can make it work correctly.

I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front" />
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust" indexed="true" stored="true" required="false" omitNorms="true" />

Then I restarted Solr and did a full-import. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1, and that is not the case.

To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then I restarted Solr and did a full-import. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class but in trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in trunk now, yes? I have run svn up on both my lucene and solr checkouts and it still is not recognized on a per-field basis. Is the tag different inside a fieldType? Did I not update Solr correctly? Where is my mistake?

Thanks,
Brian Lamb
communication protocol between master and slave
Hi,

I am just curious: what communication protocol does a slave node use to get index updates from the master node in a replication setup? Is it over TCP? I assume it only gets the delta?

Thanks very much in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/communication-protocol-between-master-and-slave-tp2985163p2985163.html
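For reference, the Java-based ReplicationHandler introduced in Solr 1.4 works over plain HTTP: the slave polls the master's replication handler and fetches only the index files that changed since its generation. A minimal configuration sketch (host name, port, and poll interval are placeholders):

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```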
Re: DIH import and postImportDeleteQuery
Hi Ephraim,

Thank you so much for the input. I was able to find your thread in the archives and got your solution to work. In fact, using $deleteDocById and $skipDoc it worked like a charm. This feature is very useful; it's a shame it's not properly documented.

The only downside is the one you mentioned, that the stats are not updated: if I update 13 documents and delete 2, DIH will tell me that only 13 documents were processed. This is bad in my case because I check the end result to generate an error e-mail if needed.

You also mentioned that if the query contains only deletion records, a commit is not automatically executed and it would be necessary to commit manually. How can I commit manually via DIH? I was not able to find any references in the documentation.

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir ephra...@icq.com wrote:
Search the list for my post "DIH - deleting documents, high performance (delta) imports, and passing parameters", which shows my solution to a similar problem.

Ephraim Ofir

-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects in which I need to perform a cleanup to remove some documents after we perform an update via DIH. The big issue is that when we call the DIH with clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in the URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or only the updated records (for a delta-import)
- The procedure returns both valid and deleted records; hence the need to run a postImportDeleteQuery to remove the deleted ones.
Everything works fine when I run a full-import; I always run with clean=true, and the whole index is rebuilt. When I do an incremental update, the records are updated correctly, but the command to delete the other records is not executed. I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the ones that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the index and then only the records returned by the procedure remain, minus the deleted ones.

I don't see any way to achieve my goal without changing the process I use to obtain the data. Since this is a very complex stored procedure, with tons of joins and custom processing, I am trying everything to avoid messing with it.

Below is a copy of my data-config.xml file. I simplified it by omitting most of the fields, since they are out of the scope of the issue:

<?xml version="1.0" encoding="UTF-8"?>
dataConfig dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://myserver;databaseName=mydb;user=username;password= password;responseBuffering=adaptive; / document entity name=entity_one pk=entityid transformer=RegexTransformer query=EXEC some_stored_procedure ${dataimporter.request.someid} preImportDeleteQuery=status:1 postImportDeleteQuery=status:1 field column=field1 name=field1 splitBy=; / field column=field2 name=field2 splitBy=; / field column=field3 name=field3 splitBy=; / /entity entity name=entity_two pk=entityid transformer=RegexTransformer query=EXEC someother_stored_procedure ${dataimporter.request.someotherid} preImportDeleteQuery=status:1 postImportDeleteQuery=status:1 field column=field1 name=field1 / field column=field2 name=field2 / field column=field3 name=field2 / /entity /document /dataConfig Any ideas or pointers that might help on this one? Many thanks, Alexandre
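On the manual-commit question: independent of DIH, a commit can always be issued explicitly through Solr's update handler. A sketch (host, port, and core path are placeholders for your setup):

```
# trigger a commit as a URL parameter on the update handler
http://localhost:8983/solr/update?commit=true

# or POST an explicit commit command to the XML update handler
<commit/>
```

Posting `<commit/>` to /update (for example with curl and Content-Type text/xml) should make the deletions visible even when DIH itself did not commit.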
RE: DIH import and postImportDeleteQuery
The failure-to-commit bug with $deleteDocById can be fixed by applying the patch on SOLR-2492. The patch also partially fixes the stats-not-updated bug, in that it increments the count by 1 for every call to $deleteDocById and $deleteDocByQuery. Note that this might result in inaccurate counts if the id given to $deleteDocById doesn't exist or is duplicated; obviously it is also not a complete fix for stats with $deleteDocByQuery, since that command would not normally be used to delete just 1 doc at a time. The patch is for trunk, but it might work with 3.1 also; if not, it likely only needs minor tweaking. The JIRA ticket is here:
https://issues.apache.org/jira/browse/SOLR-2492

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Wednesday, May 25, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery
[...]
Re: communication protocol between master and slave
I'm pretty sure it's over HTTP, although I don't know the details of the requests and responses. The slave downloads any index files that have changed on the master. A Lucene index is split among a number of separate files on disk, and there's no way for the slave to get a finer-grained delta than a complete index file: files that haven't changed aren't downloaded, and files that are new are. (I think 'new' is effectively the same as 'changed'; I'm not sure index files are ever modified in place, as opposed to new ones being created and old ones deleted after a merge/optimize operation.) One side effect is that if an 'optimize' is run on the master, typically all index files have to be downloaded. Likewise, if an optimize is run on the slave, the next replication will download all index files; there's generally no good reason to run an optimize on the slave, or otherwise change the index on the slave at all.

On 5/25/2011 1:11 PM, antoniosi wrote:
Hi,
I am just curious: what communication protocol does a slave node use to get index updates from the master node in a replication setup? Is it TCP? I assume it only gets the delta? Thanks very much in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/communication-protocol-between-master-and-slave-tp2985163p2985163.html
Sent from the Solr - User mailing list archive at Nabble.com.
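The file-level decision described above (fetch a whole index file only when the slave's copy differs, never a partial file) can be sketched roughly like this; the function and the name/size/checksum metadata shape are illustrative assumptions, not Solr's actual replication API:

```python
def files_to_download(master_files, slave_files):
    """Return the master index files the slave must fetch.

    Each argument maps file name -> (size, checksum). A file is fetched
    only if the slave lacks it or its metadata differs; unchanged files
    are skipped, mirroring whole-file segment replication.
    """
    to_fetch = []
    for name, meta in master_files.items():
        if slave_files.get(name) != meta:
            to_fetch.append(name)
    return sorted(to_fetch)

master = {"_1.fdt": (100, "aa"), "_2.fdt": (200, "bb"), "segments_3": (10, "cc")}
slave = {"_1.fdt": (100, "aa"), "segments_2": (9, "dd")}
print(files_to_download(master, slave))  # _1.fdt is unchanged, so it is skipped
```

After an optimize on the master, nearly every entry in `master_files` is new, which is why the whole index effectively gets re-downloaded.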
Re: Similarity per field
I looked at the patch page and saw the files that were changed. I checked those same files in my install and found that they had indeed been changed, so it looks like I have the correct version of Solr.

On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.com wrote:

Hi all,
I sent a mail about this topic a week ago, but now that I have more information and a better understanding of how the similarity class works, I wanted to start a new thread with more detail on what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front"/>
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust" indexed="true" stored="true" required="false" omitNorms="true"/>

Then I restarted Solr and did a full-import. However, the changes I made do not appear to be taking hold. For simplicity, right now I just have the idf function return 1. When I do a search with debugQuery=on, the idf behaves as it normally does; when I search on this field, the idf should be 1, and that is not the case. To try to nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then I restarted Solr and did a full-import. This time, the idf scores were all 1. So it seems the problem is not my similarity class, but applying it to a specific fieldType.

According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in trunk now, yes? I have run svn up on both my Lucene and Solr checkouts, and it still is not recognized on a per-field basis. Is the tag different inside a fieldType? Did I not update Solr correctly? Where is my mistake?

Thanks,
Brian Lamb
Re: DIH import and postImportDeleteQuery
Hi James,
Thanks for the heads up! I am currently on version 1.4.1, so I can apply this patch and see if it works. I just need to assess whether it's better to apply the patch, or to check on the backend system whether only delete requests were generated and in that case not call DIH at all. Previously, I found another open issue, created by Ephraim: https://issues.apache.org/jira/browse/SOLR-2104 It's the same issue, but it hasn't had any updates yet.
Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James james.d...@ingrambook.com wrote:
[...]
RE: DIH import and postImportDeleteQuery
Great. I wasn't aware of the other issue. I put a link between the 2 issues in JIRA so people will know in the future.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Wednesday, May 25, 2011 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery
[...]
Edgengram
Hi all,
I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/>
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string "abcdefg" against a field containing "abcdefghijklmnop", the idf scores that as a 7:

7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)

I get why that's happening, but is there a way to avoid it? Do I need a new field type to achieve the desired effect?

Thanks,
Brian Lamb
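A rough model of why the sum comes out at 7: with minGramSize=1 and side=front, the analyzer emits one edge n-gram per prefix of the query term, and a flat idf of 1 contributes once per matching gram. This is a simplified sketch of the token arithmetic, not Lucene's actual scoring code:

```python
def edge_ngrams(term, min_size=1, max_size=100):
    """Front edge n-grams, as per the EdgeNGramFilterFactory settings above."""
    return [term[:n] for n in range(min_size, min(len(term), max_size) + 1)]

query_grams = edge_ngrams("abcdefg")
field_grams = set(edge_ngrams("abcdefghijklmnop"))
# With a custom similarity returning idf=1, each matched gram adds 1 to the sum.
idf_sum = sum(1 for g in query_grams if g in field_grams)
print(query_grams)  # ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef', 'abcdefg']
print(idf_sum)      # 7
```

So the 7.0 is simply one contribution per prefix of the 7-character query string, all of which match the longer field value.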
Re: Termscomponent sort question
Help me please... -- View this message in context: http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2986185.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: communication protocol between master and slave
Thanks for the prompt reply. -- View this message in context: http://lucene.472066.n3.nabble.com/communication-protocol-between-master-and-slave-tp2985163p2986413.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing numbers
Hi, How does solr index a numeric value? Does it index it as a string or does it keep it as a numeric value? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-numbers-tp2986424p2986424.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing numbers
the default schema.xml provided in the Solr distribution is well-documented, and a good place to get started (including numeric fieldTypes): http://wiki.apache.org/solr/SchemaXml Lucid Imagination also provides a nice reference guide: http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide hope that helps, rob On Wed, May 25, 2011 at 6:20 PM, antoniosi antonio...@gmail.com wrote: Hi, How does solr index a numeric value? Does it index it as a string or does it keep it as a numeric value? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-numbers-tp2986424p2986424.html Sent from the Solr - User mailing list archive at Nabble.com.
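To make the answer above concrete: how a number is indexed depends entirely on the fieldType you map it to, not on the raw value. The stock example schema defines trie-based numeric types along these lines (names and precisionStep values are as in the shipped example schema, so treat this as illustrative rather than authoritative for your version):

```xml
<!-- Trie-encoded numerics: indexed in a binary form that supports fast range queries -->
<fieldType name="tint"   class="solr.TrieIntField"   precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

<!-- A numeric field using one of those types -->
<field name="price" type="tfloat" indexed="true" stored="true"/>
```

If you instead map the field to a string type, the number is indexed lexically, so "10" sorts before "9" and range queries behave accordingly.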
Minimum Should Match not enforced with External Field + Function Query with boost
Hello,
Minimum Should Match does not seem to be enforced when I use the boost with external field scoring (I followed the http://dev.tailsweep.com/solr-external-scoring/ example to implement external field scoring). I am using a month-old Solr trunk build (4.0). Thanks for any help.
Ajay

Here are the input parameters to the dismax request handler:

d=6&sfield=latlon&group.main=true&wt=json&rows=10&debugQuery=true&fl=*,score&start=0&q={!boost+b=dishRating+v=$qq}&pt=42.35864,-71.05666&group.field=resname&group=true&qq=hot+chicken+wings&fq={!bbox}

mm is defined in the defaults list (solrconfig.xml) as:

<str name="mm">3</str>

Debug information:

[rawquerystring] => {!boost b=dishRating v=$qq}
[querystring] => {!boost b=dishRating v=$qq}
[parsedquery] => BoostedQuery(boost(text:hot (text:chicken text:chickn text:poultri text:murgh text:pollo) (text:wing text:wingett),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/)))
[parsedquery_toString] => boost(text:hot (text:chicken text:chickn text:poultri text:murgh text:pollo) (text:wing text:wingett),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/))

[explain] for doc US-MA-2256-862-240311:
0.62424326 = (MATCH) boost(text:hot (text:chicken ...) (text:wing ...),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/)), product of:
  0.15606081 = (MATCH) product of:
    0.23409122 = (MATCH) sum of:
      0.13103496 = (MATCH) weight(text:hot in 221464), product of:
        0.18647969 = queryWeight(text:hot), product of:
          4.497132 = idf(docFreq=16595, maxDocs=548010)
          0.04146636 = queryNorm
        0.70267683 = (MATCH) fieldWeight(text:hot in 221464), product of:
          1.0 = tf(termFreq(text:hot)=1)
          4.497132 = idf(docFreq=16595, maxDocs=548010)
          0.15625 = fieldNorm(field=text, doc=221464)
      0.103056274 = (MATCH) sum of:
        0.103056274 = (MATCH) weight(text:chicken in 221464), product of:
          0.11693921 = queryWeight(text:chicken), product of:
            2.8200984 = idf(docFreq=88782, maxDocs=548010)
            0.04146636 = queryNorm
          0.8812808 = (MATCH) fieldWeight(text:chicken in 221464), product of:
            2.0 = tf(termFreq(text:chicken)=4)
            2.8200984 = idf(docFreq=88782, maxDocs=548010)
            0.15625 = fieldNorm(field=text, doc=221464)
    0.667 = coord(2/3)
  4.0 = float(dishRating{type=dishRatingFile,properties=omitTermFreqAndPositions})=4.0

--
View this message in context: http://lucene.472066.n3.nabble.com/Minimum-Should-Match-not-enforced-with-External-Field-Function-Query-with-boost-tp2985564p2985564.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Special character in a field used by sort parameter
Marc SCHNEIDER marc.schneider73 at gmail.com writes:

Hi,
I have a field called test-id, but I can't use it when sorting. For example, these don't work (undefined field "test"):
http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test-id+asc
http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test\-id+asc
When I remove the sort parameter, it works. Is there a way of escaping the field name in the sort parameter?
Thanks in advance,
Marc.

I've also got a similar issue. When the field name has a hyphen and the first character is alphabetical, Solr says the field is undefined upon sorting. (a) It sorts fine when the first character is numerical, and (b) I've tried encoding the URL, but hyphens don't encode. If anyone has a fix, I would be stoked to hear it.
J
Tools?
Hello, Are there any tools that can be used for analyzing the solr logs? Regards Sujatha
Re: Termscomponent sort question
Hi antonio,
Can you sort on the client side yourself? Or are you trying to sort terms with the same count in reverse order of their lengths?

On Tue, May 24, 2011 at 8:18 PM, antonio antonio...@email.it wrote:
Hi, I use Solr 3.1. I implemented my autocomplete with TermsComponent, and I'm looking for a way, if there is one, to sort the matching terms by score. For example, if there are two terms, "Rome" and "Near Rome", with the same count (1), I would like "Rome" to come before "Near Rome". Because the count is the same, if I sort by index, "Near Rome" sorts lexically before "Rome". Is there a way to use score, as in dismax, with TermsComponent? Using dismax, if I search for "Rome", the word "Rome" gets a higher score than "Near Rome". I want the same behavior with TermsComponent. Is that possible? Thanks.
--
View this message in context: http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2980683.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Dmitry Kan
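Since TermsComponent itself doesn't score, one client-side option is to re-rank the suggestions after fetching them: sort by count descending, then prefer terms that start with the typed prefix, then shorter terms. A minimal sketch (the tie-break rules here are my assumptions, not anything TermsComponent provides):

```python
def rank_suggestions(terms, prefix):
    """terms: list of (term, count) pairs from TermsComponent.
    Rank by count desc, then prefix match, then term length."""
    p = prefix.lower()
    return sorted(
        terms,
        key=lambda tc: (-tc[1], not tc[0].lower().startswith(p), len(tc[0])),
    )

suggestions = [("Near Rome", 1), ("Rome", 1), ("Romania", 3)]
print(rank_suggestions(suggestions, "Rome"))
# [('Romania', 3), ('Rome', 1), ('Near Rome', 1)]
```

With equal counts, "Rome" now sorts before "Near Rome" because it carries the typed prefix, which is the behavior asked for above.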
analyzer type - does it default to index or query?
Hi,
When specifying an analyzer for a fieldType, I can say type="index" or type="query". What if I don't specify the type for an analyzer? Does it default to index, query, or both? Thanks.
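For reference, the two forms look like this; when a single analyzer is given with no type attribute, it is used for both indexing and querying (this matches the stock example schema, but verify against your Solr version):

```xml
<!-- One analyzer, no type attribute: applied at both index and query time -->
<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Separate chains for index and query time -->
<fieldType name="text_split" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```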
Re: problem in setting field attribute in schema.xml
indexed controls whether you can find the document based on the content of this field. stored controls whether you will see the content of this field in the result.

Yes, but when I set indexed=false for a particular field and search *:*, all documents are found. That's fine: I think the index just should not contain terms for that field. For example, a document has the fields id, author, and title, and I set indexed=false on the author field. author should then not be indexed, and a search for *:* should still show all documents, like:

<doc>
  <str name="id">id1</str>
  <str name="title">t1</str>
  <str name="author">a1</str>
</doc>

But if I search author:a1, 0 results are found. Why is that? To be very clear, I am performing a full-import, where the index is recreated every time; to be safe, I also deleted the index and recreated it, and I still see the same problem.
- Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2987530.html
Sent from the Solr - User mailing list archive at Nabble.com.
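The indexed/stored distinction in schema terms (field names below are just for illustration): with a stored-but-not-indexed field, the value still appears in every returned document, yet a query against that field matches nothing, because no terms were ever written to the index.

```xml
<!-- stored but not indexed: shown in results, but author:a1 finds nothing -->
<field name="author"   type="string" indexed="false" stored="true"/>
<!-- indexed but not stored: searchable, yet absent from the returned document -->
<field name="keywords" type="string" indexed="true"  stored="false"/>
```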
FieldCache
Hi All,
Since there is no way to control the size of Lucene's internal FieldCache, how can we make sure we are making good use of it? One of my shards has close to 1.5M documents, yet the FieldCache contains only about 10 entries. Is there anything we can do to control this?
Thanks
What is omitNorms
Hi,
I want to know what omitNorms is for a field in schema.xml, and what effect setting it to true or false has on indexing and searching. Please point me to a suitable example.
- Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2987547.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is omitNorms
I also want to know the difference between setting omitNorms on a fieldType and setting it on a field.
- Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2987562.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is omitNorms
This is an advanced option. Please see the details at the following link:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e71

On Thu, May 26, 2011 at 11:12 AM, Romi romijain3...@gmail.com wrote:
[...]

--
Chandan Tamrakar
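Briefly, norms store an index-time length/boost factor per field per document; omitting them saves that storage and disables length normalization and index-time field boosts for the field. Setting omitNorms on the fieldType gives a default for every field of that type, while setting it on a field overrides the type's setting. A sketch (not taken from the linked article):

```xml
<!-- default for every field of this type -->
<fieldType name="string" class="solr.StrField" omitNorms="true"/>

<!-- per-field override: this field keeps norms even if its type omits them -->
<field name="title" type="text" indexed="true" stored="true" omitNorms="false"/>
```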
Re: problem in setting field attribute in schema.xml
Even though I am running the full-import command (and I also deleted the old index and recreated it), and I am not using defaultSearchField or copyField for this field, I am still getting search results for the field I set as indexed=false. Really strange. Please help me get rid of this problem. Thanks.
- Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2987628.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query on facet field¹s count
Sorry for the late reply to this thread. I implemented the same patch (SOLR-2242) in Solr 1.4.1. Now I am able to get the distinct facet term count across a single index, but this does not work for distributed search (sharding). Is there a recent patch with the same functionality for the distributed case?

It works for this query:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1

It doesn't work for:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1&shards=localhost:8090/solr2

The second query gets the matched result set from both cores, but the facet results come only from the first core.

Rajani

On Sat, Mar 12, 2011 at 10:35 AM, rajini maski rajinima...@gmail.com wrote:
Thanks Bill Bell. This query works after applying the patch you referred to, is it? Please can you let me know how to update the current war file (Apache Solr 1.4.1) with this new patch? Thanks a lot.
Thanks,
Rajani

On Sat, Mar 12, 2011 at 8:56 AM, Bill Bell billnb...@gmail.com wrote:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1
would do what you want, I believe...

On 3/11/11 8:51 AM, Bill Bell billnb...@gmail.com wrote:
There is my patch to do that: SOLR-2242
Bill Bell
Sent from mobile

On Mar 11, 2011, at 1:34 AM, rajini maski rajinima...@gmail.com wrote:
Query on facet field results: when I run a facet query on some field, say facet=on&facet.field=StudyID, I get the list of distinct StudyID values, each with a count of how many times it occurs in the search results. But I also need the count of the distinct StudyID values themselves. Is there a Solr query to get that?

Example:

<lst name="facet_fields">
  <lst name="StudyID">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>

I want a count attribute returning the number of different StudyIDs that occurred. In the example above it would be Count = 5 (105, 179, 107, 120, 134):

<lst name="facet_fields">
  <lst name="StudyID" count="5">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>
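Until the patched namedistinct works across shards, one workaround is to facet each shard separately and count the distinct values on the client. A sketch against Solr's flat JSON facet_fields form ([value, count, value, count, ...]); the response dicts below are hand-built stand-ins for real shard responses:

```python
def distinct_facet_count(shard_responses, field):
    """Merge facet_fields lists from several shard responses and
    return the number of distinct facet values across all shards."""
    values = set()
    for resp in shard_responses:
        flat = resp["facet_counts"]["facet_fields"][field]
        values.update(flat[0::2])  # even positions hold the facet values
    return len(values)

shard1 = {"facet_counts": {"facet_fields": {"StudyID": ["105", 135164, "179", 79820]}}}
shard2 = {"facet_counts": {"facet_fields": {"StudyID": ["179", 5, "107", 70815]}}}
print(distinct_facet_count([shard1, shard2], "StudyID"))  # 3 (105, 179, 107)
```

The set union handles values that appear on more than one shard, which is exactly the case a naive sum of per-shard distinct counts would get wrong.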