Re: highlighter not respecting sentence boundary
Any help on this one? It seems the highlighting component does not always start the snippet at a sentence boundary. I tried several options. Has anyone successfully got this working?
random results at specific slots
Hi,

I would like to return results sorted by score (descending), but I would like to insert random results into some predefined slots (let's say 10, 14 and 18). The reason I want to do that is that I boost click-through-rate-based features significantly, and I want to give a chance to documents which don't yet have enough click-through-rate data. This would help the results stay fresh.

I looked into the Solr code, and it looks like I need a custom QueryComponent where, once the top results are ordered, I can insert some random results at my predefined slots and then return. I am wondering whether there is any other way I can achieve the same?

Thanks,
Srini
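For illustration, a minimal sketch of the splice step itself, independent of Solr internals (all names here are invented): given the top results already sorted by score and a pool of random candidates, drop one candidate into each predefined slot.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlotSplicer {
    private static final int[] SLOTS = {10, 14, 18}; // 1-based positions in the final list

    public static <T> List<T> splice(List<T> ranked, List<T> randomPool) {
        List<T> out = new ArrayList<T>(ranked);
        int poolIdx = 0;
        for (int slot : SLOTS) {
            // stop if we run out of candidates or the list is shorter than the slot
            if (poolIdx >= randomPool.size() || slot - 1 > out.size()) break;
            out.add(slot - 1, randomPool.get(poolIdx++)); // shifts later results down
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ranked = new ArrayList<String>();
        for (int i = 1; i <= 20; i++) ranked.add("doc" + i);
        System.out.println(splice(ranked, Arrays.asList("rnd1", "rnd2", "rnd3")));
    }
}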
RE: Add HTTP-header from ResponseWriter
Thanks, I'll check the issues.

-----Original message-----
From: Jack Krupansky j...@basetechnology.com
Sent: Mon 04-Jun-2012 17:19
To: solr-user@lucene.apache.org
Subject: Re: Add HTTP-header from ResponseWriter

There is some commented-out code in SolrDispatchFilter.doFilter:

// add info to http headers
//TODO: See SOLR-232 and SOLR-267.
/*try {
  NamedList solrRspHeader = solrRsp.getResponseHeader();
  for (int i = 0; i < solrRspHeader.size(); i++) {
    ((javax.servlet.http.HttpServletResponse) response).addHeader(
        ("Solr-" + solrRspHeader.getName(i)),
        String.valueOf(solrRspHeader.getVal(i)));
  }
} catch (ClassCastException cce) {
  log.log(Level.WARNING, "exception adding response header log information", cce);
}*/

And there is a comment from Grant on SOLR-267 that "The changes to SolrDispatchFilter can screw up SolrJ when you have explicit=all ... so I'm going to ... comment out #2 and put a TODO: there and someone can address it on SOLR-232." I did not see a separate Jira issue for arbitrarily setting HTTP headers from response writers.

-- Jack Krupansky

-----Original Message-----
From: Markus Jelsma
Sent: Monday, June 04, 2012 7:10 AM
To: solr-user@lucene.apache.org
Subject: Add HTTP-header from ResponseWriter

Hi,

There has been discussion before on how to add/set an HTTP header from a ResponseWriter. That was about adding the number of found documents for the CSVResponseWriter. We also need to set the number of found documents, in this case for the JSONResponseWriter, or any ResponseWriter. Is there any progress, or an open issue I am not aware of? Can the current (trunk) response framework already set or add an HTTP header?

Thanks,
Markus
Re: random results at specific slots
Another option I can think of is to write a custom component which implements handleResponses, where I can pick random documents from across shards and insert them into the ResponseBuilder's resultIds. I would place this component at the end (or after QueryComponent). Will that work? Is there a better solution?
maxScore always returned
Hi, On trunk the maxScore response attribute is always returned even if score is not part of fl. Is this intentional? Thanks,
Re: Multi-words synonyms matching
The reason multi-word synonyms work better if you use LUCENE_33 is that Solr then uses SlowSynonymFilter instead of the FST-based SynonymFilterFactory (FSTSynonymFilterFactory). But I don't know if the difference between them is a bug or not. Maybe someone has more insight?

Bernd Fehling-2 wrote:
> Are you sure with LUCENE_33 (use of BitVector)?
> On 31.05.2012 17:20, O. Klein wrote:
>> I have been struggling with this as well and found that using LUCENE_33 gives the best results. But as it will be deprecated, this is no everlasting solution. Maybe somebody knows one?
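A minimal setup for A/B-testing the two code paths (field type name and synonym entry invented; the factory picks its implementation based on luceneMatchVersion):

# synonyms.txt: a multi-word entry of the kind that behaves differently
# between the two filters
sea biscuit, seabiscuit

<!-- schema.xml: apply the synonyms at query time -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: flip between LUCENE_33 and LUCENE_36 and run the same
     text through the analysis page to compare the token streams -->
<luceneMatchVersion>LUCENE_33</luceneMatchVersion>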
Re: Strip html
Hello,

I made progress on my problem. The index and field type are good: I had forgotten to copyField body_strip_html onto text, the defaultSearchField. Newbie's mistake. Now Solr returns all the XML files I want. But in PHP, the text isn't displayed for 2 XML files (those where the term "castor" is enclosed by HTML or XML tags such as "exemple"). Look: http://lucene.472066.n3.nabble.com/file/n3987731/recherche_solr_tei.jpg

The PHP file: [...]

Thank you for your help.
Re: Multi-words synonyms matching
Do you have test cases? What are you sending to your SynonymFilterFactory? What are you expecting it to return? What is it returning when set to Version.LUCENE_33? What is it returning when set to Version.LUCENE_36?

On 05.06.2012 10:56, O. Klein wrote:
> [...]
Search timeout for Solrcloud
Hi,

We use SolrCloud in production, and we are facing some issues with queries that take very long, especially deep-paging queries; these queries keep our servers very busy. I am looking for a way to stop (kill) queries taking longer than a specific amount of time (say 5 seconds). I checked timeAllowed, but it doesn't work (again, the query runs to completion). I also noticed that there are connTimeout and socketTimeout for distributed searches, but I am not sure if they kill the thread (I want to save resources by killing the query, not just returning a timeout). Also, if I could get partial results, that would be ideal. Any suggestions?

Thanks,
arin
Re: maxScore always returned
Maybe look into your solrconfig.xml file to check whether fl is set by default on your request handler (inside <requestHandler> ... </requestHandler>).
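For reference, a hedged sketch of where that setting lives (handler name and field list invented); an fl in the defaults applies even when the request itself sends none:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fl">id,name</str>
  </lst>
</requestHandler>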
Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?
Older versions of Solr didn't really sort correctly on multivalued fields, they just didn't complain <G>. Hmmm. Off the top of my head, you can:

1) You don't say what the documents to be indexed are. Are they Solr-style documents on disk, or do you process them with, say, a SolrJ program? If the latter, you can simply inspect them as you construct them, decide which of the multivalued field values you want to use to sort, copy that single value into a new field, and sort on that.

2) You could write a custom UpdateRequestProcessorFactory/UpdateRequestProcessor pair and do the same thing in the processAdd method.

Best
Erick

On Mon, Jun 4, 2012 at 10:17 PM, Aaron Daubman daub...@gmail.com wrote:
> Greetings,
>
> I have dirty source data where some documents being indexed, although unlikely, may contain multivalued fields that are also required for sorting. In previous versions of Solr, sorting on this field worked fine (possibly because few or no multivalued fields were ever encountered?). However, as of 3.6.0, thanks to https://issues.apache.org/jira/browse/SOLR-2339, attempting to sort on this field now throws an error:
>
> [2012-06-04 17:20:01,691] ERROR org.apache.solr.common.SolrException
> org.apache.solr.common.SolrException: can not sort on multivalued field: f_normalizedValue
>
> The relevant bits of the schema.xml are:
>
> <fieldType name="sfloat" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0" sortMissingLast="true"/>
> <dynamicField name="f_*" type="sfloat" indexed="true" stored="true" required="false" multiValued="true"/>
>
> Assuming that the source documents being indexed cannot be changed (which, at least for now, they cannot), what would be the next best way to allow for both the possibility of multiple f_normalizedValue fields appearing in indexed documents, as well as being able to sort by f_normalizedValue?
>
> Thank you,
> Aaron
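A minimal sketch of option 2, keeping only the first value so the field becomes sortable. The field name is hardcoded for brevity (a real factory would read it from its init args), and this is untested against any particular Solr version:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class FirstValueOnlyProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.solrDoc;
                Collection<Object> vals = doc.getFieldValues("f_normalizedValue");
                if (vals != null && vals.size() > 1) {
                    // deterministic: the first value wins, the rest are dropped
                    doc.setField("f_normalizedValue", vals.iterator().next());
                }
                super.processAdd(cmd);
            }
        };
    }
}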
ReadTimeout on commit
Hi,

I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. That is unfortunately not an option for me, because I have to know whether Solr committed or not. What is causing this timeout? I'm using these settings in SolrJ:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);

Thank you
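One thing worth checking: setSoTimeout(1000) is a 1-second read timeout, and an explicit commit can easily take longer than that, in which case the client aborts the request even though Solr may still finish the commit server-side. A hedged sketch with a roomier read timeout (the URL and the 60-second figure are arbitrary illustrations, not recommendations):

import java.net.MalformedURLException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientFactory {
    // Same settings as in the message above, but with a read timeout large
    // enough for a commit to finish before the client gives up.
    public static CommonsHttpSolrServer build(String url) throws MalformedURLException {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
        server.setSoTimeout(60000);       // was 1000 ms, too tight for commits
        server.setConnectionTimeout(100);
        server.setDefaultMaxConnectionsPerHost(100);
        server.setMaxTotalConnections(100);
        server.setFollowRedirects(false);
        server.setAllowCompression(true);
        server.setMaxRetries(1);
        return server;
    }
}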
Solr instances: many singles vs multi-core
Hi,

I'm running a cluster of Solr servers for an index split up into a lot of shards. Each shard is replicated. The current setup is one Tomcat instance per shard, even if the Tomcats are running on the same machine. My question is this: would it be more advisable to run one Tomcat per machine with all the shards as cores, or is the current setup the best, where each shard runs in its own Tomcat?

As I see it, one Tomcat running multiple cores is better, as it reduces the overhead of having many Tomcat instances, and there is the possibility to let the cores share all available memory according to how much they actually need. In the one-shard/one-Tomcat scenario, each instance must have its predefined memory settings, whether or not it needs more or less.

Any opinions on the matter?

Med venlig hilsen / Best Regards
Christian von Wendt-Jensen
RE: maxScore always returned
Hi. We set fl in the request handler's defaults, without score.

Thanks

-----Original message-----
From: darul daru...@gmail.com
Sent: Tue 05-Jun-2012 12:05
To: solr-user@lucene.apache.org
Subject: Re: maxScore always returned

> [...]
SolrDispatchFilter, no hits in response NamedList if distrib=true
Hi,

I'm adding numFound to the HTTP response header in a custom SolrDispatchFilter, in the writeResponse() method, similar to the commented code in doFilter(). This works just fine, but not for distributed requests: I'm trying to read hits from the SolrQueryResponse, but it is not there for distrib=true requests. Any idea what I'm doing wrong?

Thanks,
Markus
Re: Search timeout for Solrcloud
There isn't a solution for killing long-running queries that works.

On Tue, Jun 5, 2012 at 1:34 AM, arin_g arin...@gmail.com wrote:
> [...]
RE: Search timeout for Solrcloud
There's an open issue for improving deep-paging performance: https://issues.apache.org/jira/browse/SOLR-1726

-----Original message-----
From: arin_g arin...@gmail.com
Sent: Tue 05-Jun-2012 12:03
To: solr-user@lucene.apache.org
Subject: Search timeout for Solrcloud

> [...]
filtering number and repeated contents
Is it possible to filter out numbers and disclaimers (repeated content) while indexing into Solr? These are all surplus information, and I do not want to index them. I have tried the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and advertisements. I think it works well, but I am looking forward to seeing whether I could filter out disclaimer information too, mainly in email texts.

--
Thanks,
Nipen Mark
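For the numbers part, one hedged possibility at the analysis level (field type name invented; untested): blank out purely numeric tokens with a pattern filter, then discard the resulting zero-length tokens.

<fieldType name="text_nonum" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- turn all-digit tokens into empty strings -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$" replacement="" replace="all"/>
    <!-- drop the now-empty tokens -->
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Disclaimer paragraphs are a different problem; see the reply later in this thread.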
Is it faster to search over many different fields or one field that combines the values of all those other fields?
Say I have various categories of 'tags'. I want a keyword search to search through my index of articles. So I search over:

1) the title
2) the body
3) about 10 of these tag categories; each tag category is multivalued, with a few words per value.

Without considering the effect on 'relevance', and using the standard Lucene query parser, would it be faster to specify each of these 10 fields in q (q = cat1:keyword OR cat2:keyword OR ...), or to copyField the stuff in those 10 fields into one combined field? Or is it such that I should be slapped in the face for even thinking about performance in this scenario?
Re: Strip html
I resolved my problem: I had to specify the field to return with my query. Thanks a LOT for your help!
Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?
By saying "dirty" data, you imply that only one of the values is good or clean and that the others can be safely discarded/ignored, as opposed to true multi-valued data where each value is there for good reason and needs to be preserved. In any case, how do you know/decide which value should be used for sorting, and did you just get lucky that Solr happened to use the right one?

The preferred technique would be to preprocess and clean the data before it is handed to Solr or SolrJ, even if the source must remain dirty. Barring that, a preprocessor or a custom update processor, certainly. Please clarify exactly how the data is being fed into Solr. And if you really do need to preserve the multiple values, simply store them in a separate field that is not sorted. An update processor can do this as well.

-- Jack Krupansky

-----Original Message-----
From: Erick Erickson
Sent: Tuesday, June 05, 2012 6:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

> [...]
Re: Can't index sub-entities in DIH
Hi Gora,

> Your configuration files look fine. It would seem that something is going wrong with the SELECT in Oracle, or with the JDBC driver used to access Oracle. Could you try:
> * Manually doing the SELECT for the entity, and sub-entity, to ensure that things are working.

The SELECTs are working OK.

> * Check the JDBC settings.

I'm using the latest version of jdbc6.jar for Oracle 11g. It seems the JDBC setting is OK, because Solr brings data.

> Sorry, I do not have access to Oracle, so I cannot try this out myself. Also, have you checked the Solr logs for any error messages? Finally, I just noticed that you have extra quotes in:
> ...where usuario_idusuario = '${usuario.idusuario}'
> I doubt that is the cause of your problem, but you could try removing them.

If I remove the quotes, there is an error about this:

SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =  Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =  Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
    ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =  Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
    ... 5 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
    at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
    at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
    at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)

My config files using Oracle are:

db-data-config.xml

<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@localhost:1521:solr" user="solr" password="solr"/>
  <document>
    <entity name="documento" query="SELECT iddocumento,nrodocumento,asunto,autor,tipodocumento FROM documento">
      <field column="iddocumento" name="iddocumento"/>
      <field column="nrodocumento" name="nrodocumento"/>
      <field column="asunto" name="asuntodocumento"/>
      <field column="autor" name="autor"/>
      <field column="tipodocumento" name="tipodocumento"/>
      <entity name="tipodocumento1" query="SELECT nombre FROM tipodocumento WHERE idtipodocumento = '${documento.tipodocumento}'">
        <field column="nombre" name="nombre"/>
      </entity>
    </entity>
Re: random results at specific slots
Take a look at query elevation. It may do exactly what you want, but at a minimum, it would show you how this kind of thing can be done. See: http://wiki.apache.org/solr/QueryElevationComponent

-- Jack Krupansky

-----Original Message-----
From: srinir
Sent: Tuesday, June 05, 2012 3:08 AM
To: solr-user@lucene.apache.org
Subject: random results at specific slots

> [...]
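For reference, a hedged sketch of the component's elevate.xml (query text and document IDs invented). Elevated documents are pinned to the top positions for that exact query text, so this approximates, rather than exactly reproduces, the fixed-slot behavior asked about above:

<elevate>
  <query text="ipod">
    <doc id="doc-123"/>
    <doc id="doc-456"/>
  </query>
</elevate>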
HypericHQ plugins?
Hello Solr users,

Is there someone who has written plugins for HypericHQ to monitor the very many metrics Solr exposes through JMX? I am kind of a newbie to JMX, and the Hyperic tutorials aren't simple enough for my taste... so I'd be helped if someone has done it already.

Thanks in advance,
Paul
RE: Can't index sub-entities in DIH
I successfully use Oracle with DIH, although none of my imports have sub-entities (slight difference: I'm on ojdbc5.jar with 10g...). It may be you have a driver that doesn't play well with DIH in some cases. You might want to try these possible workarounds:
- Rename the columns in SELECT with AS clauses.
- In case the columns in SELECT are the same as what you have in schema.xml, omit the <field/> tags (see http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config).

These are shot-in-the-dark guesses. I wouldn't expect this to matter, but you might as well try it.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Rafael Taboada [mailto:kaliman.fore...@gmail.com]
Sent: Tuesday, June 05, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Can't index sub-entities in DIH

> [...]
Re: score filter
Hello Grant,

I need to frame a query that is a combination of two query parts, and I use a 'function' query to prepare it. Something like:

q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))

where $uq and $cq are two queries. Now, I want a search result returned only if I get a hit on $uq. So I specify the default value of the $uq query as 0.0, in order for the final score to be zero in cases where $uq doesn't record a hit. Even though the scoring works as expected (i.e., documents that don't match $uq have a score of zero), all the documents are returned as search results. Is there a way to filter out search results that have a score of zero?

Thanks for your help,
Debdoot
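One possibility, hedged (untested against this exact setup): since the score here is itself a function value, a frange filter over the same function should drop the zero-scored documents, e.g.

fq={!frange l=0 incl=false}product(query($uq,0.0),query($cq,0.1))

l=0 with incl=false keeps only documents where the function evaluates to strictly more than zero.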
Re: Can't index sub-entities in DIH
Hi,

One of the possibilities for this kind of issue may be the case sensitivity of column names in Oracle. Can you apply a transformer and check the entity map, which actually contains the keys and their values? Also, please try specifying upper-case field names for Oracle and see if that works. Something like:

<entity name="tipodocumento" query="SELECT NOMBRE FROM tipodocumento WHERE IDTIPODOCUMENTO = '${documento.TIPODOCUMENTO}'">
  <field column="NOMBRE" name="nombre"/>
</entity>

On Tue, Jun 5, 2012 at 9:57 AM, Rafael Taboada kaliman.fore...@gmail.com wrote:
> [...]
Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?
I don't have the answer to your question, but I certainly don't think anybody should be slapped in the face for asking one!

Michael Della Bitta
Appinions, Inc. -- Where Influence Isn't a Game.
http://www.appinions.com

On Tue, Jun 5, 2012 at 8:50 AM, santamaria2 aravinda@contify.com wrote:
> [...]
Re: Can't index sub-entities in DIH
Hi, Sorry, I am stumped, and cannot help further without access to Oracle. Please disregard the bit about the quotes: I was reading a single quote followed by a double quote as three single quotes. There was no issue there. Since your configurations for Oracle, and mysql are different, are you using different Solr cores/instances, or making sure to restart Solr in between configuration changes? Regards, Gora
Re: Can't index sub-entities in DIH
Hi James.

Thanks for your advice. As I said, aliases work for me. I use joins instead of sub-entities... Heavily... These config files work for me:

db-data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource" name="jdbc" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@localhost:1521:solr" user="solr" password="solr"/>
  <document>
    <entity name="documento" query="SELECT d.iddocumento,d.nrodocumento,d.asunto AS asuntodocumento,d.autor,d.estado AS estadodocumento,d.fechacreacion AS fechacreaciondocumento,td.idtipodocumento,td.nombre AS nombretipodocumento,e.idexpediente,e.nroexpediente,e.nrointerno,e.asunto AS asuntoexpediente,e.clienterazonsocial,e.clienteapellidomaterno,e.clienteapellidopaterno,e.clientenombres,e.clientedireccionprincipal,e.estado AS estadoexpediente,e.fechacreacion AS fechacreacionexpediente,p.idproceso,p.nombre AS nombreproceso,o.idusuario AS idpropietario,o.nombres AS nombrespropietario,o.apellidos AS apellidospropietario,u.idunidad,u.nombre AS nombreunidad FROM documento d LEFT OUTER JOIN usuario o ON (d.propietario = o.idusuario) LEFT OUTER JOIN unidad u ON (o.idunidad = u.idunidad) LEFT OUTER JOIN tipodocumento td ON (d.tipodocumento = td.idtipodocumento) LEFT OUTER JOIN expediente e ON (d.expediente = e.idexpediente) LEFT OUTER JOIN proceso p ON (e.proceso = p.idproceso)">
      <field column="iddocumento" name="iddocumento"/>
      <field column="nrodocumento" name="nrodocumento"/>
      <field column="asuntodocumento" name="asuntodocumento"/>
      <field column="autor" name="autor"/>
      <field column="estadodocumento" name="estadodocumento"/>
      <field column="fechacreaciondocumento" name="fechacreaciondocumento"/>
      <field column="idtipodocumento" name="idtipodocumento"/>
      <field column="nombretipodocumento" name="nombretipodocumento"/>
      <field column="idexpediente" name="idexpediente"/>
      <field column="nroexpediente" name="nroexpediente"/>
      <field column="nrointerno" name="nrointerno"/>
      <field column="asuntoexpediente" name="asuntoexpediente"/>
      <field column="clienterazonsocial" name="clienterazonsocial"/>
      <field column="clienteapellidomaterno" name="clienteapellidomaterno"/>
      <field column="clienteapellidopaterno" name="clienteapellidopaterno"/>
      <field column="clientenombres" name="clientenombres"/>
      <field column="clientedireccionprincipal" name="clientedireccionprincipal"/>
      <field column="estadoexpediente" name="estadoexpediente"/>
      <field column="fechacreacionexpediente" name="fechacreacionexpediente"/>
      <field column="idproceso" name="idproceso"/>
      <field column="nombreproceso" name="nombreproceso"/>
      <field column="idpropietario" name="idpropietario"/>
      <field column="nombrespropietario" name="nombrespropietario"/>
      <field column="apellidospropietario" name="apellidospropietario"/>
      <field column="idunidad" name="idunidad"/>
      <field column="nombreunidad" name="nombreunidad"/>
    </entity>
  </document>
</dataConfig>

schema.xml

<?xml version="1.0"?>
<schema name="siged" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- SIGED Documentos -->
    <field name="iddocumento" type="tint" indexed="true" stored="true" required="true"/>
    <field name="nrodocumento" type="text_general" indexed="true" stored="true"/>
    <field name="asuntodocumento" type="text_general" indexed="true" stored="true"/>
    <field name="autor" type="text_general" indexed="true" stored="true"/>
    <field name="estadodocumento" type="string" indexed="true" stored="true"/>
    <field name="fechacreaciondocumento" type="string" indexed="true" stored="true"/>
    <field name="idtipodocumento" type="tint" indexed="true" stored="true"/>
    <field name="nombretipodocumento" type="text_general" indexed="true" stored="true"/>
    <field name="idexpediente" type="tint" indexed="true" stored="true"/>
    <field name="nroexpediente" type="text_general" indexed="true" stored="true"/>
    <field name="nrointerno" type="text_general" indexed="true" stored="true"/>
    <field name="asuntoexpediente" type="text_general" indexed="true" [...]
Re: filtering number and repeated contents
My (very limited) understanding of boilerpipe in Tika is that it strips out short text, which is great for all the menu and navigation text, but the typical disclaimer at the bottom of an email is not very short, and frequently can be longer than the email message body itself. You may have to resort to a custom update processor that is programmed with some disclaimer signature text strings to be removed from field values.

-- Jack Krupansky

-----Original Message-----
From: Mark , N
Sent: Tuesday, June 05, 2012 8:28 AM
To: solr-user@lucene.apache.org
Subject: filtering number and repeated contents

> [...]
Re: Can't index sub-entities in DIH
Hi Gora,

Yes, I restart Solr for each change I make. Thanks for your help... A small question: does DIH work well with an Oracle database, using all the features it offers?

On Tue, Jun 5, 2012 at 9:32 AM, Gora Mohanty g...@mimirtech.com wrote:
> [...]

--
Rafael Taboada
/*
 * Phone 992 741 026
 */
Re: Can't index sub-entities in DIH
On 5 June 2012 20:05, Rafael Taboada kaliman.fore...@gmail.com wrote:
> Hi James. Thanks for your advice. As I said, aliases work for me. I use joins instead of sub-entities... Heavily... These config files work for me...
[...]

How about NULL values in the column that you are doing a left outer join on? I cannot test this right now, but I believe that a left outer join behaves differently from a DIH entity/sub-entity when it comes to NULLs.

Regards,
Gora
Re: Can't index sub-entities in DIH
On 5 June 2012 20:08, Rafael Taboada kaliman.fore...@gmail.com wrote:
> [...] A small question: does DIH work well with an Oracle database, using all the features it offers?

Unfortunately, I have never used DIH with Oracle. However, this should be a simple enough use case that it should just work. I think that we must be missing something obvious.

For the sub-entity with Oracle case, what message do you get when the data-import concludes? Is the number of indexed documents correct? Are there any relevant messages in the Solr log files?

Regards,
Gora
Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?
There may be a raw performance advantage to having all values in a single combined field, but then you lose the opportunity to boost title and tag field hits. With the extended dismax query parser you have the ability to specify the field list in the qf request parameter, so that the query can simply be the keywords and operators without all of the extra OR operators. qf also lets you specify the boost for each field.

-- Jack Krupansky

-----Original Message-----
From: santamaria2
Sent: Tuesday, June 05, 2012 8:50 AM
To: solr-user@lucene.apache.org
Subject: Is it faster to search over many different fields or one field that combines the values of all those other fields?

> [...]
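As an illustration of that suggestion (field names and boosts invented, matching the scenario above), the ten OR clauses collapse into something like:

q=keyword&defType=edismax&qf=title^5.0 body^2.0 cat1 cat2 cat3 cat4 cat5 cat6 cat7 cat8 cat9 cat10

Each field can carry its own boost, which a single combined copyField cannot.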
Re: London OSS search social - meetup 6th June
Quick reminder, we're meeting at The Plough in Bloomsbury tomorrow night. Details and RSVP on the meetup page: http://www.meetup.com/london-search-social/events/65873032/ -- Richard Marr On 3 Jun 2012, at 00:29, Richard Marr richard.m...@gmail.com wrote: Apologies for the short notice guys, we're meeting up at The Plough in Bloomsbury on Wednesday 6th June. As usual the format is open and there's a healthy mix of experience and backgrounds. Please come and share wisdom, ask questions, geek out, etc. in the presence of beverages. -- Richard Marr
Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?
Thanks for the responses.

> By saying "dirty" data, you imply that only one of the values is good or clean and that the others can be safely discarded/ignored [...] In any case, how do you know/decide which value should be used for sorting, and did you just get lucky that Solr happened to use the right one?

I haven't gone back and checked the old version's docs where this was working; however, I suspect that either the field never ended up appearing in docs more than once, or, if it did, it had the same value repeated. The real issue here is that the docs are created externally, and the producer won't (yet) guarantee that fields that should appear once will actually appear once. Because of this, I don't want to declare the field as multiValued="false", as I don't want to cause indexing errors. It would be great for me (and apparently many others, after searching) if there were an option as simple as forceSingleValued="true", where some deterministic behavior, such as "use the first field encountered, ignore all others", would occur.

> The preferred technique would be to preprocess and clean the data before it is handed to Solr or SolrJ, even if the source must remain dirty. Barring that, a preprocessor or a custom update processor, certainly.

I could write preprocessors (this is really what will eventually happen when the producer cleans their data), custom processors, etc.; however, for something this simple it would be great not to be producing more code that would have to be maintained.

> Please clarify exactly how the data is being fed into Solr.

I am using generic code to read from a key/value store and compose documents. This is another reason fixing the data at this point would not be desirable: the currently generic code would need to be made specific, to look for these particular fields and then coerce them to single values.

Thanks again,
Aaron
Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?
IIRC, the Lucene in Action book loops around this point almost every chapter: a multi-field query is faster.

On Tue, Jun 5, 2012 at 7:04 PM, Jack Krupansky j...@basetechnology.com wrote:
> [...]

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?
On 5 June 2012 22:05, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
> IIRC, the Lucene in Action book loops around this point almost every chapter: a multi-field query is faster.
[...]

Surely this is dependent on the type and volume of one's data? As with many issues, isn't the answer that it depends, i.e., one should prototype and have objective measures on one's own data-sets? Would love to be educated otherwise.

Regards,
Gora

P.S. Have to get that book.
Re: Search timeout for Solrcloud
I'm curious... how deep is it that is becoming problematic? Tens of pages, hundreds, thousands, millions? And when you say deep paging, are you incrementing through all pages down to that depth, or jumping to some very large depth outright? If the former, I am wondering if the Solr cache is building up with all those previous results. And is it that the time is simply moderately beyond expectations (e.g., 10 or 30 seconds or a minute, compared to 1 second), or are we talking about a situation where a core is terminally thrashing with garbage collection/OOM issues?

-- Jack Krupansky

-----Original Message-----
From: arin_g
Sent: Tuesday, June 05, 2012 1:34 AM
To: solr-user@lucene.apache.org
Subject: Search timeout for Solrcloud

> [...]
Boost by Nested Query / Join Needed?
Hi,

First off, I'm about a week into all things Solr, and still trying to figure out how to fit my relational-shaped peg through a denormalized hole. Please forgive my ignorance below :-D

I need to store a one-to-N type relationship, and boost on a related field. Let's say I want to index a number of different types of candy, and also a customer's preference for each type of candy (which I index/update when a customer makes a purchase), and then boost by that preference on search. Here is my pared-down attempt at a denormalized schema:

<!-- Common Fields -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="type" type="string" indexed="true" stored="true" required="true"/>

<!-- Fields for 'candy' -->
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>

<!-- Fields for Customer-Candy Preference ('preference') -->
<field name="user" type="integer" indexed="true" stored="true"/>
<field name="candy" type="integer" indexed="true" stored="true"/>
<field name="weight" type="integer" indexed="true" stored="true" default="0"/>

I am indexing 'candy' and 'preference' documents separately, and when indexing one, I leave the fields of the other empty (with the exception of the required 'id' and 'type'). Ignoring the query score, this is effectively what I'm looking to do in SQL:

SELECT candy.id, candy.name, candy.description
FROM candy
LEFT JOIN preference ON (preference.candy = candy.id AND preference.customer = 'someCustomerID')
-- where some match is made on a query against candy.name or candy.description
ORDER BY preference.weight DESC

My questions are:

1) Am I making any assumptions, with respect to what are effectively different document types in the schema, that will not scale well? I don't think I want to be duplicating each 'candy' entry for every customer, or maybe that wouldn't be such a big deal in Solr.
2) Can someone point me in the right direction on how to perform this type of boost in a Solr query?

Thanks in advance,
Nick
Is FileFloatSource's WeakHashMap cache only cleaned by GC?
We've encountered GC spikes at Etsy after adding new ExternalFileFields a decent number of times. I was always a little confused by this behavior (isn't it just one big float[]? why does that cause problems for the GC?), but looking at the FileFloatSource code a little more carefully, I wonder if this is due to using a WeakHashMap that is only cleaned by GC or by manual invocation of a request handler.

FileFloatSource stores a WeakHashMap mapping IndexReader to float[] (or a CreationPlaceholder). In the code [1], it mentions that the implementation is modeled after the FieldCache implementation. However, FieldCacheImpl adds listeners for IndexReader close events and uses those to purge its caches. [2]

Should we be doing the same in FileFloatSource? Here's a mostly untested patch [3] with a possible implementation. There are probably better ways to do it (e.g., I don't love using another WeakHashMap), but I found it tough to hook into the IndexReader lifecycle without a) relying on classes other than FileFloatSource, b) changing the public API of FileFloatSource, or c) changing the implementation too much.

There is a RequestHandler inside of FileFloatSource (ReloadCacheRequestHandler) that can be used to clear the cache entirely [4], but this is sub-optimal for us for a few reasons:

--It clears the entire cache. ExternalFileFields often take some non-trivial time to load, and we prefer to do so during SolrCore warmups. Clearing the entire cache while serving traffic would likely cause user-facing requests to time out.
--It forces an extra commit, with its consequent cache cycling, etc.

I'm thinking of ways to monitor the size of FileFloatSource's cache to track its size against GC pause times, but it seems tricky, because even calling WeakHashMap#size() has side effects. Any ideas?

Overall, what do you think? Does relying on GC to clean this cache make sense as a possible cause of GC spikiness? If so, does the patch [3] look like a decent approach?

Thanks!
--Gregg

[1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
[2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
[3] https://gist.github.com/2876371
[4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310
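For anyone skimming, the shape of the eager-purge idea in simplified form (this is not the linked patch; it assumes the Lucene 4.x per-reader listener API, whereas 3.x only exposes a static IndexReader.addReaderFinishedListener):

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import org.apache.lucene.index.IndexReader;

public class EagerlyPurgedCache {
    // weak keys remain the safety net; the listener is the fast path
    private final Map<IndexReader, float[]> cache =
        Collections.synchronizedMap(new WeakHashMap<IndexReader, float[]>());

    public void put(final IndexReader reader, float[] values) {
        cache.put(reader, values);
        reader.addReaderClosedListener(new IndexReader.ReaderClosedListener() {
            public void onClose(IndexReader r) {
                cache.remove(r); // evict immediately instead of waiting for GC
            }
        });
    }
}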
Re: Search timeout for Solrcloud
For example, when we set the start parameter to 1000, 2000 or higher (page 100, 200, ...), it takes very long (20-30 seconds, sometimes even 100 seconds). This usually happens when there is a big gap between pages, mostly hit by web crawlers (when they crawl the last-page link on our website). -- View this message in context: http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716p3987834.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.0 Clean Commit for production use
: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean

Clarification: 4.0 does not exist yet. What does exist is the 4x branch, from which you can build snapshots that should be very similar to what will eventually be released as 4.0.

: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/
: and it looks like they have migrated to 5.0. From the link below it looks

Correct, a 4x branch has been created off of trunk in anticipation of the 4.0 release process, so that more aggressive experimental work beyond the scope of 4.0 can continue on trunk. I've updated the wiki to try and outline this, based on the discussion from previous dev@lucene threads... https://wiki.apache.org/solr/Solr4.0

: My second question would be: Are there any known compatibility
: issues/restrictions with previous versions of Lucene? (I just want to make
: sure I can still use my data indexed with previous Solr/Lucene versions).

The best thing to do is review the Upgrade instructions in CHANGES.txt; however, those instructions should not be considered final until the final release is voted on -- there may be mistakes/omissions, but the best way to help find those mistakes/omissions is for users to try out nightly builds and point them out when you notice them.

-Hoss
Re: Solr 4.0 Clean Commit for production use
The Nightly Build wiki still says it is 4.x even though it is now 5.x. See: https://wiki.apache.org/solr/NightlyBuilds

AFAIK, there isn't a 4.x nightly build running. (Is that going to happen soon??) You can check out the repo for the 4x branch: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x

My (limited) understanding is that 4.x can read and write 3.x indexes, but any new/modified indexes will be incompatible with 3.x. And you have to be careful upgrading master/slave configurations, as noted in CHANGES.txt.

-- Jack Krupansky

-Original Message- From: Chris Hostetter Sent: Tuesday, June 05, 2012 5:37 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.0 Clean Commit for production use

: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean

Clarification: 4.0 does not exist yet. What does exist is the 4x branch, from which you can build snapshots that should be very similar to what will eventually be released as 4.0.

: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/
: and it looks like they have migrated to 5.0. From the link below it looks

Correct, a 4x branch has been created off of trunk in anticipation of the 4.0 release process, so that more aggressive experimental work beyond the scope of 4.0 can continue on trunk. I've updated the wiki to try and outline this, based on the discussion from previous dev@lucene threads... https://wiki.apache.org/solr/Solr4.0

: My second question would be: Are there any known compatibility
: issues/restrictions with previous versions of Lucene? (I just want to make
: sure I can still use my data indexed with previous Solr/Lucene versions).

The best thing to do is review the Upgrade instructions in CHANGES.txt; however, those instructions should not be considered final until the final release is voted on -- there may be mistakes/omissions, but the best way to help find those mistakes/omissions is for users to try out nightly builds and point them out when you notice them.

-Hoss
Re: Solr 4.0 Clean Commit for production use
: The Nightly Build wiki still says it is 4.x even though it is now 5.x. : See: : https://wiki.apache.org/solr/NightlyBuilds : : AFAIK, there isn't a 4.x nightly build running. (Is that going to happen : soon??) Yes... http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3c3fd307e7-7cd2-4042-8ba7-8a4561dbf...@email.android.com%3E -Hoss
Re: using Tika (ExtractingRequestHandler)
I've updated the wiki to try and fill in some of these holes... http://wiki.apache.org/solr/ExtractingRequestHandler : i'm looking at using Tika to index a bunch of documents. the wiki page seems to be a little bit out of date (// TODO: this is out of date as of Solr 1.4 - dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed) and it also looks a little incomplete. : : is there an actual list of all the required jar files? i'm not sure they are in the same place in the 3.6.0 distribution as they were in 1.4, and having an actual list would be very helpful in figuring out where they are. : : as for Sending Documents to Solr, is there any plan to address this todo: // TODO: describe the different ways to send the documents to solr (POST body, form encoded, remoteStreaming). this is really just a nice to have, i can see how to accomplish my goals using a method that is currently documented. : : thanks, :richard : -Hoss
Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?
: The real issue here is that the docs are created externally, and the : producer won't (yet) guarantee that fields that should appear once will : actually appear once. Because of this, I don't want to declare the field as : multiValued=false as I don't want to cause indexing errors. It would be : great for me (and apparently many others after searching) if there were an : option as simple as forceSingleValued=true - where some deterministic : behavior such as use first field encountered, ignore all others, would : occur. This will be trivial in Solr 4.0, using one of the new FieldValueSubsetUpdateProcessorFactory classes that are now available -- just pick your rule... https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html Direct Known Subclasses: FirstFieldValueUpdateProcessorFactory, LastFieldValueUpdateProcessorFactory, MaxFieldValueUpdateProcessorFactory, MinFieldValueUpdateProcessorFactory -Hoss
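For example (a sketch; the chain name and field name are placeholders), keeping only the first value seen for a field would look roughly like this in solrconfig.xml:

  <updateRequestProcessorChain name="single-valued-fix">
    <!-- keep only the first value encountered for the named field -->
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
      <str name="fieldName">category</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain then has to be attached to your updates (e.g. via the update.chain request parameter) to take effect.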
Re: TermComponent and Optimize
: It seems that TermsComponent is looking at all versions of documents in the index.
:
: Is this the expected behavior for TermsComponent? Any suggestion about how to solve this?

Yes... http://wiki.apache.org/solr/TermsComponent

"The doc frequencies returned are the number of documents that match the term, including any documents that have been marked for deletion but not yet removed from the index."

If you delete/replace a document in the index, it still contributes to the doc freq for that term until the deletion is expunged (either because of a natural segment merge, or forced merging due to optimize).

The reason TermsComponent is so fast is that it only looks at the raw terms. If you want the counts to reflect visible documents, you have to use something like faceting, which will be slower because it checks the actual (live) document counts.

-Hoss
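For instance (a sketch; host, field name, and prefix are placeholders), you can see the difference by comparing raw-term counts with live-document counts:

  http://localhost:8983/solr/terms?terms.fl=title&terms.prefix=app
  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=title&facet.prefix=app

The first request is fast but counts deleted-but-not-expunged documents; the second is slower but only counts live documents.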
Re: using Tika (ExtractingRequestHandler)
Hoss,

In your edit, I noticed that the wiki makes SolrPlugin a link, but to a nonexistent page, although the page SolrPlugins does exist. See the phrase "it is provided as a SolrPlugin" on http://wiki.apache.org/solr/ExtractingRequestHandler

I also noticed a few other things:

1. A reference to the /site directory, which does not exist. So the statement "Note, the /site directory in the solr download contains some nice example docs to try" is not terribly useful.

2. The path to tutorial.html should be ../../docs/api/doc-files

3. There is no tutorial.pdf file as referenced in the curl examples.

-- Jack Krupansky

-Original Message- From: Chris Hostetter Sent: Tuesday, June 05, 2012 6:47 PM To: solr-user@lucene.apache.org Subject: Re: using Tika (ExtractingRequestHandler)

I've updated the wiki to try and fill in some of these holes... http://wiki.apache.org/solr/ExtractingRequestHandler

: i'm looking at using Tika to index a bunch of documents. the wiki page seems to be a little bit out of date (// TODO: this is out of date as of Solr 1.4 - dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed) and it also looks a little incomplete.
:
: is there an actual list of all the required jar files? i'm not sure they are in the same place in the 3.6.0 distribution as they were in 1.4, and having an actual list would be very helpful in figuring out where they are.
:
: as for Sending Documents to Solr, is there any plan to address this todo: // TODO: describe the different ways to send the documents to solr (POST body, form encoded, remoteStreaming). this is really just a nice to have, i can see how to accomplish my goals using a method that is currently documented.
:
: thanks,
:richard

-Hoss
Re: Solr instances: many singles vs multi-core
It probably can work out reasonably well in both scenarios, but you do get some additional flexibility with multiple Tomcat instances:

1. Any per-instance Tomcat limits become per-core rather than applying to all cores on that machine.

2. If you have to restart Tomcat, only a single shard is impacted.

3. There are probably a fair number of little details that work better and with more parallelism if each Solr core is a separate JVM. E.g., BooleanQuery.maxClauseCount is a single static across the whole JVM; PDFBox for Tika in SolrCell can have threads blocked due to a resource that is shared across cores in the JVM (was an issue - not sure if it still is). But of course your usage may not run into any of them.

It will also depend a lot on how many CPU cores you have.

-- Jack Krupansky

-Original Message- From: Christian von Wendt-Jensen Sent: Tuesday, June 05, 2012 7:22 AM To: solr-user@lucene.apache.org Subject: Solr instances: many singles vs multi-core

Hi,

I'm running a cluster of Solr servers for an index split up into a lot of shards. Each shard is replicated. The current setup is one Tomcat instance per shard, even if the Tomcats are running on the same machine.

My question is this: would it be more advisable to run one Tomcat per machine with all the shards as cores, or is the current setup the best, where each shard is running in its own Tomcat?

As I see it, one Tomcat running multiple cores seems better, as it reduces the overhead of having many Tomcat instances, and there is the possibility of letting the cores share all available memory according to how much they actually need. In the one shard/one Tomcat scenario, each instance must have its own predefined memory settings, whether it needs more or less.

Any opinions on the matter?

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
Re: index special characters solr
Thanks for your reply! I tried using the types attribute of WordDelimiterFilterFactory, passing a text file that mapped % and $ as alphabetic characters. But even then they didn't get indexed, and neither did they show up in search results. Am I missing something?

Thanks, Kushal -- View this message in context: http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987888.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I got ERROR, Unable to execute query
I was using MySQL 3.x. After I migrated to MySQL 5.x, I no longer get errors like 'Unable to execute query'. Maybe the old MySQL version and Solr have some compatibility problem; I don't know exactly.

2012/6/5 Jihyun Suh jhsuh.ourli...@gmail.com

That's why I made a new DB for the dataimport test, so my tables have no access or activity. They are just dormant ones.

-- My current suspicion is that there is activity in that table that is preventing DIH access. I mean, like maybe the table is being updated when DIH is failing. Maybe somebody is emptying the table and then regenerating it, and your DIH run is catching the table when it is being emptied. Or something like that. -- Jack Krupansky

2012/6/4 Jihyun Suh jhsuh.ourli...@gmail.com

I read your answer, thank you. But I don't get that error from the same table each time. This time I got the error from test_5, but when I tried the dataimport again, I could index test_5 and got the error from test_7 instead. I don't know the reason. Could you help me?

-- Is test_5 created by a stored procedure? If so, is there a possibility that the stored procedure may have done an update and not returned data - but just sometimes? -- Jack Krupansky

2012/6/2 Jihyun Suh jhsuh.ourli...@gmail.com

I use many tables for indexing. During dataimport, I get errors for some tables like "Unable to execute query". But the next time, when I try the dataimport for that table again, it succeeds without any error.

[Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in entity : test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT Title, url, synonym, description FROM test_5 WHERE status in ('1','s') Processing Document # 11046
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
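One DIH setting worth checking with MySQL (an aside, not a confirmed fix for the error above; the connection details are placeholders): MySQL's JDBC driver buffers the entire result set in memory unless streaming is requested, which DIH exposes as batchSize="-1" on the JdbcDataSource:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/testdb"
              user="solr" password="secret"
              batchSize="-1"/>
  <!-- batchSize="-1" makes DIH stream rows instead of buffering the full result set -->

Large unstreamed result sets can aggravate intermittent query failures and memory pressure, which may be relevant when some tables only fail sometimes.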
Solr, I have a performance problem with indexing.
I have 128 tables in MySQL 5.x, and each table has 3,5000 rows. When I start the dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes for one table. And when it indexes the 40th table, it takes around 20 minutes for one table. Does Solr have some performance problem with too many documents? Should I set some configuration?
Re: index special characters solr
Thanks Jack for your help! I found my mistake: rather than classifying those special characters as ALPHA, I had classified them as DIGIT. I had also missed the same entry for the search analyzer, so that was probably the reason for not getting relevant results.

I spent a lot of time figuring this out, so I'll paste the relevant snippet of my schema.xml for newbies so that they don't waste as much time. I classified the field in which I wanted to search for keywords (including special characters) as text.

In the fieldType definition, modify the WordDelimiterFilterFactory filter:

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>

in BOTH <analyzer type="index"> and <analyzer type="query">.

And make a new characters.txt in the same folder as schema.xml with the content:

  $ => ALPHA
  % => ALPHA

(I wanted $ and % to behave as alphabetic characters so that they could be searched.)

Then restart Jetty/Tomcat.

This is how I solved the problem. Hope this helps someone :)

-- View this message in context: http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987891.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: index special characters solr
Thanks. I'm sure someone else will have the same issue at some point.

-- Jack Krupansky

-Original Message- From: KPK Sent: Tuesday, June 05, 2012 9:51 PM To: solr-user@lucene.apache.org Subject: Re: index special characters solr

Thanks Jack for your help! I found my mistake: rather than classifying those special characters as ALPHA, I had classified them as DIGIT. I had also missed the same entry for the search analyzer, so that was probably the reason for not getting relevant results.

I spent a lot of time figuring this out, so I'll paste the relevant snippet of my schema.xml for newbies so that they don't waste as much time. I classified the field in which I wanted to search for keywords (including special characters) as text.

In the fieldType definition, modify the WordDelimiterFilterFactory filter:

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>

in BOTH <analyzer type="index"> and <analyzer type="query">.

And make a new characters.txt in the same folder as schema.xml with the content:

  $ => ALPHA
  % => ALPHA

(I wanted $ and % to behave as alphabetic characters so that they could be searched.)

Then restart Jetty/Tomcat.

This is how I solved the problem. Hope this helps someone :)

-- View this message in context: http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987891.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr, I have a performance problem with indexing.
You wrote 3,5000, but is that 35 hundred (3,500) or 35 thousand (35,000)?

Your numbers seem far worse than what many people typically see with Solr and DIH. Is the database running on the same machine?

Check the Solr log file to see if some errors (or warnings) might be occurring frequently. Check the log for the first table, from when it starts to when it ends. How often is it committing (according to the log)? Does there seem to be any odd activity during that period?

-- Jack Krupansky

-Original Message- From: Jihyun Suh Sent: Tuesday, June 05, 2012 9:25 PM To: solr-user-h...@lucene.apache.org ; solr-user@lucene.apache.org Subject: Solr, I have a performance problem with indexing.

I have 128 tables in MySQL 5.x, and each table has 3,5000 rows. When I start the dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes for one table. And when it indexes the 40th table, it takes around 20 minutes for one table. Does Solr have some performance problem with too many documents? Should I set some configuration?
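If the log shows few or no commits, autoCommit in solrconfig.xml is one knob to look at (a sketch; the thresholds are illustrative, not a recommendation for this case):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit after this many queued docs -->
      <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
    </autoCommit>
  </updateHandler>

Progressive slowdown can also come from segment merges as the index grows, so watching merge activity in the log alongside commit frequency may help separate the two causes.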
Re: Solr, I have a performance problem with indexing.
Which version of Solr do you run?

On Tue, Jun 5, 2012 at 8:02 PM, Jack Krupansky j...@basetechnology.com wrote:

You wrote 3,5000, but is that 35 hundred (3,500) or 35 thousand (35,000)?

Your numbers seem far worse than what many people typically see with Solr and DIH. Is the database running on the same machine?

Check the Solr log file to see if some errors (or warnings) might be occurring frequently. Check the log for the first table, from when it starts to when it ends. How often is it committing (according to the log)? Does there seem to be any odd activity during that period?

-- Jack Krupansky

-Original Message- From: Jihyun Suh Sent: Tuesday, June 05, 2012 9:25 PM To: solr-user-h...@lucene.apache.org ; solr-user@lucene.apache.org Subject: Solr, I have a performance problem with indexing.

I have 128 tables in MySQL 5.x, and each table has 3,5000 rows. When I start the dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes for one table. And when it indexes the 40th table, it takes around 20 minutes for one table. Does Solr have some performance problem with too many documents? Should I set some configuration?

-- Lance Norskog goks...@gmail.com
Hiring multiple Lucene/Solr Search Engineers
Hi, We are hiring multiple Lucene/Solr engineers, tech leads, and architects based in Minneapolis - both full-time and consulting - to develop a new search platform. Please reach out to me: svamb...@gmail.com Thanks, Venkat Ambati Sr. Manager, Best Buy
Replication
We are using Solr 1.4, and we are experiencing full index replication every 15 minutes. I have checked the solrconfig and it has maxsegments set to 20. It appears that it is indexing a single segment but replicating the whole index. How can I verify this and possibly fix the issue? -- Bill Bell billnb...@gmail.com cell 720-256-8076
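For verification, the Solr 1.4 ReplicationHandler exposes status commands (host names and core paths here are placeholders):

  http://slave-host:8983/solr/replication?command=details
  http://master-host:8983/solr/replication?command=indexversion

Comparing the index version/generation on master and slave before and after a poll, and looking at what the details output reports for the last replication cycle, should show whether the whole index is really being copied each time.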