solr suggester not working with shards
I am trying to use the suggest component (Solr 4.6) with multiple cores. I added a search component and a request handler in my solrconfig. That works fine for one core, but querying my Solr instance with the shards parameter does not work.

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <str name="field">suggest</str>
    <float name="threshold">0.0005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="wt">xml</str>
    <str name="indent">false</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
    <str name="qt">/suggest</str>
    <str name="shards.qt">/suggest</str>
    <str name="shards">localhost:8080/cores/core1,localhost:8080/cores/core2</str>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">1000</int>
    <int name="connTimeOut">5000</int>
  </shardHandlerFactory>
</requestHandler>
Re: Search multiple values with wildcards
Hi Jack, Ahmet, Thanks for your tips! In the end I found this to be the best way to do it: q=proprietaryMessage_tis:(25++23456*++32A++130202US*) All the best
Re: solr suggester not working with shards
One more thing: suggest is not working with multiple cores using shards, but 'did you mean' (spellcheck) is working fine with multiple cores.
Re: eDisMax parser and special characters
Hi, It seems to me like there is a difference between the tokens generated at query time and at indexing time. Can you tell us your field type and the analyzers you are using to index that field? With Regards Aman Tandon

On Wed, Oct 8, 2014 at 11:09 AM, Lanke,Aniruddha aniruddha.la...@cerner.com wrote: We are using the eDisMax parser in our configuration. When we search using a query term that has a ‘-‘ we don’t get any results back. Search term: red - yellow This doesn’t return any data back, but Search term: red yellow will give back the result ‘red - yellow’. How does eDisMax treat special characters? What tweaks do we need to make so that when a user enters a ‘-‘ in the query, e.g. red - yellow, we get the appropriate result back? Thanks,
Re: dismax query does not match with additional field in qf
The query is not from a real use-case. We used it to test edge cases. I just asked to better understand the parser, as its behavior did not match my expectations. Anyway, one use-case I can think of is a free search field for end-users where they can search in both ID and text fields, including phrases - without specifying whether their query is an ID or full-text. Users typically just expect the right thing to happen, so application developers have to be aware of such effects. Maybe the newer simple query parser would be a better fit for us. There were also some good comments in SOLR-6602, especially a link to SOLR-3085, which describes a more realistic case with stopword removal. Thanks everybody! Regards, Andreas

Jack Krupansky wrote on 10/07/2014 06:16 PM: Your query term seems particularly inappropriate for dismax - think simple keyword queries. Also, don't confuse dismax and edismax - maybe you want the latter. The former is for... simple keyword queries. I'm still not sure what your actual use case really is. In particular, are you trying to do a full, exact match on the string field, or a substring match? You can do the latter with wildcards or regex, but normally the former (exact match) is used. Maybe simply enclosing the complex term in quotes to make it a phrase query is what you need - that would do an exact match on the string field, but a tokenized phrase match on the text field, and support partial matches on the text field as a phrase of contiguous terms. -- Jack Krupansky

-Original Message- From: Andreas Hubold Sent: Tuesday, October 7, 2014 12:08 PM To: solr-user@lucene.apache.org Subject: Re: dismax query does not match with additional field in qf

Okay, sounds reasonable. However, I didn't expect this when reading the documentation of the dismax query parser. Especially the need to escape special characters (and which ones) was not clear to me, as the dismax query parser "is designed to process simple phrases (without complex syntax) entered by users" and "special characters (except AND and OR) are escaped" by the parser - as written on https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser Do you know if the new Simple Query Parser has the same behaviour when searching across multiple fields? Or could it be used instead to search across text_general and string fields of arbitrary content, without additional query preprocessing, to get results for matches in any of these fields (as in field1:STUFF OR field2:STUFF)? Thank you, Andreas

Jack Krupansky wrote on 10/07/2014 05:24 PM: I think what is happening is that your last term, the naked apostrophe, is analyzing to zero terms and simply being ignored, but when you add the extra field, a string field, you now have another term in the query, and you have mm set to 100%, so that new term must match. It probably fails because you have no naked-apostrophe term in that field in the index. Probably none of your string field terms were matching before, but that wasn't apparent since the tokenized text matched. But with this naked-apostrophe term, there is no way to tell Lucene to match no term, so it required the string term to match, which won't happen since only the full string is indexed. Generally, you need to escape all special characters in a query. Then hopefully your string field will match.
-- Jack Krupansky

-Original Message- From: Andreas Hubold Sent: Tuesday, September 30, 2014 11:14 AM To: solr-user@lucene.apache.org Subject: dismax query does not match with additional field in qf

Hi, I ran into a problem with the Solr dismax query parser. We're using Solr 4.10.0, and the field types mentioned below are taken from the example schema.xml. In a test we have a document with rather strange content in a field named name_tokenized of type text_general:

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' width=0 height=0

(It's a test for XSS bug detection, but that doesn't matter here.) I can find the document when I use the following dismax query with qf set to the field name_tokenized only:

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2

If I submit exactly the same query but add another field feederstate to the qf parameter, I don't get any results anymore. The field is of type string.

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate

The decoded value of q is: abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' and it seems the trailing single-quote causes problems here. (In fact, I can find the document when I remove the last char.) The parsed query for the latter case is ( +((
RE: Solr configuration, memory usage and MMapDirectory
Hi, I'm currently setting up jconsole, but as I have to monitor remotely (no GUI capability on the server) I have to wait before I can restart Solr with a JMX port set up. In the meantime I looked at top, and given the calculations you made based on your top output, here is the top output of my Java process from the node that handles the querying (the indexing node has a similar memory profile): https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0 It would seem I need a monstrously large heap in the 60GB region? We do use a lot of navigators/filters, so I have set the caches to be quite large for these; are these what is using up the memory? Thanks Si

-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 06 October 2014 16:56 To: solr-user@lucene.apache.org Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/6/2014 9:24 AM, Simon Fairey wrote: I've inherited a Solr config and am doing some sanity checks before making some updates; I'm concerned about the memory settings. The system has 1 index in 2 shards split across 2 Ubuntu 64-bit nodes, each node has 32 CPU cores and 132GB RAM, and we index around 500k files a day, spread out over the day in batches every 10 minutes; a portion of these are updates to existing content, maybe 5-10%. Currently mergeFactor is set to 2 and the commit settings are:

<autoCommit>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>90</maxTime>
</autoSoftCommit>

Currently each node has around 25M docs with an index size of 45GB; we prune the data every few weeks so it never gets much above 35M docs per node. In my reading I've seen a recommendation that we should be using MMapDirectory; currently it's set to NRTCachingDirectoryFactory. However, the JVM is currently configured with -Xmx131072m, and for MMapDirectory I've read you should use less memory for the JVM so there is more available for OS caching. Looking at the JVM memory usage in the dashboard I see: [attached screenshot of the admin dashboard JVM memory graph] Not sure I understand the 3 bands; I assume 127.81 is the max, dark grey is in use at the moment, and the light grey is allocated because it was used previously but has not been cleaned up yet? I'm trying to understand if this will help me work out a good value to change Xmx to, i.e. say 64GB based on the light grey? Additionally, once I've changed the max heap size, is it a simple case of changing the config to use MMapDirectory, or are there things I need to watch out for?

NRTCachingDirectoryFactory is a wrapper directory implementation. The wrapped Directory implementation is used with some code between that implementation and the consumer (Solr in this case) that does caching for NRT indexing. The wrapped implementation is MMapDirectory, so you do not need to switch; you ARE using MMap. Attachments rarely make it to the list, and that has happened in this case, so I cannot see any of your pictures. Instead, look at one of mine, and the output of a command from the same machine, running Solr 4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million documents. I've also got 64GB of RAM. The solr process shows a virtual memory size of 54GB, a resident size of 16GB, and a shared size of 11GB. My max heap on this process is 6GB. If you deduct the shared memory size from the resident size, you get about 5GB.
The admin dashboard for this machine says the current max heap size is 5.75GB, so that 5GB is pretty close, and probably matches up really well when you consider that the resident size may be considerably more than 16GB and the shared size may be just barely over 11GB. My system has well over 9GB of free memory, and 44GB is being used for the OS disk cache. This system is NOT facing memory pressure. The index is well-cached and there is even memory that is not used *at all*. With an index size of 45GB and 132GB of RAM, you're unlikely to be having problems with memory unless your heap size is *ENORMOUS*. You *should* have your garbage collection highly tuned, especially if your max heap is larger than 2 or 3GB. I would guess that a 4 to 6GB heap is probably enough for your needs, unless you're doing a lot with facets, sorting, or Solr's caches, in which case you may need more. Here's some info about heap requirements, followed by information about garbage collection tuning: http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Your automatic commit settings do not raise any red flags with me. Those are sensible settings. Thanks, Shawn
How to link tables based on range values solr data-config
Hi,

Businessmasters:
business_id | business_point
1 | 3.4
2 | 2.8
3 | 8.0

Business_Colors:
business_colors_id | business_rating_from | business_rating_to | rating
1 | 2  | 5  | OK
2 | 5  | 10 | GOOD
3 | 10 | 15 | Excellent

I want to link the two tables based on business_rating_from and business_rating_to, like:

SELECT business_colors_id, business_rating_from, business_rating_to, rating
FROM business_colors
WHERE business_rating_from >= 2 AND business_rating_to < 5;

Now I want to index them into Solr. This is how my data-config file looks:

<entity name="business_colors"
        query="SELECT business_colors_id, business_rating_from, business_rating_to, business_text, hex_colors, rgb_colors, business_colors_modify FROM business_colors WHERE business_rating_from &gt;= '${businessmasters.business_point}' AND business_rating_to &lt; '${businessmasters.business_point}'"
        deltaQuery="SELECT business_colors_id FROM business_colors WHERE business_colors_modify &gt; '${dih.last_index_time}'"
        parentDeltaQuery="SELECT business_id FROM businessmasters WHERE business_point &lt; ${business_colors.business_rating_from} AND business_point &gt;= ${business_colors.business_rating_from}">
  <field column="business_colors_id" name="id"/>
  <field column="business_rating_from" name="business_rating_from" indexed="true" stored="true"/>
  <field column="business_rating_to" name="business_rating_to" indexed="true" stored="true"/>
  <field column="business_text" name="business_text" indexed="true" stored="true"/>
  <field column="hex_colors" name="hex_colors" indexed="true" stored="true"/>
  <field column="rgb_colors" name="rgb_colors" indexed="true" stored="true"/>
  <field column="business_colors_modify" name="business_colors_modify" indexed="true" stored="true"/>
</entity>

When I run a full import, the data does not get indexed and no error is shown. What is wrong with this? Can anyone help and advise how I achieve what I want to do? I also have this question posted on Stack Overflow: http://stackoverflow.com/questions/26256344/how-to-link-tables-based-on-range-values-solr-data-config -- Regards Madhav Bahuguna
NullPointerException for ExternalFileField when key field has no terms
Hi, I use various ID fields as the keys for various ExternalFileField fields, and I have noticed that I will sometimes get the following error:

ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.NullPointerException
  at org.apache.solr.search.function.FileFloatSource.getFloats(FileFloatSource.java:273)
  at org.apache.solr.search.function.FileFloatSource.access$000(FileFloatSource.java:51)
  at org.apache.solr.search.function.FileFloatSource$2.createValue(FileFloatSource.java:147)
  at org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:190)
  at org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:141)
  at org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:84)
  at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
  at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:252)
  at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:170)
  at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:184)
  at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:300)
  at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:96)
  at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:61)
  at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Unknown Source)

The source code referenced in the error is below (FileFloatSource.java:273):

TermsEnum termsEnum = MultiFields.getTerms(reader, idName).iterator(null);

So if there are no terms in the index for the key field, then getTerms will return null, and of course trying to call iterator on null will cause the exception. For my use-case, it makes sense that the key field may have no terms (initially), because there are various types of documents sharing the index and they will not all exist at the onset. The default value for the EFF would suffice in those cases. Is this worthy of a JIRA? I have gone through whatever documentation I can find for ExternalFileField and I can't seem to find anything about requiring key terms first. It seems that this error is not encountered often because users generally set the unique key field as the external file key field, so it always exists. The workaround is to ensure at least
RE: NullPointerException for ExternalFileField when key field has no terms
Hi - yes it is worth a ticket, as the javadoc says it is OK: http://lucene.apache.org/solr/4_10_1/solr-core/org/apache/solr/schema/ExternalFileField.html

-Original message- From: Matthew Nigl matthew.n...@gmail.com Sent: Wednesday 8th October 2014 14:48 To: solr-user@lucene.apache.org Subject: NullPointerException for ExternalFileField when key field has no terms [...]
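For reference, the fix such a ticket would presumably ask for is a null guard around the line quoted earlier in the thread (FileFloatSource.java:273). A sketch only, against Lucene 4.x, not the actual patch; "vals" is a hypothetical stand-in for the defVal-filled float array that getFloats() builds before walking the terms:

// org.apache.lucene.index.Terms / MultiFields (Lucene 4.x)
Terms terms = MultiFields.getTerms(reader, idName);
if (terms == null) {
  return vals; // no terms indexed yet for the key field: keep the EFF defaults
}
TermsEnum termsEnum = terms.iterator(null);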
Re: SolrCloud with client ssl
Hi, I answered at https://issues.apache.org/jira/browse/SOLR-6595:

* Does it work with createNodeSet when using plain SolrCloud without SSL?
* Please provide the exact Collections API request you used when it failed, so we can see if the syntax is correct. Also, is 443 your secure port number in Jetty/Tomcat?

...but perhaps keep the conversation going here until it is a confirmed bug :)

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 7 Oct 2014, at 06:57, Sindre Fiskaa s...@dips.no wrote: Followed the description at https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a self-signed key pair. Configured a few Solr nodes and used the Collections API to create a new collection. I get an error message when I specify the nodes with the createNodeSet param. When I don't use the createNodeSet param, the collection gets created without error on random nodes. Could this be a bug related to the createNodeSet param?

<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
  <lst name="failure"><str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str></lst>
</response>
Re: NullPointerException for ExternalFileField when key field has no terms
Thanks Markus. I initially interpreted the line "It's OK to have a keyField value that can't be found in the index" as meaning that the key field value in the external file does not have to exist as a term in the index.

On 8 October 2014 23:56, Markus Jelsma markus.jel...@openindex.io wrote: Hi - yes it is worth a ticket, as the javadoc says it is OK: http://lucene.apache.org/solr/4_10_1/solr-core/org/apache/solr/schema/ExternalFileField.html [...]
Re: eDisMax parser and special characters
Try escaping special chars with a \ On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote: We are using a eDisMax parser in our configuration. When we search using the query term that has a ‘-‘ we don’t get any results back. Search term: red - yellow This doesn’t return any data back but
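If the escaping is done from application code rather than by hand, SolrJ can do it for you. A minimal sketch (the term is made up); note that escapeQueryChars also escapes whitespace, so apply it per user-entered term, not to a whole phrase:

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
  public static void main(String[] args) {
    // Backslash-escapes the Lucene/edismax special characters.
    String term = "red-yellow";
    System.out.println(ClientUtils.escapeQueryChars(term)); // prints red\-yellow
  }
}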
Re: Filter cache pollution during sharded edismax queries
On 01/10/2014 09:55, jim ferenczi wrote: I think you should test with facet.shard.limit=-1; this will disallow the limit for the facet on the shards and remove the need for facet refinements. I bet that returning every facet with a count greater than 0 on internal queries is cheaper than using the filter cache to handle a lot of refinements.

I'm happy to report that in our case setting facet.limit=-1 has a significant impact on performance and cache hit ratios, and reduced CPU load. Thanks to all who replied! Cheers Charlie Flax

Jim 2014-10-01 10:24 GMT+02:00 Charlie Hull char...@flax.co.uk: On 30/09/2014 22:25, Erick Erickson wrote: Just from a 20,000 ft. view, using the filterCache this way seems...odd. +1 for using a different cache, but that's being quite unfamiliar with the code.

Here's a quick update:
1. LFUCache performs worse, so we returned to LRUCache.
2. Making the cache smaller than the default 512 reduced performance.
3. Raising the cache size to 2048 didn't seem to have a significant effect on performance but did reduce CPU load significantly. This may help our client, as they can reduce their system spec considerably.

We're continuing to test with our client, but the upshot is that even if you think you don't need the filter cache, if you're doing distributed faceting you probably do, and you should size it based on experimentation. In our case there is a single filter, but the cache needs to be considerably larger than that! Cheers Charlie

On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward a...@flax.co.uk wrote: Once all the facets have been gathered, the co-ordinating node then asks the subnodes for an exact count for the final top-N facets. What's the point of refining these counts? I thought it makes sense only for facet.limit-ed requests. Is that a correct statement? Can those who suffer from the low performance just unlimit facet.limit to avoid that distributed hop? Presumably yes, but if you've got a sufficiently high-cardinality field then any gains made by missing out the hop will probably be offset by having to stream all the return values back again. Alan

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com

-- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
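For reference, this tuning applies to distributed facet requests of roughly this shape (host, collection, and field names are made up):

http://host1:8983/solr/collection/select?q=*:*&facet=true&facet.field=category&facet.limit=-1&shards=host1:8983/solr/collection,host2:8983/solr/collection

With facet.limit=-1 the shards return every bucket with a non-zero count, so the coordinating node no longer needs the refinement round-trip described above.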
Re: eDisMax parser and special characters
There's not much information here. What's the doc look like? What is the analyzer chain for it? What is the output when you add debug=query? Details matter. A lot ;) Best, Erick On Wed, Oct 8, 2014 at 6:26 AM, Michael Joyner mich...@newsrx.com wrote: Try escaping special chars with a \ On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote: We are using a eDisMax parser in our configuration. When we search using the query term that has a ‘-‘ we don’t get any results back. Search term: red - yellow This doesn’t return any data back but
WhitespaceTokenizer to consider incorrectly encoded c2a0?
Hi, For some crazy reason, some users somehow manage to substitute a perfectly normal space with a badly encoded non-breaking space. Properly URL encoded this then becomes %C2%A0, and depending on the encoding you use to view it, you probably see Â followed by a space. Because this character is not considered whitespace (c2 a0 is the UTF-8 encoding of the real character, U+00A0) by the Java Character class, the WhitespaceTokenizer won't split on it, but the WordDelimiterFilter still does, somewhat mitigating the problem. For example, the analysis chain becomes:

HTMLSCF: een abonnement
WT:      een abonnement
WDF:     een eenabonnement abonnement

Should the WhitespaceTokenizer not cover this weird edge case? Cheers, Markus
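The Character behavior in question can be checked in isolation; a quick sketch:

public class NbspCheck {
  public static void main(String[] args) {
    char nbsp = '\u00A0'; // the non-breaking space (UTF-8 bytes c2 a0)
    System.out.println(Character.isWhitespace(nbsp)); // false: the javadoc excludes no-break spaces
    System.out.println(Character.isSpaceChar(nbsp));  // true: it IS a Unicode space character
    System.out.println(Character.isWhitespace(' '));  // true
  }
}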
Re: SolrCloud with client ssl
Yes, running SolrCloud without SSL works fine with the createNodeSet param. I run this with the Tomcat application server and 443 enabled. Although I receive this error message, the collection and the shards get created and clusterstate.json is updated, but the cores are missing. I manually add them one by one in the admin console so I get my cloud up and running, and the Solr nodes are able to talk to each other - no certificate issues or SSL handshake errors between the nodes.

curl -E solr-ssl.pem:secret12 -k 'https://vt-searchln03:443/solr/admin/collections?action=CREATE&numShards=3&replicationFactor=2&name=multisharding&createNodeSet=vt-searchln03:443_solr,vt-searchln04:443_solr,vt-searchln01:443_solr,vt-searchln02:443_solr,vt-searchln05:443_solr,vt-searchln06:443_solr'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">206</int></lst>
  <lst name="failure">
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln03:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln06:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln05:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln01:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln02:443/solr</str>
  </lst>
</response>

-Sindre

On 08.10.14 15:14, Jan Høydahl jan@cominvent.com wrote: [...]
Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?
Is this a suggestion for a JIRA ticket? Or a question on how to solve it? If the latter, you could probably stick a regex replacement in the UpdateRequestProcessor chain and be done with it. As to why? I would look for the rest of the MSWord-generated artifacts, such as smart quotes, extra-long dashes, etc. Regards, Alex.

Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 8 October 2014 09:59, Markus Jelsma markus.jel...@openindex.io wrote: [...]
RE: WhitespaceTokenizer to consider incorrectly encoded c2a0?
Alexandre - I am sorry if I was not clear; this all happens at query time. Yes, we can do the substitution with the regex replace filter, but I would propose this weird exception be added to WhitespaceTokenizer so Lucene deals with it by itself. Markus

-Original message- From: Alexandre Rafalovitch arafa...@gmail.com Sent: Wednesday 8th October 2014 16:12 To: solr-user solr-user@lucene.apache.org Subject: Re: WhitespaceTokenizer to consider incorrectly encoded c2a0? [...]
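Until something like that lands, the substitution can sit in the analysis chain itself, ahead of the tokenizer (in a schema this would be a PatternReplaceCharFilterFactory charFilter on the query analyzer). A minimal sketch, assuming Lucene 4.10 (the 4.x constructors take a Version; the field name and sample text are made up):

import java.io.Reader;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class NbspAnalyzerSketch {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String field, Reader reader) {
        return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_4_10_0, reader));
      }
      @Override
      protected Reader initReader(String field, Reader reader) {
        // map U+00A0 to a plain space before the tokenizer ever sees it
        return new PatternReplaceCharFilter(Pattern.compile("\u00A0"), " ", reader);
      }
    };
    TokenStream ts = analyzer.tokenStream("f", "een\u00A0abonnement");
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term); // prints "een" then "abonnement"
    }
    ts.end();
    ts.close();
  }
}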
Using Velocity with Child Documents?
Hi - I am trying to index a collection that has child documents. I have successfully loaded the data into my index using SolrJ, and I have verified that I can search correctly using the child of method in my fq variable. Now, I would like to use Velocity (Solritas) to display the parent records with some details of the child records underneath. Is there an easy way to do this? Is there an example somewhere that I can look at? Thanks, Josh Edwards
Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?
The source code uses the Java Character.isWhitespace method, which specifically excludes the non-breaking white space characters. The Javadoc contract for WhitespaceTokenizer is too vague, especially since Unicode has so many... subtleties. Personally, I'd go along with treating non-breaking white space as white space here. And update the Lucene Javadoc contract to be more explicit. -- Jack Krupansky

-Original Message- From: Markus Jelsma Sent: Wednesday, October 8, 2014 10:16 AM To: solr-user@lucene.apache.org ; solr-user Subject: RE: WhitespaceTokenizer to consider incorrectly encoded c2a0? [...]
Re: solr suggester not working with shards
Hi, You have defined the suggester in the old way of implementing it, but you do mention the SuggestComponent. Can you try it out using the documentation given here: https://cwiki.apache.org/confluence/display/solr/Suggester Secondly, how are you firing your queries?

On Wed, Oct 8, 2014 at 12:39 PM, rsi...@ambrac.nl rsi...@ambrac.nl wrote: One more thing: suggest is not working with multiple cores using shards, but 'did you mean' (spellcheck) is working fine with multiple cores.

-- Regards, Varun Thacker http://www.vthacker.in/
Custom Solr Query Post Filter
Code: http://pastebin.com/tNjzDbmy Solr 4.9.0 Tomcat 7 Java 7 I took Erik Hatcher's example for creating a PostFilter and have modified it so it would work with Solr 4.x. Right now it works...the first time. If I were to run this query it would work right: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=ABC} However, if I ran this one: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=XYZ} I would get the results from the first query. I could do a different query, like: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=XYZ} and I'd get the XYZ tagged items. But if I tried to find ABC with that one: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=ABC} it would just list the XYZ items. I'm not sure what is persisting where to cause this to happen. Anybody have some tips/pointers for building filters like this for Solr 4.x? Thanks! -- Chris
Re: Custom Solr Query Post Filter
The results are being cached in the QueryResultCache most likely. You need to implement equals() and hashCode() on the query object, which is part of the cache key. In your case the creds param must be included in the hashCode and equals logic. Joel Bernstein Search Engineer at Heliosearch On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com wrote: Code: http://pastebin.com/tNjzDbmy Solr 4.9.0 Tomcat 7 Java 7 I took Erik Hatcher's example for creating a PostFilter and have modified it so it would work with Solr 4.x. Right now it works...the first time. If I were to run this query it would work right: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=ABC} However, if I ran this one: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=XYZ} I would get the results from the first query. I could do a different query, like: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=XYZ} and I'd get the XYZ tagged items. But if I tried to find ABC with that one: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=ABC} it would just list the XYZ items. I'm not sure what is persisting where to cause this to happen. Anybody have some tips/pointers for building filters like this for Solr 4.x? Thanks! -- Chris
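A minimal sketch of what that can look like. The class and field names are illustrative stand-ins for the PostFilter query class in the pastebin (getFilterCollector() and the rest of the post-filter machinery are unchanged and omitted here):

import org.apache.solr.search.ExtendedQueryBase;

// Hypothetical stand-in for the custom PostFilter query class.
public class ClassifQuery extends ExtendedQueryBase {
  private final String creds;

  public ClassifQuery(String creds) {
    this.creds = creds;
  }

  // Solr's caches key on the Query object, so two filters built from
  // different creds values must not compare equal.
  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!super.equals(obj)) return false; // Lucene 4.x Query.equals: same class, same boost
    ClassifQuery other = (ClassifQuery) obj;
    return creds == null ? other.creds == null : creds.equals(other.creds);
  }

  @Override
  public int hashCode() {
    return 31 * super.hashCode() + (creds == null ? 0 : creds.hashCode());
  }
}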
Re: Custom Solr Query Post Filter
That did the trick! Thanks Joel. -- Chris On Wed, Oct 8, 2014 at 2:05 PM, Joel Bernstein joels...@gmail.com wrote: The results are being cached in the QueryResultCache most likely. You need to implement equals() and hashCode() on the query object, which is part of the cache key. In your case the creds param must be included in the hashCode and equals logic. Joel Bernstein Search Engineer at Heliosearch On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com wrote: Code: http://pastebin.com/tNjzDbmy Solr 4.9.0 Tomcat 7 Java 7 I took Erik Hatcher's example for creating a PostFilter and have modified it so it would work with Solr 4.x. Right now it works...the first time. If I were to run this query it would work right: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=ABC} However, if I ran this one: http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=XYZ} I would get the results from the first query. I could do a different query, like: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=XYZ} and I'd get the XYZ tagged items. But if I tried to find ABC with that one: http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]sort=uniqueId%20descfq={!classif%20creds=ABC} it would just list the XYZ items. I'm not sure what is persisting where to cause this to happen. Anybody have some tips/pointers for building filters like this for Solr 4.x? Thanks! -- Chris
Re: Using Velocity with Child Documents?
Velocity is just taking the Solr response and displaying selected bits in HTML. So assuming the information you want is in the response packet (which you can tell just by doing the query from the browser), it's just a matter of pulling it out of the response and displaying it. Mostly, when I started down this path, I poked around the velocity directory; it was just a bit of a hunt to figure things out, with some help from the Apache Velocity page. Not much help, but the short form is there's not much of an example that I know of for your specific problem. Erick

On Wed, Oct 8, 2014 at 8:54 AM, Edwards, Joshua joshua.edwa...@capitalone.com wrote: [...]
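To make that concrete: the stock /browse templates loop over $response.results, and a child loop can hang off each hit. A sketch only, with made-up field names, and assuming the child documents are actually present in each returned SolrDocument (e.g. via the [child] doc transformer available in Solr 4.9+):

#foreach($doc in $response.results)
  <div class="parent">
    $doc.getFirstValue('title')
    ## children, if the response included them for this parent
    #foreach($child in $doc.getChildDocuments())
      <div class="child">$child.getFirstValue('name')</div>
    #end
  </div>
#end

A real template would guard against parents with no children, since getChildDocuments() returns null in that case.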
Edismax parser and boosts
Hi, I use an edismax query with the q parameter set as below: q=foo^1.0+AND+bar For such a query I see a different (lower) score for the same document than for q=foo+AND+bar By default the boost of a term is 1 as far as I know, so why does the scoring differ? When I check the debugQuery output, in parsedQuery for foo^1.0+AND+bar I see a Boolean query, one of whose clauses is a phrase query "foo 1.0 bar". It seems that the edismax parser takes the whole q parameter as a phrase without removing the boost value and adds it as a boolean clause. Is this a bug, or should it work like that? -- Paweł Róg
Re: Solr configuration, memory usage and MMapDirectory
On 10/8/2014 4:02 AM, Simon Fairey wrote: [...]

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably have more than 45GB of index data. This might be a combination of old indexes and the active index. Only the indexes (cores) that are being actively used need to be considered when trying to calculate the total RAM needed. Other indexes will not affect performance, even though they increase your virtual memory size. With MMap, part of the virtual memory size is the size of the index data that has been opened on the disk. This is not memory that's actually allocated. There's a very good reason that mmap has been the default in Lucene and Solr for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of index data on each node. With those numbers and a conservative configuration, I would expect that you need about 4GB of heap, maybe as much as 8GB. I cannot think of any reason that you would NEED a heap 60GB or larger. Each field that you sort on, each field that you facet on with the default facet.method of fc, and each filter that you cache will use a large block of memory. The size of that block of memory is almost exclusively determined by the number of documents in the index. With 25 million documents, each filterCache entry will be approximately 3MB -- one bit for every document. I do not know how big each FieldCache entry is for a sort field and a facet field, but assume that they are probably larger than the 3MB entries on the filterCache. I've got a filterCache sized at 64, with an autowarmCount of 4. With larger autowarmCount values, I was seeing commits take 30 seconds or more, because each of those filters can take a few seconds to execute. Cache sizes in the thousands are rarely necessary, and just chew up a lot of memory with no benefit. Large autowarmCount values are also rarely necessary. Every time a new searcher is opened by a commit, add up all your autowarmCount values and realize that the searcher likely needs to execute that many queries before it is available. If you need to set up remote JMX so you can remotely connect jconsole, I have done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if you use that JMX config, *definitely* don't expose it to the Internet. It doesn't use any authentication. We might need to back up a little bit and start with the problem that you are trying to figure out, not the numbers that are being reported.

http://people.apache.org/~hossman/#xyproblem

Your original note said that you're sanity checking. Toward that end, the only insane thing that jumps out at me is that your max heap is *VERY* large, and you probably don't have the proper GC tuning.
My recommendations for initial action are to use -Xmx8g on the servlet container startup and include the GC settings you can find on the wiki pages I've given you. It would be a very good idea to set up remote JMX so you can use jconsole or jvisualvm remotely. Thanks, Shawn
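To put rough numbers on the filterCache sizing above: 25,000,000 docs at one bit each is 25,000,000 / 8, or about 3.1 MB per filterCache entry, so a filterCache sized in the low thousands can by itself pin several gigabytes of heap (e.g. 2000 x 3.1 MB is roughly 6 GB), while the size of 64 mentioned above tops out around 200 MB.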
Re: eDisMax parser and special characters
Sorry for the delayed reply; here is more information -

Schema that we are using - http://pastebin.com/WQAJCCph
Request Handler in config - http://pastebin.com/Y0kP40WF

Some analysis -

Search term: "red -" with parser eDisMax: no results show up.
<str name="parsedquery">(+((DisjunctionMaxQuery((name_starts_with:red^9.0 | name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 | s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0)) DisjunctionMaxQuery((name_starts_with:-^9.0 | s_detail_starts_with:-^3.0)))~2))/no_coord</str>

Search term: "red -" with parser dismax: results are returned.
<str name="parsedquery">(+DisjunctionMaxQuery((name_starts_with:red^9.0 | name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 | s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0)) ())/no_coord</str>

Why do we see the variation in results between dismax and eDisMax?

On Oct 8, 2014, at 8:59 AM, Erick Erickson erickerick...@gmail.com wrote: [...]
RE: Solr configuration, memory usage and MMapDirectory
Hi,

Thanks for this, I will investigate further after reading a number of your points in more detail. I do have a feeling they've set up too many entries in the filter cache (1000s) so will revisit that.

Just a note on numbers: those were valid when I made the post, but obviously they change as the week progresses before a regular clean-up of content. Current numbers for info (if it's at all relevant) from the index admin view on one of the 2 nodes are:

Last Modified: 18 minutes ago
Num Docs: 24590368
Max Doc: 29139255
Deleted Docs: 4548887
Version: 1297982
Segment Count: 28
Master: Version 1412798583558, Gen 402364, Size 52.98 GB

Top:
2996 tomcat6 20 0 189g 73g 1.5g S 15 58.7 58034:04 java

And the only GC option I can see that is on is -XX:+UseConcMarkSweepGC

Regarding the XY problem, you are very likely correct. Unfortunately I wasn't involved in the config, and I very much suspect that when it was done many of the defaults were used, and then if it didn't work or there was, say, an out-of-memory error, they just upped the heap to solve the symptom without investigating the cause. The luxury of having more than enough RAM I guess!

I'm going to get some late-night downtime soon, at which point I'm hoping to change the heap size and GC settings and add the JMX; it's not exposed to the internet, so no security is fine.

Right, off to do some reading!

Cheers

Si
Re: eDisMax parser and special characters
Hyphen is a prefix operator and is normally followed by a term, to indicate that the term must not be present. So, your query has a syntax error. The two query parsers differ in how they handle various errors. In the case of edismax, it quotes operators and then tries again, so the hyphen gets quoted, and is then analyzed to nothing for text fields but remains a literal string for string fields.

-- Jack Krupansky
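Following on from that: if the field types can't change, one pragmatic client-side workaround (a sketch, not an official fix) is to drop bare hyphens -- ones standing alone between terms -- before handing the string to edismax, since even an escaped hyphen still reaches string-based fields as a literal token:

import org.apache.solr.client.solrj.SolrQuery;

public class BareHyphenSketch {
  public static void main(String[] args) {
    String raw = "red - yellow";
    // Remove a hyphen only when it stands alone; "anti-virus" is untouched.
    String cleaned = raw.replaceAll("(^|\\s)-(\\s|$)", " ").trim();
    SolrQuery q = new SolrQuery(cleaned);
    q.set("defType", "edismax");
    System.out.println(q); // q=red+yellow&defType=edismax
  }
}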
Re: Edismax parser and boosts
Definitely sounds like a bug! File a Jira. Thanks for reporting this. What release of Solr?

-- Jack Krupansky

-----Original Message-----
From: Pawel Rog
Sent: Wednesday, October 8, 2014 3:57 PM
To: solr-user@lucene.apache.org
Subject: Edismax parser and boosts

Hi,
I use an edismax query with the q parameter set as below:

q=foo^1.0+AND+bar

For such a query I see a different (lower) score for the same document than for:

q=foo+AND+bar

By default the boost of a term is 1, as far as I know, so why does the scoring differ? When I check with the debugQuery parameter, in parsedQuery for foo^1.0+AND+bar I see a Boolean query, one of whose clauses is the phrase query "foo 1.0 bar". It seems that the edismax parser takes the whole q parameter as a phrase without removing the boost value and adds it as a boolean clause. Is it a bug or should it work like that?

--
Paweł Róg
Re: Using Velocity with Child Documents?
: I am trying to index a collection that has child documents. I have
: successfully loaded the data into my index using SolrJ, and I have
: verified that I can search correctly using the child of method in my
: fq variable. Now, I would like to use Velocity (Solritas) to display
: the parent records with some details of the child records underneath.
: Is there an easy way to do this? Is there an example somewhere that I
: can look at?

Step #1 is to forget about velocity and focus on getting the data you want about the children into the response. To do that you'll need to use the [child] DocTransformer...

https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

ala...

fl=id,[child parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]

If you are using this in conjunction with a block join query, you can use local params to eliminate some redundancy...

q=some_parent_field:foo
parents=content_type:parentDoc
fq={!parent which=$parents}child_field:bar
fl=id,[child parentFilter=$parents childFilter=content_type:childDoc limit=100]

Step #2: once you have the children in the response data, then you can use velocity to access each of the children of the docs that match your query via SolrDocument.getChildDocuments()

-Hoss
http://www.lucidworks.com/
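For the archives, a rough SolrJ sketch of step #1 end to end -- indexing a parent block and then asking for the children back with the [child] transformer. The doc_type values follow Hoss's example and are assumptions about the schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class ChildDocSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Index a parent and its child together as one block.
    SolrInputDocument book = new SolrInputDocument();
    book.addField("id", "book1");
    book.addField("doc_type", "book");

    SolrInputDocument chapter = new SolrInputDocument();
    chapter.addField("id", "book1_ch1");
    chapter.addField("doc_type", "chapter");
    book.addChildDocument(chapter); // available since SolrJ 4.5

    server.add(book);
    server.commit();

    // Ask for parents, with children attached via the [child] transformer.
    SolrQuery q = new SolrQuery("doc_type:book");
    q.setFields("id", "[child parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]");
    QueryResponse rsp = server.query(q);
    // Each result now exposes getChildDocuments() -- the hook for step #2.
    System.out.println(rsp.getResults().get(0).getChildDocuments());

    server.shutdown();
  }
}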
Re: Best way to index wordpress blogs in solr
The LucidWorks product has built-in crawler support, so you could crawl one or more web sites.

http://lucidworks.com/product/fusion/

-- Jack Krupansky

-----Original Message-----
From: Vishal Sharma
Sent: Tuesday, October 7, 2014 2:08 PM
To: solr-user@lucene.apache.org
Subject: Best way to index wordpress blogs in solr

Hi,

I am trying to get some help on finding out if there is any best practice for indexing wordpress blogs in a Solr index. Can someone help with the architecture I should be setting up? Do I need to write separate scripts to crawl wordpress and then pump posts back to Solr using its API?

Vishal Sharma
TL, Grazitti Interactive
T: +1 650 641 1754
E: vish...@grazitti.com
www.grazitti.com
Re: Custom Solr Query Post Filter
Also, just took a quick look at the code. This will likely be a performance problem if you have a large result set:

String classif = context.reader().document(docId).get(classification);

Instead of using the stored field, you'll want to get the BytesRef for the field using either the FieldCache or DocValues. In recent releases, DocValues will likely be the fastest docID-to-BytesRef lookup.

Joel Bernstein
Search Engineer at Heliosearch

On Wed, Oct 8, 2014 at 2:20 PM, Christopher Gross cogr...@gmail.com wrote:

That did the trick! Thanks Joel.

-- Chris

On Wed, Oct 8, 2014 at 2:05 PM, Joel Bernstein joels...@gmail.com wrote:

The results are being cached in the QueryResultCache most likely. You need to implement equals() and hashCode() on the query object, which is part of the cache key. In your case the creds param must be included in the hashCode and equals logic.

Joel Bernstein
Search Engineer at Heliosearch

On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com wrote:

Code: http://pastebin.com/tNjzDbmy

Solr 4.9.0
Tomcat 7
Java 7

I took Erik Hatcher's example for creating a PostFilter and have modified it so it would work with Solr 4.x. Right now it works... the first time.

If I were to run this query it would work right:
http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=ABC}

However, if I ran this one:
http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}

I would get the results from the first query. I could do a different query, like:
http://localhost:8080/solr/plugintest/select?q=uniqueId:[* TO *]&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}

and I'd get the XYZ tagged items. But if I tried to find ABC with that one:
http://localhost:8080/solr/plugintest/select?q=uniqueId:[* TO *]&sort=uniqueId%20desc&fq={!classif%20creds=ABC}

it would just list the XYZ items.

I'm not sure what is persisting where to cause this to happen. Anybody have some tips/pointers for building filters like this for Solr 4.x?

Thanks!

-- Chris
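For reference, a bare-bones sketch of the equals()/hashCode() fix Joel describes, assuming the query class from the pastebin keeps creds in an instance field -- the class name and structure here are illustrative, not the actual pastebin code:

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class ClassifQuery extends ExtendedQueryBase implements PostFilter {
  private final String creds;

  public ClassifQuery(String creds) {
    this.creds = creds;
  }

  // The QueryResultCache keys on the query object, so two filters that
  // differ only in creds must not compare equal, or the second query
  // will be served results cached for the first.
  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!super.equals(other)) return false; // Query.equals covers class and boost
    ClassifQuery that = (ClassifQuery) other;
    return creds == null ? that.creds == null : creds.equals(that.creds);
  }

  @Override
  public int hashCode() {
    return 31 * super.hashCode() + (creds == null ? 0 : creds.hashCode());
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    // ... the existing filtering logic from the pastebin goes here ...
    throw new UnsupportedOperationException("sketch only");
  }
}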
Re: Add multiple JSON documents with boost
: i try to add documents to the index and boost them (whole document) but i
: get this error message:
:
: ERROR org.apache.solr.core.SolrCore –
: org.apache.solr.common.SolrException: Error parsing JSON field value.
: Unexpected OBJECT_START
:
: Any ideas?

The top-level structure you are sending is a JSON array (because you start with [), which is how you tell solr you want to send a simple list of documents to add. In order to send explicit commands (like add), your top-level JSON structure needs to be a JSON Object (aka: Map), which contains add as a key. There are examples of this in the ref guide...

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingArbitraryJSONUpdateCommands

so basically, just take your list containing 2 objects that each have 1 key of add and replace it with a single object that has 2 add keys...

: {
: add: {
: boost: 1,
: doc: {
: store_id: 1,
: created_at: 2007-08-23T01:03:05Z,
: sku: {boost: 10, value: n2610},
: status: 1,
: tax_class_id_t: 2,
: color_t: Black,
: visibility: 4,
: name: {boost: -60, value: Nokia 2610 Phone},
: url_key: nokia-2610-phone,
: image: \/n\/o\/nokia-2610-phone-2.jpg,
: small_image: \/n\/o\/nokia-2610-phone-2.jpg,
: thumbnail: \/n\/o\/nokia-2610-phone-2.jpg,
: msrp_enabled_t: 2,
: msrp_display_actual_price_type_t: 4,
: model_t: 2610,
: dimension_t: 4.1 x 1.7 x 0.7 inches,
: meta_keyword_t: Nokia 2610, cell, phone,,
: short_description: The words \entry level\ no longer
: mean \low-end,\ especially when it comes to the Nokia 2610. Offering
: advanced media and calling features without breaking the bank,
: price: 149.99,
: in_stock: 1,
: id: 16_1,
: product_id: 16,
: content_type: product,
: attribute_set_id: 38,
: type_id: simple,
: has_options: 0,
: required_options: 0,
: entity_type_id: 10,
: category: [
: 8,
: 13
: ]
: }
: } ,
: add: {
: boost: 1,
: doc: {
: store_id: 1,
: created_at: 2007-08-23T03:40:26Z,
: sku: {boost: 10, value: bb8100},
: color_t: Silver,
: status: 1,
: tax_class_id_t: 2,
: visibility: 4,
: name: {boost: -60, value: BlackBerry 8100 Pearl},
: url_key: blackberry-8100-pearl,
: thumbnail: \/b\/l\/blackberry-8100-pearl-2.jpg,
: small_image: \/b\/l\/blackberry-8100-pearl-2.jpg,
: image: \/b\/l\/blackberry-8100-pearl-2.jpg,
: model_t: 8100,
: dimension_t: 4.2 x 2 x 0.6 inches,
: meta_keyword_t: Blackberry, 8100, pearl, cell, phone,
: short_description: The BlackBerry 8100 Pearl is a
: departure from the form factor of previous BlackBerry devices. This
: BlackBerry handset is far more phone-like, and RIM's engineers have managed
: to fit a QWERTY keyboard onto the handset's slim frame.,
: price: 349.99,
: in_stock: 1,
: id: 17_1,
: product_id: 17,
: content_type: product,
: attribute_set_id: 38,
: type_id: simple,
: has_options: 0,
: required_options: 0,
: entity_type_id: 10,
: category: [
: 8,
: 13
: ]
: }
: }
: }

-Hoss
http://www.lucidworks.com/
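If building that JSON by hand gets awkward, the same per-document and per-field boosts can also be set from SolrJ. A rough sketch, not the poster's code -- the field values are taken from the first document above, and these boost setters exist in the SolrJ 4.x API:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoostedAddSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.setDocumentBoost(1.0f);          // boost for the whole document
    doc.addField("id", "16_1");
    doc.addField("store_id", 1);
    doc.addField("sku", "n2610", 10.0f); // field-level boost
    doc.addField("name", "Nokia 2610 Phone");
    doc.addField("price", 149.99);
    doc.addField("in_stock", 1);

    server.add(doc);
    server.commit();
    server.shutdown();
  }
}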
Re: Having an issue with pivot faceting
: Subject: Having an issue with pivot faceting

Ok - first off -- your example request doesn't include any facet.pivot params, so you aren't using pivot faceting ... which makes me concerned that you aren't using the feature you think you are, or don't understand the feature you are using.

: I'm having an issue getting pivot faceting working as expected. I'm trying
: to filter by a specific criteria, and then first facet by one of my document
: attributes called item_generator, then facet those results into 2 sets each:
: the first set is the count of documents satisfying that facet with
: number_of_items_generated set to 0, the other set counting the documents
: satisfying that facet with number_of_items_generated greater than 0.

second: interval faceting is just a fancy, more efficient, way of using facet.query if your queries are always over ranges. there's nothing about interval faceting that is directly related to pivot faceting.

third: there isn't currently any generic support for faceting by a field, and then faceting those results by some other field/criteria. This is actively being worked on in issues like SOLR-6348 -- but it doesn't exist yet.

fourth: because you ultimately have a specific criteria for how you want to divide the facets, something similar to the behavior you are asking for is available using tagged exclusions on facets...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-LocalParametersforFaceting

...the basic idea you could follow is that you send additional fq params for each of the 2 criteria you want to lump things into (number_of_items_generated = 0 and number_of_items_generated > 0), but you tag those filters so they can individually be excluded from facets -- then you use facet.field on your item_generator field twice (with different keys) and in each case you exclude only one of those filters.

Here's a similar example to what you describe using the sample data that comes with solr...

http://localhost:8983/solr/select?rows=0&debug=query&q=inStock:true&fq={!tag=pricey}price:[100%20TO%20*]&fq={!tag=cheap}price:[*%20TO%20100}&facet=true&facet.field={!key=cheap_cats%20ex=pricey}cat&facet.field={!key=pricey_cats%20ex=cheap}cat

so cheap_cats gives you facet counts on the cat field but only for the cheap products (because it excludes the pricey fq) and pricey_cats gives you facet counts on the cat field for the pricey products by excluding the cheap fq.

note however that the numFound is 0 -- this works fine for getting the facet counts you want, but you'd need a second query w/ the filters to get the main result set, since (i'm pretty sure) it's not possible to use ex on the main query to exclude filters from affecting the main result set.

-Hoss
http://www.lucidworks.com/
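The same request expressed through SolrJ, in case you're querying from Java -- a sketch against Hoss's sample-data example; swap in your item_generator field and the number_of_items_generated filters:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TaggedFacetSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("inStock:true");
    q.setRows(0); // facet counts only; run a second query for the documents
    q.addFilterQuery("{!tag=pricey}price:[100 TO *]");
    q.addFilterQuery("{!tag=cheap}price:[* TO 100}");
    q.setFacet(true);
    q.addFacetField("{!key=cheap_cats ex=pricey}cat");
    q.addFacetField("{!key=pricey_cats ex=cheap}cat");

    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getFacetField("cheap_cats").getValues());
    System.out.println(rsp.getFacetField("pricey_cats").getValues());

    server.shutdown();
  }
}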
Solr Index to Helio Search
When I try to simply copy an index from native Solr to Heliosearch, I get this exception:

Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene410' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene3x, Lucene41, Lucene42, Lucene45, Lucene46, Lucene49]

Is there any proper way to move an index from native Solr to Heliosearch? The problem with native Solr is that there are a lot of OOM exceptions (because of the large index).

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Index-to-Helio-Search-tp4163446.html Sent from the Solr - User mailing list archive at Nabble.com.