Re: PathHierarchyTokenizerFactory storage for pivot facets
Hi Stephan, If you want to populate some fields from path info, an UpdateRequestProcessorFactory is a better fit. You can use URLClassifyProcessor(Factory).java as an example. Ahmet

On Monday, January 6, 2014 7:46 AM, Stephan Schubert m...@stephan-schubert.com wrote: I want to store the levels of a path/URL in separate fields to make use of pivot faceting. I thought about using the PathHierarchyTokenizerFactory for that. But how can I store the several levels of a URL in separate fields? Example: Doc1 - Path: a/b/c/d Doc2 - Path: f/g/h/i Document 1 should store the value of a in a field named something like urllevel1, b in field urllevel2, c in urllevel3 and so on. The same for document 2: f in field urllevel1, g in urllevel2 and h in urllevel3. Is the PathHierarchyTokenizerFactory the right approach for that? I know the PathHierarchyTokenizerFactory splits the path up, but I don't know how I can store the several levels in the specific fields and set it up in the schema.xml.
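To make Ahmet's suggestion concrete, here is a minimal sketch of such a processor, assuming the incoming field is named path and that urllevel1, urllevel2, ... exist (or match a dynamic field) in schema.xml; the class and field names are illustrative, not part of stock Solr:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class PathLevelsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object path = doc.getFieldValue("path");
        if (path != null) {
          // a/b/c/d becomes urllevel1=a, urllevel2=b, urllevel3=c, urllevel4=d
          String[] levels = path.toString().split("/");
          for (int i = 0; i < levels.length; i++) {
            doc.setField("urllevel" + (i + 1), levels[i]);
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in solrconfig.xml, ahead of RunUpdateProcessorFactory, and that chain referenced by the update handler; the urllevelN fields can then be used directly for pivot faceting.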
Slowness of Solr search during the replication
Hi, I have configured Solr slave replication to run every 1 hr. While replication is running, search becomes unresponsive. Are there any architectural changes we need to make to overcome this? Here are the cache settings from my solrconfig.xml:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" cleanupThread="true" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="512" cleanupThread="true" autowarmCount="128" showItems="32"/>
<useFilterForSortedQuery>true</useFilterForSortedQuery>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>

Any suggestions are appreciated. Regards, Siva http://smarttechie.org/ -- View this message in context: http://lucene.472066.n3.nabble.com/Slowness-of-Solr-search-during-the-replication-tp4109712.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Storing MYSQL DATETIME field in solr as String
I found a way to store a MySQL DATETIME as a string in Solr. In data-config.xml, in the SQL query, the date can be cast directly to char:

CAST(l.creation_date as char) as creation_date, CAST(l.modification_date as char) as modification_date,

in schema.xml:

<field name="creation_date" type="string" indexed="true" stored="true" multiValued="false" default="" />
<field name="modification_date" type="string" indexed="true" stored="true" multiValued="false" default="" />

Output would be:

<str name="creation_date">2013-11-13 10:26:32</str>
<str name="modification_date">2013-11-13 10:26:32</str>

This is exactly what I was looking for. If you know of any other way, please feel free to share. :-). Happy Solr!!! -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-MYSQL-DATETIME-field-in-solr-as-String-tp4106836p4109720.html Sent from the Solr - User mailing list archive at Nabble.com.
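If you later need real date math, range queries, or date faceting, one alternative worth considering is to keep the column as a DATETIME and let DataImportHandler convert it into a proper Solr date field using the stock DateFormatTransformer; a minimal sketch, with the entity and table names as illustrations only:

<entity name="l" transformer="DateFormatTransformer"
        query="select creation_date, modification_date from listing">
  <field column="creation_date" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
  <field column="modification_date" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>

with the two fields declared as type="date" in schema.xml. The string approach above is simpler if you only ever display the values.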
Re: Exact match on KeywordTokenizer
Hi Chris, thanks for your reply and sorry for my poorly explained question. Here are some examples of indexed data (fieldname:propertyType): Apartamento Padrão, Casa Padrão, Loft, Terreno. And some examples of the queries: propertyType:"Apartamento Padrão", propertyType:apartamento-padrao, propertyType:Loft, propertyType:loft. Using the analysis menu, I can see that the difference is in the double quotes I'm providing when I search and that are not indexed. How can I solve this? Thanks *--* *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)* *andre.maldonado*@gmail.com andre.maldon...@gmail.com (11) 9112-4227 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664 http://www.facebook.com/profile.php?id=10659376883 http://twitter.com/andremaldonado https://profiles.google.com/105605760943701739931 http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3 http://www.youtube.com/andremaldonado

On Fri, Jan 3, 2014 at 9:15 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Can you show us examples of the types of data you are indexing, and the types of queries you want to match? (as well as examples of queries you *don't* want to match) https://wiki.apache.org/solr/UsingMailingLists#Information_useful_for_searching_problems Best guess, based on your problem description, is that you are indexing text like Foo Bar and then searching for things like foOBaR and you want those to match. With your analyzer as it is, you will never get a match unless the client sending the query string has already lowercased it, done any asciifolding needed, and always sends - instead of space characters. I suspect what you really want is to have index and query analyzers that are the same (or at least better matches for each other than what you have below)...

: Hi,
:
: Is there a way to do an exact match search on a tokenized field?
:
: I have a scenario in which I need a field to be indexed and searchable
: regardless of the case or white spaces used. For this, I created a custom
: field type with the following configuration:
:
: <field name="propertyType" type="customtype" indexed="true" stored="true" />
:
: <fieldType name="customtype" class="solr.TextField" positionIncrementGap="100">
:   <analyzer type="index">
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.ASCIIFoldingFilterFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:     <filter class="solr.PatternReplaceFilterFactory" pattern=" " replacement="-"/>
:   </analyzer>
:   <analyzer type="query">
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:   </analyzer>
: </fieldType>
:
: Even using KeywordTokenizerFactory on both index and query, all my searches
: based on exact match stopped working.
:
: Is there a way to search exact match like a string field and at the same
: time use custom tokenizers applied to that field?
:
: Thanks in advance
:
: *--*
: *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*
: *andre.maldonado*@gmail.com andre.maldon...@gmail.com
: (11) 9112-4227
: http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
: http://www.facebook.com/profile.php?id=10659376883
: http://twitter.com/andremaldonado
: https://profiles.google.com/105605760943701739931
: http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
: http://www.youtube.com/andremaldonado

-Hoss http://www.lucidworks.com/
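A hedged sketch of the kind of fieldType Hoss describes, with the same normalization applied at index and query time so that Apartamento Padrão, apartamento-padrao, and apartamento padrao all collapse to the same single token (a single <analyzer> block applies to both sides; the pattern is an assumption about how you want spaces and hyphens normalized):

<fieldType name="customtype" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[\s-]+" replacement="-" replace="all"/>
  </analyzer>
</fieldType>

One caveat: multi-word values such as Apartamento Padrão still need to be sent as a phrase (in double quotes) or with the space escaped, because the query parser splits on whitespace before the analyzer ever sees the text; that matches the difference you observed in the analysis screen.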
Re: Solr -The connection has timed out
Browse to the following URL: http://docs.lucidworks.com/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-IndexLocks

2013/12/31 Furkan KAMACI furkankam...@gmail.com Hi; Besides the other error lines, did you notice this log line: *java.net.BindException: Address already in use* Could you check whether any other application is using port 8983? Thanks; Furkan KAMACI

2013/12/31 rakesh rakesh3...@yahoo.com Finally able to get the full log details:

ERROR - 2013-12-30 15:13:00.811; org.apache.solr.core.SolrCore; [collection1] Solr index directory '/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index/' is locked. Throwing exception
INFO - 2013-12-30 15:13:00.812; org.apache.solr.core.SolrCore; [collection1] CLOSING SolrCore org.apache.solr.core.SolrCore@de26e52
INFO - 2013-12-30 15:13:00.812; org.apache.solr.update.SolrCoreState; Closing SolrCoreState
INFO - 2013-12-30 15:13:00.813; org.apache.solr.update.DefaultSolrCoreState; SolrCoreState ref count has reached 0 - closing IndexWriter
INFO - 2013-12-30 15:13:00.813; org.apache.solr.core.SolrCore; [collection1] Closing main searcher on request.
INFO - 2013-12-30 15:13:00.814; org.apache.solr.core.CachingDirectoryFactory; Closing NRTCachingDirectoryFactory - 2 directories currently being tracked
INFO - 2013-12-30 15:13:00.814; org.apache.solr.core.CachingDirectoryFactory; looking to close /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index [CachedDir<<refCount=0;path=/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index;done=false>>]
INFO - 2013-12-30 15:13:00.814; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data/index
INFO - 2013-12-30 15:13:00.815; org.apache.solr.core.CachingDirectoryFactory; looking to close /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data [CachedDir<<refCount=0;path=/ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data;done=false>>]
INFO - 2013-12-30 15:13:00.815; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /ctgapps/apache-solr-4.6.0/solr-4.6.0/example/solr/collection1/data
ERROR - 2013-12-30 15:13:00.817; org.apache.solr.core.CoreContainer; Unable to create core: collection1
org.apache.solr.common.SolrException: Index locked for write for core collection1
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:557)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core collection1
    at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:491)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:755)
    ... 13 more
ERROR - 2013-12-30 15:13:00.819; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:977)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:601)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Index locked for write for core collection1
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at
Re: Slowness of Solr search during the replication
On Mon, 2014-01-06 at 09:18 +0100, sivaprasad wrote: I have configured Solr slave replication to run every 1 hr. While replication is running, search becomes unresponsive. Unresponsive as in waiting for the updated searcher to be ready, or as in very slow while the replication is ongoing? - Toke Eskildsen
monitoring solr system
hi, we have a cluster consisting of 6 servers: 3 leaders and 3 replicas. The system must be alive and working 24x7. We would like to monitor the system for any troubles or problems that may occur and would demand our immediate support. Currently we are monitoring the servers, the ZooKeeper and Jetty processes, and query keep-alive. Is there any other monitoring you would recommend? Are there any important log messages we should pay attention to? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/monitoring-solr-system-tp4109730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slowness of Solr search during the replication
The first thing I'd try would be to up the autowarm counts. Don't go overboard here; I'd suggest, say, 16 or so to start, but it depends on your query mix. If that doesn't help, you need to add some more details. Some example queries would be a place to start. Best Erick

On Jan 6, 2014 4:19 AM, sivaprasad sivaprasa...@echidnainc.com wrote: Hi, I have configured Solr slave replication to run every 1 hr. While replication is running, search becomes unresponsive. Are there any architectural changes we need to make to overcome this? Here are the cache settings from my solrconfig.xml:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" cleanupThread="true" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="512" cleanupThread="true" autowarmCount="128" showItems="32"/>
<useFilterForSortedQuery>true</useFilterForSortedQuery>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>

Any suggestions are appreciated. Regards, Siva http://smarttechie.org/ -- View this message in context: http://lucene.472066.n3.nabble.com/Slowness-of-Solr-search-during-the-replication-tp4109712.html Sent from the Solr - User mailing list archive at Nabble.com.
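For example, a modest starting point along the lines Erick suggests (the counts are illustrative; tune them against your own query mix and watch the warmup times reported on the admin Plugins/Stats page). The documentCache is not autowarmed, so it can stay as it is:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" cleanupThread="true" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>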
Re: monitoring solr system
Hi; You can check here: http://sematext.com/spm/ Thanks; Furkan KAMACI

2014/1/6 elmerfudd na...@012.net.il hi, we have a cluster consisting of 6 servers: 3 leaders and 3 replicas. The system must be alive and working 24x7. We would like to monitor the system for any troubles or problems that may occur and would demand our immediate support. Currently we are monitoring the servers, the ZooKeeper and Jetty processes, and query keep-alive. Is there any other monitoring you would recommend? Are there any important log messages we should pay attention to? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/monitoring-solr-system-tp4109730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slowness of Solr search during the replication
Do we need to set the autowarmCount on the slave or the master? As per the Solr wiki, I found the information below. Solr4.0: autowarmCount can now be specified as a percentage (ie: 90%) which will be evaluated relative to the number of items in the existing cache. This can be an advantageous setting in an instance of Solr where you don't expect any search traffic (ie a master), but you want some caches so that if it does take on traffic it won't be too overloaded. Once the traffic dies down, subsequent commits will gradually decrease the number of items being warmed. Regards, Siva -- View this message in context: http://lucene.472066.n3.nabble.com/Slowness-of-Solr-search-during-the-replication-tp4109712p4109739.html Sent from the Solr - User mailing list archive at Nabble.com.
Branch/Java questions re: contributing code
1. Should we be using Java 6 or 7? The docs say 1.6 (http://wiki.apache.org/solr/HowToContribute), but running 'ant test' on trunk/ yields: /lucene/common-build.xml:328: Minimum supported Java version is 1.7. I don't get that error with branch_4x/, which leads to my next question. 2. Should work toward 4.x be done on trunk/ or branch_4x/? It sounds like patches should be based on trunk and then ported as necessary. Thanks! Ryan
Error from SPLITSHARD that seems to be unrecoverable
I'm using Solr 4.6 with SolrCloud. I tried using the SPLITSHARD command and it threw a series of exceptions, which have put my SolrCloud in a weird state. Here is an image of my SolrCloud setup after a few tries at SPLITSHARD, all of which fail: http://imgur.com/CFXJKfb Here is the log output: http://pastebin.com/7uC5PQsa The notable exception claims that it can't read stopwords.txt, but the file is absolutely present locally at solr/conf/stopwords.txt, and it's present in ZooKeeper at /configs/config1/stopwords.txt (I checked with zkCli.cmd). Here is the notable exception stack trace:

ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; Unable to create core: collection1_shard3_1_replica1
java.lang.RuntimeException: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:169)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:254)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:590)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:498)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:255)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:243)
    at org.apache.lucene.analysis.core.StopFilterFactory.inform(StopFilterFactory.java:99)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:655)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:167)
    ... 35 more

I have 2 questions that stem from this: why is it happening, and how can I solve it? It seems to be having trouble locating the config
Re: Branch/Java questions re: contributing code
On Mon, Jan 6, 2014 at 8:54 PM, Ryan Cutter ryancut...@gmail.com wrote: 1. Should we be using Java 6 or 7? The docs say 1.6 ( http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on trunk/ yields: /lucene/common-build.xml:328: Minimum supported Java version is 1.7. I don't get that error with branch_4x/ which leads to my next question. branch_4x is on Java 6 and trunk is on Java 7. 2. Should work toward 4.X be done on trunk/ or branch_4x/? It sounds like patches should be based on trunk then it gets ported as necessary. Thanks! Ryan Yeah, you are right. Features are committed to trunk first and backported to branch_4x -- Regards, Shalin Shekhar Mangar.
RE: Branch/Java questions re: contributing code
Trunk (5.x) requires Java 1.7, 4.x still works with 1.6. Check the CHANGES.txt, you'll see it near the top. -Original message- From:Ryan Cutter ryancut...@gmail.com Sent: Monday 6th January 2014 16:27 To: solr-user@lucene.apache.org Subject: Branch/Java questions re: contributing code 1. Should we be using Java 6 or 7? The docs say 1.6 ( http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on trunk/ yields: /lucene/common-build.xml:328: Minimum supported Java version is 1.7. I don't get that error with branch_4x/ which leads to my next question. 2. Should work toward 4.X be done on trunk/ or branch_4x/? It sounds like patches should be based on trunk then it gets ported as necessary. Thanks! Ryan
Re: need help on OpenNLP with Solr
Hi, Also I wanted to know: is it possible to integrate WordNet with this analyzer? I want to use WordNet for synonym expansion along with the OpenNLP filters. What changes are required in solr schema.xml and solrconfig.xml? Thanks in advance

On Mon, Jan 6, 2014 at 9:37 PM, rashi gandhi gandhirash...@gmail.com wrote: Hi, I have applied the OpenNLP patch (LUCENE-2899.patch) to Solr 4.5.1 for NLP searching and it is working fine. Also I have designed an analyzer for this:

<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-test-sent.bin" tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-test-sent.bin" tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

I am able to see that the posTaggerModel is performing tagging on the phrases and adding payloads (but I am not able to analyze it). My question is: can I search a phrase giving a higher boost to NOUN than to VERB? For example, if I am searching "sitting on blanket", I want to give the highest boost to NOUN terms first, then VERB, as tagged by OpenNLP. How can I use payloads for boosting? What changes are required in schema.xml? Please provide me some pointers to move ahead. Thanks in advance
need help on OpenNLP with Solr
Hi, I have applied the OpenNLP patch (LUCENE-2899.patch) to Solr 4.5.1 for NLP searching and it is working fine. Also I have designed an analyzer for this:

<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-test-sent.bin" tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-test-sent.bin" tokenizerModel="opennlp/en-test-tokenizer.bin"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

I am able to see that the posTaggerModel is performing tagging on the phrases and adding payloads (but I am not able to analyze it). My question is: can I search a phrase giving a higher boost to NOUN than to VERB? For example, if I am searching "sitting on blanket", I want to give the highest boost to NOUN terms first, then VERB, as tagged by OpenNLP. How can I use payloads for boosting? What changes are required in schema.xml? Please provide me some pointers to move ahead. Thanks in advance
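A common recipe for this, sketched for Lucene/Solr 4.x (this is not part of the LUCENE-2899 patch itself, and it assumes the POS tag ends up as the token payload, e.g. via a TypeAsPayloadTokenFilterFactory at the end of the chain if the OpenNLP filter exposes tags as token types): score the payloads with a custom Similarity, registered on the field via <similarity> in schema.xml, and query with a payload-aware query (stock Solr 4.x has no payload query parser, so this usually means a small custom QParserPlugin):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.util.BytesRef;

public class PosBoostSimilarity extends DefaultSimilarity {
  @Override
  public float scorePayload(int doc, int start, int end, BytesRef payload) {
    if (payload == null) return 1.0f;
    String tag = payload.utf8ToString();    // e.g. "NN", "NNS", "VB", "VBD"
    if (tag.startsWith("NN")) return 3.0f;  // nouns get the highest multiplier
    if (tag.startsWith("VB")) return 2.0f;  // verbs next
    return 1.0f;
  }
}

// Query side, e.g. inside a custom QParserPlugin:
// new PayloadTermQuery(new Term("nlp_field", "blanket"), new AveragePayloadFunction());

The class name, field name, and multipliers are illustrative; scorePayload is consulted per matching position, so noun occurrences of a term end up weighted above verb occurrences.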
Index for csv-file created successfully, but no data is shown
Dear Solr users, I would appreciate it if someone could help me out here. My goal is to index a csv file. First of all, I am using the CDH 5 beta distribution of Hadoop, which includes Solr 4.4.0, on a single node. I am following the Hue tutorial to index and search the data from the Yelp dataset challenge: http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search Following the tutorial, I have uploaded the config files, including the prepared schema.xml, to ZooKeeper via the solrctl command: solrctl instancedir --create reviews [path to conf] After this, I have created the collection via: solrctl collection --create reviews -s 1 This works fine, as I can see the collection created in the Solr Admin web UI and the instancedir in the ZooKeeper shell. Then, using the MapReduceIndexerTool and the provided morphline file, the index is created and uploaded to Solr. According to the command output, the index was created successfully:

1481 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 1 reducers
52716 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Done. Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
52774 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of output shards into Solr cluster...
52829 [pool-4-thread-1] INFO org.apache.solr.hadoop.GoLive - Live merge hdfs://svr-hdp01:8020/tmp/load/results/part-0 into http://SVR-HDP01:8983/solr
53017 [pool-4-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
53495 [main] INFO org.apache.solr.hadoop.GoLive - Committing live merge...
53496 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:
53512 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
53513 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@19014023 name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
53513 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper
53514 [main] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
53652 [main] INFO org.apache.solr.hadoop.GoLive - Done committing live merge
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of index shards into Solr cluster took 0.878 secs
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging completed successfully
53652 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Succeeded with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1388405934175_0013
53653 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Success. Done. Program took 53.719 secs. Goodbye.

Now, when I go to the web UI and select the created core, I find the core to be empty: Num Docs is 0 and querying it yields no results. My question is whether I have to upload the csv file manually to somewhere on the Solr server, as it seems the csv file was parsed and indexed successfully, but the data that was indexed is missing. I hope the description of the problem was clear enough. Thanks a lot!
Kind regards __ initions AG Chi-Hao Huynh Weidestraße 120a D-22081 Hamburg t: +49 (0) 40 / 41 49 60-62 f: +49 (0) 40 / 41 49 60-11 e: hu...@initios.commailto:hu...@initios.com w: www.initions.comhttp://www.initions.com Vollständiger Name der Gesellschaft: initions innovative IT solutions AG Sitz der Gesellschaft: Hamburg Handelsregister Hamburg B 83929 Aufsichtsratsvorsitzender: Dr. Michael Leue Vorstand: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn
Re: How to boost documents ?
Hi, I tried to isolate the problem, so I tested the following query on solr-4.6.0: http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power The error is: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power' And the stack trace:

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
    at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
    at org.apache.solr.search.QParser.subQuery(QParser.java:200)
    at org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
    at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
    at org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
    at org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
    at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at
SOLR Security - Displaying endpoints to public
Hi, We are currently exposing the SOLR endpoints to the public in our application (public users would be able to view the SOLR endpoints (/select) and the query in the debugging console). I am trying to figure out if there is any security threat in displaying the endpoints directly on the internet. We have disabled the update handler in production, so I assume writes/updates are not possible. The URL below makes this point: 'Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own.' Is the above statement true even if we just expose the read-only endpoints to the public users? Can someone please advise? http://wiki.apache.org/solr/SolrSecurity -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Security-Displaying-endpoints-to-public-tp4109792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: adding wild card at the end of the text and search(like sql like search)
By using the q parameter and passing defType=unorderedcomplexphrase, it worked for me: http://localhost:8999/solr/MACSearch/select?q=LAST_NAM%3A%22DE+PAR*%22&wt=xml&indent=true&defType=unorderedcomplexphrase Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/adding-wild-card-at-the-end-of-the-text-and-search-like-sql-like-search-tp4108399p4109796.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: delta-import giving Total Documents Processed = 0
I think the issue was with deltaImportQuery; it is case sensitive. I was using '${dataimporter.delta.clai_idn}' instead of '${dataimporter.delta.CLAI_IDN}'. <field column="CLAI_IDN" name="CLAI_IDN" /> -- View this message in context: http://lucene.472066.n3.nabble.com/delta-import-giving-Total-Documents-Processed-0-tp4089118p4109798.html Sent from the Solr - User mailing list archive at Nabble.com.
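For anyone hitting the same thing: the key in ${dataimporter.delta.*} must match the column name exactly as the deltaQuery returns it. A minimal sketch of a working delta configuration (the table and timestamp column names here are illustrative, not taken from the original post):

<entity name="claim" pk="CLAI_IDN"
        query="select * from claims"
        deltaQuery="select CLAI_IDN from claims where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from claims where CLAI_IDN = '${dataimporter.delta.CLAI_IDN}'">
  <field column="CLAI_IDN" name="CLAI_IDN" />
</entity>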
Re: SOLR Security - Displaying endpoints to public
On 1/6/2014 10:55 AM, Developer wrote: We are currently showing the SOLR endpoints to the public when using our application (public users would be able to view the SOLR endpoints (/select) and the query in debugging console). I am trying to figure out if there is any security threat in terms of displaying the endpoints directly in internet. We have disabled the update handler in production so I assume writes / updates are not possible. The below URL mentions a point 'Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such the only clients with access to Solr are your own.' Is the above statement true even if we just display the read-only endpoints to the public users? Can someone please advise? Without an application between the public and Solr that sanitizes user input, an attacker can send denial of service queries to your Solr instance that will cause it to spin so hard it can't serve regular queries. We can't block such things in server code, because sometimes such queries *are* legitimate, they just take a lot of resources and time to complete. Even if you disable admin handlers so that it's impossible to gather full information about your schema and other settings, generating legitimate queries is probably enough for an attacker to get the information they need. If your design is such that client-side scripting handles almost everything, you probably need to set up a proxy in front of Solr that's configured to deny things that look suspicious. I do not know of any publicly available proxy configurations like this, and I have never come across any private ones either. Thanks, Shawn
Re: Function query matching
: The bottom line for Peter is still the same: using scale() wrapped around
: a function/query does involve computing the results for every document,
: and that is going to scale linearly as the size of the index grows -- but
: it is *only* because of the scale function.

Another problem with this approach is that the scale() function will likely generate incorrect values because it occurs before any filters. If the filters drop high scoring docs, the scaled values will never include the 'maxTarget' value (and may not include the 'minTarget' value, either). Peter

On Sat, Dec 7, 2013 at 2:30 PM, Chris Hostetter hossman_luc...@fucit.org wrote: (This is why I shouldn't send emails just before going to bed.) I woke up this morning realizing that of course I was completely wrong when I said this...

: I want to be clear for 99% of the people reading this, if you find
: yourself writing a query structure like this...
: q={!func}..functions involving wrapping $qq ... ...
: ...Try to restructure the match you want to do into the form of a
: multiplier ...
: Because the latter case is much more efficient and Solr will only compute
: the function values for the docs it needs to (that match the wrapped $qq
: query)

The reason I was wrong... Even though function queries do by default match all documents, and even if the main query is a function query (ie: q={!func}...), if there is an fq that filters down the set of documents, then the (main) function query will only be calculated for the documents that match the filter. It was trivial to amend the test I mentioned last night to show this (and I feel silly for not doing that last night and stopping myself from saying something foolish)... https://svn.apache.org/viewvc?view=revision&revision=r1548955 The bottom line for Peter is still the same: using scale() wrapped around a function/query does involve computing the results for every document, and that is going to scale linearly as the size of the index grows -- but it is *only* because of the scale function. -Hoss http://www.lucidworks.com/
MergePolicy for append-only indices?
Hi, (cross-posting to both the Solr and Lucene user lists because while this is a Lucene-level question, I suspect a lot of people who know about this or are interested in this subject are actually on the Solr list) I have a large append-only index and I looked at merge policies hoping to identify one that is naturally more suitable for indices without any updates or deletions, just adds. I've read http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.html and the javadocs for its cousins, but it doesn't look like any of them is more suited to an append-only index than the others, and TieredMergePolicy, having more knobs, is probably the best one to use. I was wondering if I was missing something: whether one of the MPs is in fact better for append-only indices, OR whether one could suggest how to write a custom MP specialized for append-only indices. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/
Re: SOLR Security - Displaying endpoints to public
On 1/6/2014 11:18 AM, Shawn Heisey wrote: Even if you disable admin handlers so that it's impossible to gather full information about your schema and other settings, generating legitimate queries is probably enough for an attacker to get the information they need. Self-replying on this point: If you *don't* disable admin handlers, an attacker would also be able to simply unload the core and ask Solr to delete it from disk. A side effect of disabling admin handlers is that the admin UI won't work either. In terms of security hardening, that's a good thing ... but it makes it *very* difficult to gather useful information about your installation's health. Thanks, Shawn
Re: MergePolicy for append-only indices?
On 1/6/2014 11:24 AM, Otis Gospodnetic wrote: (cross-posting to both Solr and Lucene user lists because while this is a Lucene-level question, I suspect a lot of people who know about this or are interested in this subject are actually on the Solr list) I have a large append-only index and I looked at merge policies hoping to identify one that is naturally more suitable for indices without any updates and deletions, just adds. I've read http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.htmland the javadocs for its cousins, but it doesn't look like any of them is more suited for append-only index than the other ones and Tiered MP having more knobs is probably the best one to use. I was wondering if I was missing something, if one of the MPs is in fact better for append-only indices OR if one can suggest how one could write a custom MP that's specialized for append-only indices. The Tiered policy was made default for Solr back in the 3.x days. Defaults in both Solr and Lucene don't normally change without some serious thought about the repercussions. As for what's best for different kinds of indexes (add-only vs update/delete) ... unless there are *enormous* numbers of deletions (whether from updates or pure delete requests), I don't think that affects the decision very much. The Tiered policy seems like it's probably the best choice either way. I assume you've seen the following blog post? http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Thanks, Shawn
Re: SOLR Security - Displaying endpoints to public
On 06 Jan 2014, at 19:37 , Shawn Heisey s...@elyograg.org wrote: On 1/6/2014 11:18 AM, Shawn Heisey wrote: Even if you disable admin handlers so that it's impossible to gather full information about your schema and other settings, generating legitimate queries is probably enough for an attacker to get the information they need. Self-replying on this point: If you *don't* disable admin handlers, an attacker would also be able to simply unload the core and ask Solr to delete it from disk. A side effect of disabling admin handlers is that the admin UI won't work either. In terms of security hardening, that's a good thing ... but it makes it *very* difficult to gather useful information about your installation's health. If you want to apply some sort of access restrictions on the content, you will need a mechanism to identify the user and add parameters to restrict the result set. You will also need to stop the user from circumventing this mechanism, which basically means that the raw Solr endpoints must not be accessible to the user.
DateField - Invalid JSON String Exception - converting Query Response to JSON Object
Hi, Wish you all a very happy new year. We have an index where a date field has a default value of 'NOW'. We are using SolrJ to query Solr, and when we try to convert the query response (response.getResponse()) to a JSON object in Java, the JSON API (org.json) throws an 'invalid JSON string' exception. The API says so because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotes (" "), so it reports 'required , or }' when it sees the colon. Could you please help me retrieve the date field value as a string in the JSON response, or give any pointers? Any help would be highly appreciated.
Re: Slowness of Solr search during the replication
Hello Siva, Do you have an idea what makes them freeze? Ideally you might be able to take a thread dump at the moment of the freeze, if you can. Also, check the SolrIndexSearcher debug logs for autowarming timing. What about specifying a few of the heaviest queries in a newSearcher listener (https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Query-RelatedListeners)? +1 for bumping auto-warming on slaves.

On Mon, Jan 6, 2014 at 4:34 PM, sivaprasad sivaprasa...@echidnainc.com wrote: Do we need to set the autowarmCount on the slave or the master? As per the Solr wiki, I found the information below. Solr4.0: autowarmCount can now be specified as a percentage (ie: 90%) which will be evaluated relative to the number of items in the existing cache. This can be an advantageous setting in an instance of Solr where you don't expect any search traffic (ie a master), but you want some caches so that if it does take on traffic it won't be too overloaded. Once the traffic dies down, subsequent commits will gradually decrease the number of items being warmed. Regards, Siva -- View this message in context: http://lucene.472066.n3.nabble.com/Slowness-of-Solr-search-during-the-replication-tp4109712p4109739.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
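A minimal sketch of such a newSearcher listener in the <query> section of solrconfig.xml; the queries are placeholders, so substitute the heaviest queries, sorts, and facets from your own logs:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some heavy query</str><str name="sort">price asc</str></lst>
    <lst><str name="q">*:*</str><str name="facet">true</str><str name="facet.field">category</str></lst>
  </arr>
</listener>

Each listed query is run against the new searcher before it starts serving traffic, so the relevant caches are primed instead of being hit cold after every replication.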
Re: MergePolicy for append-only indices?
I think the key optimization when there are no deletions is that you don't need to renumber documents and can bulk-copy blocks of contiguous documents, and that is independent of merge policy. I think :) -Mike On 01/06/2014 01:54 PM, Shawn Heisey wrote: On 1/6/2014 11:24 AM, Otis Gospodnetic wrote: (cross-posting to both Solr and Lucene user lists because while this is a Lucene-level question, I suspect a lot of people who know about this or are interested in this subject are actually on the Solr list) I have a large append-only index and I looked at merge policies hoping to identify one that is naturally more suitable for indices without any updates and deletions, just adds. I've read http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/TieredMergePolicy.htmland the javadocs for its cousins, but it doesn't look like any of them is more suited for append-only index than the other ones and Tiered MP having more knobs is probably the best one to use. I was wondering if I was missing something, if one of the MPs is in fact better for append-only indices OR if one can suggest how one could write a custom MP that's specialized for append-only indices. The Tiered policy was made default for Solr back in the 3.x days. Defaults in both Solr and Lucene don't normally change without some serious thought about the repercussions. As for what's best for different kinds of indexes (add-only vs update/delete) ... unless there are *enormous* numbers of deletions (whether from updates or pure delete requests), I don't think that affects the decision very much. The Tiered policy seems like it's probably the best choice either way. I assume you've seen the following blog post? http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Thanks, Shawn
Seemingly arbitrary error on document adds to SolrCloud - Server Error request: http://10.0.0.5:8443/solr/collection1/update?update.distrib=TOLEADER&distrib.from=...
I'm adding dozens of documents every few minutes to a SolrCloud instance with 3 machines and ~25 million documents. I'm starting to see issues where adds throw these ugly errors, which seem to indicate there might be some issues with the nodes communicating with one another. My posts are of the following form, but with about 30 fields rather than just 1:

<add>
  <doc>
    <field name="id">112370241</field>
  </doc>
</add>

And here is the error that Solr is throwing:

null:org.apache.solr.common.SolrException: Server Error request: http://10.0.0.5:8443/solr/collection1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.0.0.229%3A8443%2Fsolr%2Fcollection1%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

What is the source of these errors, and how can I resolve them? -- View this message in context: http://lucene.472066.n3.nabble.com/Seemingly-arbitrary-error-on-document-adds-to-SolrCloud-Server-Error-request-http-10-0-0-5-8443-solr-tp4109864.html Sent from the Solr - User mailing list archive at Nabble.com.
Setting max number of connections
I am trying to increase the max number of connections allowed for queries with SolrCloud. I have searched around and found mentions that: the max number of connections is 128, and the max number of connections per host is 32. I start Solr in the example directory with some options, but basically it is just: java -jar start.jar How can I increase the two values above? Is there some config file that needs changing? While I wait to see what recommendations the community has to offer, I am experimenting with the following, which I read on the SolrConfigXml wiki:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="df">text</str>
  </lst>
  <!-- other params go here -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">1000</int>
    <int name="connTimeOut">5000</int>
    <int name="maxConnectionsPerHost">512</int>
  </shardHandlerFactory>
</requestHandler>

I added echoParams and df=text because a faceting query threw an error without them. Any help is appreciated. I have been trying to apply load to our Solr system and I can't get the CPU on the boxes to budge. I noticed during our tests that throughput flatlined once we reached 128 users, and so I searched and sure enough found the 128-connection limit mentioned. Thank you.
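If you are on the new-style solr.xml (Solr 4.4+), the distributed-search client can also be configured globally there rather than per request handler; a sketch, assuming the 4.x HttpShardHandlerFactory parameter names (note the spelling socketTimeout/connTimeout in the shipped code, versus the socketTimeOut/connTimeOut spelling in the old wiki snippet; the values are illustrative):

<solr>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">1000</int>
    <int name="connTimeout">5000</int>
    <int name="maxConnectionsPerHost">512</int>
  </shardHandlerFactory>
</solr>

Note this only raises the limits for the connections Solr makes to other shards; the servlet container (Jetty) has its own thread and connection limits for incoming client requests, which may also need raising under heavy load.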
Re: SOLR Security - Displaying endpoints to public
Apache url_rewrite can help with this and it's only a few minutes to set up. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Jan 6, 2014 at 12:55 PM, Developer bbar...@gmail.com wrote: Hi, We are currently showing the SOLR endpoints to the public when using our application (public users would be able to view the SOLR endpoints (/select) and the query in debugging console). I am trying to figure out if there is any security threat in terms of displaying the endpoints directly in internet. We have disabled the update handler in production so I assume writes / updates are not possible. The below URL mentions a point 'Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such the only clients with access to Solr are your own.' Is the above statement true even if we just display the read-only endpoints to the public users? Can someone please advise? http://wiki.apache.org/solr/SolrSecurity -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Security-Displaying-endpoints-to-public-tp4109792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index for csv-file created successfully, but no data is shown
Hi, This may be a better question for the Cloudera Search mailing list. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Jan 6, 2014 at 11:06 AM, Huynh, Chi-Hao hu...@initions.com wrote: Dear solr users, I would appreciate if someone can help me out here. My goal is to index a csv-file. First of all, I am using the CDH 5 beta distribution of Hadoop, which includes solr 4.4.0, on a single node. I am following the hue tutorial to index and search the data from the yelp dataset challenge http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search . Following the tutorial, I have uploaded the config files, including the prepared schema.xml, to zookeeper via the solrctl-command: solrctl instancedir --create reviews [path to conf] After this, I have created the collection via: solrctl collection --create reviews -s 1 This works fine, as I can see the collection created in the Solr Admin Web UI and the instancedir in the zookeeper shell. Then, using the MapReduceIndexerTool and the provided morphline file the index is created and uploaded to solr. According to the command output the index was created successfully: 1481 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 1 reducers 52716 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Done. Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs 52774 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of output shards into Solr cluster... 52829 [pool-4-thread-1] INFO org.apache.solr.hadoop.GoLive - Live merge hdfs://svr-hdp01:8020/tmp/load/results/part-0 into http://SVR-HDP01:8983/solr 53017 [pool-4-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false 53495 [main] INFO org.apache.solr.hadoop.GoLive - Committing live merge... 53496 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config: 53512 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper 53513 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@19014023name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None 53513 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper 53514 [main] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper... 53652 [main] INFO org.apache.solr.hadoop.GoLive - Done committing live merge 53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of index shards into Solr cluster took 0.878 secs 53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging completed successfully 53652 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Succeeded with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1388405934175_0013 53653 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Success. Done. Program took 53.719 secs. Goodbye. Now, when I go to the web UI and select the created core, I find the core to be empty, there are 0 number of Docs and querying it bears no result. 
My question is, if I have to upload the csv-file manually to somewhere on the solr server as it seems as if the csv-file was parsed and indexed successfully, but the data is missing that was indexed. I hope, the description of the problem was clear enough. Thanks a lot! Kind regards __ initions AG Chi-Hao Huynh Weidestraße 120a D-22081 Hamburg t: +49 (0) 40 / 41 49 60-62 f: +49 (0) 40 / 41 49 60-11 e: hu...@initios.commailto:hu...@initios.com w: www.initions.comhttp://www.initions.com Vollständiger Name der Gesellschaft: initions innovative IT solutions AG Sitz der Gesellschaft: Hamburg Handelsregister Hamburg B 83929 Aufsichtsratsvorsitzender: Dr. Michael Leue Vorstand: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn
Re: Branch/Java questions re: contributing code
Thanks, everything worked fine after these pointers and I was able to generate a patch properly. Cheers, Ryan On Mon, Jan 6, 2014 at 7:31 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Jan 6, 2014 at 8:54 PM, Ryan Cutter ryancut...@gmail.com wrote: 1. Should we be using Java 6 or 7? The docs say 1.6 ( http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on trunk/ yields: /lucene/common-build.xml:328: Minimum supported Java version is 1.7. I don't get that error with branch_4x/ which leads to my next question. branch_4x is on Java 6 and trunk is on Java 7. 2. Should work toward 4.X be done on trunk/ or branch_4x/? It sounds like patches should be based on trunk then it gets ported as necessary. Thanks! Ryan Yeah, you are right. Features are committed to trunk first and backported to branch_4x -- Regards, Shalin Shekhar Mangar.
Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object
Hi, We have an index where a date field has a default value of 'NOW'. We are using SolrJ to query Solr, and when we try to convert the query response (response.getResponse()) to a JSON object in Java, the JSON API (org.json) throws an 'invalid JSON string' exception. The API says so because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotes, so it reports 'required , or }' when it sees the colon. Could you please help me retrieve the date field value as a string in the JSON response, or give any pointers? Any help would be highly appreciated.

On Tue, Jan 7, 2014 at 12:28 AM, Amit Jha shanuu@gmail.com wrote: Hi, Wish you all a very happy new year. We have an index where a date field has a default value of 'NOW'. We are using SolrJ to query Solr, and when we try to convert the query response (response.getResponse()) to a JSON object in Java, the JSON API (org.json) throws an 'invalid JSON string' exception. The API says so because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotes, so it reports 'required , or }' when it sees the colon. Could you please help me retrieve the date field value as a string in the JSON response, or give any pointers? Any help would be highly appreciated.
Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object
Hi Amit, If you want a JSON response, why don't you use wt=json? Ahmet

On Tuesday, January 7, 2014 7:34 AM, Amit Jha shanuu@gmail.com wrote: Hi, We have an index where a date field has a default value of 'NOW'. We are using SolrJ to query Solr, and when we try to convert the query response (response.getResponse()) to a JSON object in Java, the JSON API (org.json) throws an 'invalid JSON string' exception. The API says so because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotes, so it reports 'required , or }' when it sees the colon. Could you please help me retrieve the date field value as a string in the JSON response, or give any pointers? Any help would be highly appreciated. On Tue, Jan 7, 2014 at 12:28 AM, Amit Jha shanuu@gmail.com wrote: Hi, Wish you all a very happy new year. We have an index where a date field has a default value of 'NOW'. We are using SolrJ to query Solr, and when we try to convert the query response (response.getResponse()) to a JSON object in Java, the JSON API (org.json) throws an 'invalid JSON string' exception. The API says so because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded by double quotes, so it reports 'required , or }' when it sees the colon. Could you please help me retrieve the date field value as a string in the JSON response, or give any pointers? Any help would be highly appreciated.
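If you need to stay on the default javabin transport and build JSON yourself with org.json, note that SolrJ hands dates back as java.util.Date, whose toString() is what produces the unquoted, invalid JSON value. A sketch that walks the results and emits dates as quoted ISO-8601 strings (the helper class is illustrative):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.json.JSONArray;
import org.json.JSONObject;

public class ResponseToJson {
  public static JSONArray toJson(SolrDocumentList docs) throws Exception {
    SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    iso.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr dates are UTC
    JSONArray out = new JSONArray();
    for (SolrDocument doc : docs) {
      JSONObject json = new JSONObject();
      for (String name : doc.getFieldNames()) {
        Object value = doc.getFieldValue(name);
        // Date fields arrive as java.util.Date; format them as quoted strings
        json.put(name, value instanceof Date ? iso.format((Date) value) : value);
      }
      out.put(json);
    }
    return out;
  }
}

(With SolrJ, setting wt=json on the query alone usually has no effect, because the client's response parser overrides it; wt=json helps when you call the HTTP API directly.)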
pagination with grouping
Hi, I am using group.query like below:

group=true
group.query=_query_:"{!frange l=0 u=10 v=$score}"
group.query=_query_:"{!frange l=10 u=20 v=$score}"
group.query=_query_:"{!frange l=20 u=30 v=$score}"

Here I want to restrict the overall record count; start=0&rows=10 is not working here. Within a group we can do this using offset=0&group.limit=10. For example, I want only 10 records in total: if the first group (group.query=_query_:"{!frange l=0 u=10 v=$score}") contains 10 records, I don't need records from the other two groups. Could someone help me please? Thanks & Regards, Senthilnathan V