Re: Get position of first occurrence in search result
It wouldn't be too hard to write a Solr plugin that takes a docId param together with a query and returns the position of that doc within the result list for that query. You would still need to deal with the performance, though. For example, if the doc ranks at one millionth, the plugin still needs to get at least 1M docs, so the underlying collector still needs to sort through 1M documents. Maybe your business requirement is different, but does it really make a difference whether a document ranks at one thousandth or one millionth? With Google SEO, for example, either you rank in the first 3 (precision at 3) or you don't :) --Tri

On Jun 23, 2014, at 10:55 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Basically this is for analytical purposes: essentially we want to help people (whose sites we've indexed in our app) find out for which particular terms (in theory, terms related to their domain) they are badly positioned in our index. Initially we're starting with this basic "position per term", but the idea is to elaborate further in this direction. Could this position-finding logic be abstracted effectively into a plugin inside Solr? I guess it would be more efficient to iterate (or fire the 2 queries) from within Solr itself than in our app (written in PHP, so not so fast for some things), speeding things up? Regards,

On Jun 24, 2014, at 1:42 AM, Aman Tandon amantandon...@gmail.com wrote:

Jorge, I don't think that Solr provides this functionality; you have to iterate, and Solr is very fast at this. You can create a script that searches for the pattern (term) and pages through the records until it gets the record for the desired URL; I don't think 1/3 of a second to find it out is too much. Search-result analysis shows that very few people request the second page for their query; most either abandon the search or modify the query string. So I would rather suggest that if a website has appropriate, good data it should come up on the first page; it is better to get onto the first page than to find out the position. With Regards, Aman Tandon

On Tue, Jun 24, 2014 at 10:35 AM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Yes, but I'm looking for the position of the url field of interest within the response from Solr. Solr matches the terms against the collection of documents and returns a list sorted by score; what I'm trying to do is get the position of a specific id in this sorted response. The response could be something like position: 5, or position: 500. To do this manually, suppose the response consists of a very large number of documents (webpages); in that case I would need to iterate over the complete response to find the position, which in a worst-case scenario could be on the last page. For this particular use case I'm not so interested in the URL field per se, but more in the position a certain URL has in the full Solr response.

On Jun 24, 2014, at 12:31 AM, Walter Underwood wun...@wunderwood.org wrote:

Solr is designed to do exactly this very, very fast, so there isn't a faster way to do it. But you only need to fetch the URL field; you can ignore everything else. wunder

On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Basically, given a few search terms (a query), the idea is to know in which position your website is located for those specific terms.
On Jun 24, 2014, at 12:12 AM, Aman Tandon amantandon...@gmail.com wrote:

What kind of search criteria? Could you please explain? With Regards, Aman Tandon

On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

I'm using Solr for an analytic use case, one of the
solr4.7.2 startup takes too long
Before upgrading Solr to 4.7.2 I used Solr 3.6. When I started Tomcat, Solr started up quickly; the index size was 35G. After upgrading to 4.7.2 I rebuilt the index completely, and the index size is now 16G. But when I restart Tomcat, Solr starts up very slowly, taking about 10 minutes. I do not know the reason and am asking for help. Thank you.
Re: solr4.7.2 startup takes too long
Do you have any warming queries? Also, how do you measure the speed? What do the boot-log timestamps show for your index as opposed to, say, an empty example index? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jun 24, 2014 at 1:51 PM, hrdxwandg hrdxwa...@gmail.com wrote:

Before upgrading Solr to 4.7.2 I used Solr 3.6. When I started Tomcat, Solr started up quickly; the index size was 35G. After upgrading to 4.7.2 I rebuilt the index completely, and the index size is now 16G. But when I restart Tomcat, Solr starts up very slowly, taking about 10 minutes. I do not know the reason and am asking for help. Thank you.
Fwd: solr4.7.2 startup takes too long
Forwarding to the mailing list.

-- Forwarded message --
From: hrdxwa...@gmail.com
Date: Tue, Jun 24, 2014 at 2:15 PM
Subject: Re: solr4.7.2 startup takes too long

Thanks for your reply. I do not warm any queries; the configuration is the default, as follows:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!--
    <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
    <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
    -->
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">static firstSearcher warming in solrconfig.xml</str></lst>
  </arr>
</listener>

My speed measurement is admittedly a little subjective: I check the Tomcat log and its startup time. When I used Solr 3.6, after restarting Tomcat I could get the front page in the Chrome browser right away, but with Solr 4.7.2 I must wait a long time. In addition, the index is not empty: my index was 35G in Solr 3.6 and is 16G in Solr 4.7.2. This question has me quite confused.

quote author='Alexandre Rafalovitch'
Do you have any warming queries? Also, how do you measure the speed? What do the boot-log timestamps show for your index as opposed to, say, an empty example index? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jun 24, 2014 at 1:51 PM, hrdxwandg hrdxwa...@gmail.com wrote:

Before upgrading Solr to 4.7.2 I used Solr 3.6. When I started Tomcat, Solr started up quickly; the index size was 35G. After upgrading to 4.7.2 I rebuilt the index completely, and the index size is now 16G. But when I restart Tomcat, Solr starts up very slowly, taking about 10 minutes. I do not know the reason and am asking for help. Thank you.
Re: Bug in CollapsingQParserPlugin: Sort by 3 or more fields is broken
Hi Joel, I had missed this email; some issue with my Gmail settings. The reason CollapsingQParserPlugin is more performant than regular grouping is:

1. The QParser refers to global ords for group.field and avoids storing strings in a set. This has two advantages: a) in terms of memory, storing millions of ints instead of strings results in major savings; b) no binary search/lookup is necessary when the segment changes, resulting in huge computation savings.

2. The cost: CollapsingFieldValue has to maintain a score/field value for each unique ord. Memory requirement = number of ords * size of one field value. The basic types (byte, int, float, long, etc.) consume reasonable memory; string/text values can be stored as ords and consume only 4 bytes. The memory requirement matters because the arrays are dense and are allocated per request.

Taking an example:

Index size = 100 million documents
Unique ords = 10 million
Sort fields = 4 (1 int field + 1 long field + 2 string/text fields)
Memory requirement = 40 MB for the int field + 80 MB for the long field + 80 MB for the string ords = 200 MB

I agree that 200 MB per request just for collapsing the search results is huge, but at least it increases linearly with the number of sort fields. For my use case I am willing to pay the linear cost, especially since I can't combine the sort fields intelligently into a sort function. Plus it allows me to sort by string/text fields too, which is a big win.

PS: We could also store long/string fields as byte/short ords. For sort fields where the number of unique values is small (for example, sort by date or sales rank), this would result in significant memory savings.

On 19 June 2014 19:40, Joel Bernstein joels...@gmail.com wrote:

Umesh, this is a good summary. So the question is: what is the cost (performance and memory) of having the CollapsingQParserPlugin choose the group head using the Solr sort criteria? Keep in mind that the CollapsingQParserPlugin's main design goal is to provide fast performance when collapsing on a high-cardinality field. How you choose the group head can have a big impact here, on both memory consumption and performance. The function-query collapse criteria were added to let you come up with custom formulas for selecting the group head, with little or no impact on performance and memory. Using Solr's recip() function query, it seems you could come up with some nice scenarios where two variables are used to select the group head. For example:

fq={!collapse field=a max='sub(prod(cscore(),1000), recip(field(x),1,1000,1000))'}

This would basically give you two sort criteria: cscore(), which returns the score, would be the primary criterion, and the recip of field x would be the secondary criterion.

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Jun 19, 2014 at 2:18 AM, Umesh Prasad umesh.i...@gmail.com wrote:

Continuing the discussion on the mailing list from Jira. An example:

id  group  f1  f2
 1  g1      5    10
 2  g1      5  1000
 3  g1      5  1000
 4  g1     10   100
 5  g2      5    10
 6  g2      5  1000
 7  g2      5  1000
 8  g2     10   100

sort = f1 asc, f2 desc, id desc

Without collapse this gives: (7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)

On collapsing by group_s the expected output is: (7,g2), (3,g1)

Solr's standard collapsing does give this output with group=on, group.field=group_s, group.main=true.

Collapsing with CollapsingQParserPlugin, fq={!collapse field=group_s}, gives: (5,g2), (1,g1)

Summarizing the Jira discussion:

1. CollapsingQParserPlugin picks the group heads from the matching results and passes those further.
In essence it filters out the other matching documents, so that subsequent collectors never see them. It can also pass the score on to subsequent collectors using a dummy scorer.

2. The TopDocCollector comes later in the hierarchy and sorts the collapsed set. That part works fine.

The issue is with step 1. Collapsing is done by a single comparator, which can take its value from a field or a function and defaults to the score. Function queries do allow us to combine multiple fields/value sources, but it would be difficult to construct a function for a given set of sort fields, primarily because: a) the range of values for a given sort field is not known in advance; one sort field may be unbounded while another is bounded within a small range; b) a sort field can itself hold custom logic. Because of (a), the group head selected by CollapsingQParserPlugin will be incorrect and the subsequent sorting will break. On 14 June
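To make the failure mode above concrete with the thread's own sample data, consider selecting the group head through the plugin's documented min/max hooks (the field names are the ones from the example, not a real schema):

fq={!collapse field=group_s min='field(f1)'}&sort=f1 asc,f2 desc,id desc

In g1, docs 1, 2, and 3 all have f1=5, so the single comparator sees a three-way tie and keeps an arbitrary head rather than doc 3, the head the full sort would pick. Folding the secondary criteria into one function, e.g. min='sub(prod(field(f1),1000000),field(f2))', only works if f2 is known to stay below 1,000,000 (and the id tie-break still cannot be expressed), which is exactly point (a) above.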
CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true
Hi, I found another bug in CollapsingQParserPlugin; not a critical one. It throws an exception when used with <useFilterForSortedQuery>true</useFilterForSortedQuery>. A patch is attached (against 4.8.1, but the bug is reproducible in other branches too).

518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s%7D&defType=edismax&bf=field%28test_ti%29} hits=2 status=0 QTime=99
4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7D&defType=edismax&bf=field%28test_ti%29&sort=} hits=4 status=0 QTime=15
4587 T11 C0 oasc.SolrException.log ERROR java.lang.UnsupportedOperationException: Query does not implement createWeight
at org.apache.lucene.search.Query.createWeight(Query.java:80)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
at org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879)
at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
at org.apache.solr.util.TestHarness.query(TestHarness.java:295)
at org.apache.solr.util.TestHarness.query(TestHarness.java:278)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669)
at org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
RequestHandler init failure
Hi, I'm getting the error below while trying to access SolrCloud. I added these two lines to the solrconfig.xml file:

<lib dir="../../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

The actual location is /opt/apps/prod/solr/dist, where all the required jars below are available:

solr-4.8.1.war, solr-map-reduce-4.8.1.jar, solr-analysis-extras-4.8.1.jar, solr-morphlines-cell-4.8.1.jar, solr-cell-4.8.1.jar, solr-morphlines-core-4.8.1.jar, solr-clustering-4.8.1.jar, solr-solrj-4.8.1.jar, solr-core-4.8.1.jar, solr-test-framework-4.8.1.jar, solr-dataimporthandler-4.8.1.jar, solr-uima-4.8.1.jar, solr-dataimporthandler-extras-4.8.1.jar, solr-velocity-4.8.1.jar, solr-langid-4.8.1.jar, plus the solrj-lib and test-framework directories

but I am still getting the same issue. Any help, please?

HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init failure: RequestHandler init failure, trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: RequestHandler init failure
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:753)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
at org.apache.solr.core.SolrCore.init(SolrCore.java:858)
at org.apache.solr.core.SolrCore.init(SolrCore.java:641)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:167)
at org.apache.solr.core.SolrCore.init(SolrCore.java:785)
... 10 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating Request Handler, solr.DataImportHandler failed to instantiate org.apache.solr.request.SolrRequestHandler
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:559)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:611)
at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:153)
... 11 more
Caused by: java.lang.ClassCastException: class org.apache.solr.handler.dataimport.DataImportHandler
at java.lang.Class.asSubclass(Class.java:3165)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:484)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:421)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:538)
... 13 more
, code=500}
Re: No results for a wildcard query for text_general field in solr 4.1
Hi Erick, that is what I did: I tried that input on the analysis page. At index time the field splits the value into two words, "test" and "or123". Checking the query on the analysis page, the word is likewise split into "test" and "or123". But when I run the query and look at the debug output, I see that there is no splitting of words, which is what I suspected:

<str name="rawquerystring">searchField_t:test\-or123*</str>
<str name="querystring">searchField_t:test\-or123*</str>
<str name="parsedquery">searchField_t:test-or123*</str>
<str name="parsedquery_toString">searchField_t:test-or123*</str>

Without the wildcard, the word is split into two parts:

<str name="rawquerystring">searchField_t:test\-or123</str>
<str name="querystring">searchField_t:test\-or123</str>
<str name="parsedquery">searchField_t:test searchField_t:or123</str>
<str name="parsedquery_toString">searchField_t:test searchField_t:or123</str>

Any idea which configuration is responsible for that behavior? Thanks!

Am 23.06.2014 um 22:55 schrieb Erick Erickson erickerick...@gmail.com:

Well, you can do more than guess by looking at the admin/analysis page and trying your input on the field in question. That'll show you what actual transformations are performed. You're probably right, though. Try adding debug=query to your URL to see what the actual parsed query looks like, and compare with the admin/analysis page. But yeah, it's a matter of getting all the parts (query parser and analysis chains) to do the right thing. Best, Erick

On Mon, Jun 23, 2014 at 7:30 AM, Sven Schönfeldt schoenfe...@subshell.com wrote:

Hi Solr users, I am trying to do a wildcard query on a dynamic text field (_t) but don't get the right result. The configuration for the field type is "text_general", the default configuration:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The input for the text field is test-or123 and my query looks like test\-or*. It seems that the input has already been split into two words, "test" and "or123", but that's just a guess. Can anyone help me and explain why I don't find the document, and what to do to make the query work? Regards!
Re: solr4.7.2 startup takes too long
I am new to Nabble; I am sorry that I did the wrong operation earlier. I will repeat my answer here. There are no warming queries in my solrconfig.xml, which is as follows:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries"/>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">static firstSearcher warming in solrconfig.xml</str></lst>
  </arr>
</listener>

My startup-speed measurement is a little subjective: I check the startup time in the Tomcat log. When I used Solr 3.6, after restarting Tomcat I could get the front page quickly in the Chrome browser, but with Solr 4.7.2 I must wait a long time. In addition, the index is not empty: in Solr 3.6 the index size was 35G, and in Solr 4.7.2 I rebuilt the index and its size is 16G.
RAMDirectoryFactory setting on replication slave
Hi guys, as I understand it, the RAMDirectoryFactory setting does not work with replication (https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig). That said, can I use it for replication slave nodes (not the master), or for SolrCloud? Thanks, Chunki.
Re: No results for a wildcard query for text_general field in solr 4.1
Hi Sven, StandardTokenizerFactory splits it into two pieces; you can confirm this on the analysis page. If this is something you don't want, let us know; we can help you create an analysis chain that suits your needs. Ahmet

On Tuesday, June 24, 2014 10:39 AM, Sven Schönfeldt schoenfe...@subshell.com wrote: [...]
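For reference, one possible shape for such a chain is sketched below; the type name is illustrative, and WhitespaceTokenizerFactory is just one option. Unlike StandardTokenizer, it keeps "test-or123" as a single token because it splits on whitespace only:

<fieldType name="text_keepjoined" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- whitespace tokenization preserves hyphenated terms as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With the hyphenated value indexed as one token, the wildcard query searchField_t:test\-or* can match it.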
DIH on Solr
Hi experts, we have a requirement to import data from HBase tables into Solr. We tried the DataImportHandler, but we couldn't find configuration steps or documentation for a DataImportHandler for HBase. Can anybody please share the steps to configure it? We tried a basic configuration, but running a full import throws the error below. Please share docs or links on configuring DIH for an HBase table.

6/24/2014 3:44:00 PM WARN ZKPropertiesWriter Could not read DIH properties from /configs/collection1/dataimport.properties: class org.apache.zookeeper.KeeperException$NoNodeException
6/24/2014 3:44:00 PM ERROR DataImporter Full Import failed: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity: msg Processing Document # 1

Thanks in advance
Re: RequestHandler init failure
Hi, it looks like you have jars from a different version than your solr.war? Ahmet

On Tuesday, June 24, 2014 10:33 AM, atp annamalai...@hcl.com wrote: [...]
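One likely cause of the ClassCastException above (an assumption based on the trace, not confirmed in the thread) is that solr-dataimporthandler-*.jar is on the classpath twice, e.g. once copied into the webapp's WEB-INF/lib or Tomcat's lib directory and once loaded via the <lib> directives, so DataImportHandler is loaded by two different classloaders and the cast to SolrRequestHandler fails. The usual fix is to reference the jar from exactly one place, for example with an absolute path, and delete any stray copies:

<!-- solrconfig.xml: load the DIH jar from one location only -->
<lib dir="/opt/apps/prod/solr/dist/" regex="solr-dataimporthandler-.*\.jar" />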
Re: DIH on Solr
Hi, there is no DataSource or EntityProcessor for HBase, I think. Maybe http://www.lilyproject.org/lily/index.html works for you? Ahmet

On Tuesday, June 24, 2014 1:27 PM, atp annamalai...@hcl.com wrote: [...]
TokenFilter not working at index time
I'm trying to create a Norwegian lemmatizer based on a dictionary, but for some odd reason I don't get any search results even though the Analyzer in the Solr Admin shows that it does the right thing. It works at query time if I have reindexed everything based on another stemmer, e.g. NorwegianMinimalStemmer. Here's a screenshot of how it lemmatizes the Norwegian word studenter (masculine indefinite noun, plural; English: students). The stem is student. So far so good: http://folk.uio.no/erlendfg/solr/lemmatizer.png But I get no/few results if I search for studenter compared to student. If I switch to solr.NorwegianMinimalStemFilterFactory in schema.xml at index time and reindex everything, it works as it should:

<analyzer type="index">
  <filter class="solr.NorwegianMinimalStemFilterFactory" variant="no"/>
</analyzer>

What is wrong with my TokenFilter, and/or how can I debug this further? I have tried a lot of different things without any luck, for example decoding everything explicitly to UTF-8 (the wordlist is in ISO-8859-1, but I'm reading it properly by setting the correct character set) and trimming all the words, without any help. The byte sequence also seems to be correct for the stemmed word: my lemmatizer shows [73 74 75 64 65 6e 74], exactly the same as when I have configured NorwegianMinimalStemFilterFactory in schema.xml. Here's the source code of my lemmatizer; please note that it is not finished: http://folk.uio.no/erlendfg/solr/ Here's the line in my wordlist that contains the word studenter:

66235 student studenter subst mask appell fl ub normert 700 3

The following line returns the stem (the input is studenter):

final String[] values = stemmer.stem(termAtt.buffer());

The rest of the code is in NorwegianLemmatizerFilter. If several stems are returned, they are all added. Erlend
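Without seeing the full source, one classic TokenFilter pitfall fits these symptoms and the stem() call quoted above: CharTermAttribute.buffer() returns a reusable char[] that is usually longer than the current token, so a dictionary lookup keyed on the whole array can fail at index time even though the Admin analysis page (which renders only the valid region) looks correct. A minimal sketch of an incrementToken() that avoids this; NorwegianLemmatizer stands in for the hypothetical dictionary-backed stemmer, and only the first stem is kept for brevity (emitting all stems needs a small queue plus position-increment handling):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class NorwegianLemmatizerFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final NorwegianLemmatizer stemmer; // hypothetical dictionary-backed stemmer

  public NorwegianLemmatizerFilter(TokenStream input, NorwegianLemmatizer stemmer) {
    super(input);
    this.stemmer = stemmer;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // Only the first length() chars of buffer() belong to the current token.
    final String term = new String(termAtt.buffer(), 0, termAtt.length());
    final String[] stems = stemmer.stem(term);
    if (stems != null && stems.length > 0) {
      termAtt.setEmpty().append(stems[0]); // replace the token text with the stem
    }
    return true;
  }
}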
How to add some more documents to an existing index file
I have an index containing data from a MySQL database; I created it using Solr's DataImportHandler. My requirement: suppose I add a new row to the database; I want that row reflected in my existing Solr index. I don't have any idea how to add the new record from the database to Solr. Do I need to re-index everything, or is even a single-record update possible? Thanks and regards, Gurunath Pai.
Re: Solr alternates returning different versions of the same document
Hi Erik, thanks. If it helps, I eventually fixed the problem by deleting the documents by id (via an HTTP request), which apparently deleted all the versions everywhere, and then re-creating the documents via the admin interface (update, CSV). This seems to have left only one version of each document. Yann
Re: Evaluate function only on subset of documents
Thanks guys for your answers, and sorry for the query syntax errors I introduced in the previous queries. Chris, you've been really helpful. Indeed, point 3 is the one I'm trying to solve, rather than 2. You're saying that BooleanScorer will consult the clauses in order based on which clause says it can skip the most documents; I think this might be the culprit for me. Let's take this query sample:

XXX OR AAA AND {!frange ...}

For my use case, AAA returns a subset of 100k documents, and frange returns 5k documents, all part of these 100k documents. Therefore frange skips the most documents. From what you are saying, frange is going to be applied to all documents (since it skips the most) and AAA is going to be applied to the subset. This is kind of what I originally noticed. My goal is the reverse order, since frange is much more expensive than AAA. I was hoping to achieve it by specifying the cost, saying "Hey, frange has cost 100 while AAA has cost 1, so run AAA first and then run frange on the subset." However, this does not seem to be taken into consideration. Does this make sense / am I getting something wrong? Is there something I can do to achieve this? Thanks, Costi

On Tue, Jun 24, 2014 at 4:23 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Now, if I want to make a query that also contains some OR, it is impossible
: to do so with this approach. This is because fq with the OR operator is not
: supported (SOLR-1223). As an alternative I've tried these queries:
: county='New York' AND (location:Maylands OR location:Holliscort OR
: parking:yes) AND _val_:{!frange u=0 cost=150 cache=false}mycustomfunction()

1) Most of the examples you've posted have syntax errors in them that are probably throwing a wrench into your testing. In this example, county='New York' is not valid syntax; presumably you want county:"New York".

2) Based on the example you give, what you're trying to do here doesn't really depend on using SHOULD (i.e. OR) type logic against the frange: the only disjunction you have is in a sub-query of a top-level conjunction (i.e. all required); the frange itself is still mandatory. So you could still use it as a non-cached postfilter just like in your previous example:

q=+XXX +(YYY ZZZ)&fq={!frange cost=150 cache=false ...}

3) If that query wasn't exactly what you meant, and your top-level query is more complex, containing a mix of MUST, MUST_NOT, and SHOULD clauses, i.e.:

q=+XXX YYY ZZZ -AAA +{!frange ...}

...then the internal behavior of BooleanQuery will automatically do what you want (no need for cache or cost params on the fq), to the best of its ability, because of how the evaluation of boolean clauses is re-ordered internally based on the next match. It's kind of complicated to explain, but the short version is:

a) BooleanScorer will avoid asking any clause whether it matches a document that has already been disqualified by another clause.
b) BooleanScorer will consult the clauses in order based on which clause says it can skip the most documents.

So you might see your custom function evaluated for some docs that ultimately don't match, but if there are rarer mandatory clauses of your BQ that tell Lucene it can skip over a large number of docs, your custom function will be skipped. This is how BooleanQuery has always worked, but I just committed a test to verify it even when wrapping a FunctionRangeQuery:
https://svn.apache.org/r1604990

4) The extreme of #3 is that if you need to use the {!frange} as part of a full disjunction, i.e.:

q=XXX OR YYY OR {!frange ...}

...then it would be impossible for Solr to execute the expensive function against only the subset of documents that match the rest of the query, because BooleanScorer can't tell which documents match the query unless it evaluates the function (it's a catch-22). Even if a doc matches neither XXX nor YYY, Solr has to evaluate the function against it to see whether the function *makes* the document match the entire query.

-Hoss
http://www.lucidworks.com/
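For the AAA-then-frange ordering Costi wants, the standard pattern (a sketch, assuming both AAA and the function test are mandatory, per Hoss's point 2) is to keep the cheap clauses in the main query and move the expensive function into a non-cached fq with cost >= 100:

q=+XXX +AAA&fq={!frange u=0 cache=false cost=150}mycustomfunction()

With cache=false and a cost of 100 or more, the frange runs as a PostFilter: mycustomfunction() is evaluated only for documents that already matched everything else (the ~100k subset), never for the whole index.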
Slow QTimes - 5 seconds for Small sized Collections
I am running Solr 4.5.1. Here is how my setup looks. I have 2 modest-sized collections:

Collection 1 - 2 shards, 3 replicas (size of shard 1: 115 MB, size of shard 2: 55 MB)
Collection 2 - 2 shards, 3 replicas (size of shard 1: 3.5 GB, size of shard 2: 1 GB)

These two collections are distributed across 6 Tomcat nodes set up on 3 VMs (2 nodes per VM). Each of the 6 Tomcat nodes has an Xms/Xmx setting of 2 GB, and each of the 3 VMs has 32 GB of physical memory (RAM). As you can see, my collections are pretty small. This is actually a test environment (NOT production), but my users (only a handful of testers) are complaining of sporadic performance issues with search. Here are my observations from the application logs:

1) Out of 200 sample searches across both collections, 13 requests are slow (3 slow responses on Collection 1 and 10 slow responses on Collection 2).
2) When things run fast, they are really fast (QTimes of 25-100 milliseconds), but when things are slow, the QTime consistently hovers around the 5-second (5000 millisecond) mark. I am seeing responses on the order of 5024, 5094, 5035 ms, as though something just hung for 5 seconds. I am observing this 5-second delay on both collections, which I find unusual because they contain very different data sets. I am unable to figure out what causes the QTime to sit so consistently around the 5-second mark.
3) I build my index only once. I did try running an optimize on both Collection 1 and Collection 2 after the users complained; the segment count on each of the four shards did come down, but that still didn't resolve the slowness of the searches (I was hoping it would).
4) I am looking at the Solr dashboard for more clues. My Tomcat nodes are definitely NOT running out of memory; the 6 nodes consume between 500 MB and 1 GB of RAM each.
5) The file descriptor counts are under control; I can see a maximum of only 100 file descriptors in use out of a total of 4096.
6) The Solr dashboard does show 0.2% (9.8 MB) of swap space consumed on one of the 3 VMs. Is this a concern?
7) I also looked at Plugins/Stats for every core on the Solr dashboard. I can't see any evictions happening in any of the caches; it is always ZERO.

Has anyone encountered such an issue? What else should I be looking at to debug my problem? Thanks
Does one need to perform an optimize soon after doing batch indexing using SolrJ?
I am using Solr 4.5.1. I have two collections:

Collection 1 - 2 shards, 3 replicas (size of shard 1: 115 MB, size of shard 2: 55 MB)
Collection 2 - 2 shards, 3 replicas (size of shard 1: 3.5 GB, size of shard 2: 1 GB)

I have a batch process that performs indexing (a full refresh) once a week on the same index. Here is some information on how I index:

a) I use SolrJ's bulk add API for indexing: CloudSolrServer.add(Collection<SolrInputDocument> docs).
b) I have an autoCommit (hard commit) setting for both my collections (solrconfig.xml):

<autoCommit>
  <maxDocs>10</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

c) I do a programmatic hard commit at the end of the indexing cycle, with openSearcher true, so that the documents show up in the search results.
d) I neither soft commit programmatically nor have any autoSoftCommit settings during the batch indexing process.
e) When I re-index all my data (the following week) into the same index, I don't delete the existing docs; I just re-index into the same collection.
f) I am using the default mergeFactor of 10 in my solrconfig.xml: <mergeFactor>10</mergeFactor>

Here is what I am observing:

1) After a batch indexing cycle, the segment counts for each shard/core are pretty high; the Solr dashboard reports between 8 and 30 segments on the various cores.
2) Sometimes the Solr dashboard shows the status of my core as NOT OPTIMIZED. I find this unusual, since I have just finished a batch indexing cycle and would assume the index should already be optimized. Is this happening because I don't delete my docs before re-indexing all my data?
3) After I run an optimize on my collections, the segment count does reduce significantly, down to 1 segment.

Am I indexing the right way? Is there a better strategy? Is it necessary to perform an optimize after every batch indexing cycle? The outcome I am looking for is an optimized index after every major batch indexing cycle. Thanks!!
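For reference, a hedged sketch of the end-of-batch sequence described in (c), assuming a CloudSolrServer instance named server and the SolrJ 4.x API:

// hard commit that opens a new searcher, making the batch visible
server.commit(true, true);        // (waitFlush, waitSearcher)
// optionally force-merge the index down to one segment after the weekly batch
server.optimize(true, true, 1);   // (waitFlush, waitSearcher, maxSegments)

On question 2: re-indexing over existing documents leaves the old copies behind as deleted documents until segments merge, which is one reason the dashboard can report a core as not optimized right after a batch.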
Re: Slow QTimes - 5 seconds for Small sized Collections
On Tue, 2014-06-24 at 14:26 +0200, RadhaJayalakshmi wrote:

> Here are my observations from the application logs:
> 1) Out of 200 sample searches across both collections, 13 requests are slow (3 slow responses on Collection 1 and 10 slow responses on Collection 2).
> 2) When things run fast, they are really fast (QTimes of 25-100 milliseconds), but when things are slow, the QTime consistently hovers around the 5-second (5000 millisecond) mark. I am seeing responses on the order of 5024, 5094, 5035 ms, as though something just hung for 5 seconds.

We have a strange recurring pattern where the first search every hour on the hour takes about 4 seconds, whereas the standard response time is 400 ms. That is for a single-shard Solr server running in Tomcat. Can you check whether your slow response times are at the start of every full hour?

> 6) The Solr dashboard does show 0.2% (9.8 MB) of swap space consumed on one of the 3 VMs. Is this a concern?

Swap in itself is of no concern; swapping out unused memory blocks is a feature. As long as the machine rarely accesses the swap file, it is working as intended.

- Toke Eskildsen, State and University Library, Denmark
Re: How to add some more documents to an existing index file
Single-document updates are quite possible! No worries there. Since you're using DIH (the Data Import Handler), you can use the delta-import command; see https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-DataImportHandlerCommands You'll need some way to determine what a "new" document is. DIH provides the last-indexed timestamp, which you can leverage in the delta query configuration to pick up documents added since that time.

Erik

On Jun 24, 2014, at 8:07 AM, Pai, Gurunath (GE Corporate, consultant) gurunath@ge.com wrote:

I have an index containing data from a MySQL database; I created it using Solr's DataImportHandler. My requirement: suppose I add a new row to the database; I want that row reflected in my existing Solr index. I don't have any idea how to add the new record from the database to Solr. Do I need to re-index everything, or is even a single-record update possible? Thanks and regards, Gurunath Pai.
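A minimal sketch of such a delta configuration, assuming a table named item with an id primary key and a last_modified timestamp column (all names hypothetical):

<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item WHERE id = '${dih.delta.id}'">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>

A full rebuild runs via /dataimport?command=full-import; picking up only new or changed rows runs via /dataimport?command=delta-import.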
Re: How to add some more documents to an existing index file
I have found the answer to the above question: by using delta import. But if I am going to use Solr's delta import, then I need to add a last-modified column to the database table. Is it possible to achieve the same thing without altering the database table?
Re: solr4.7.2 startup takes too long
On 6/24/2014 12:51 AM, hrdxwandg wrote:

> Before upgrading Solr to 4.7.2 I used Solr 3.6. When I started Tomcat, Solr started up quickly; the index size was 35G. After upgrading to 4.7.2 I rebuilt the index completely, and the index size is now 16G. But when I restart Tomcat, Solr starts up very slowly, taking about 10 minutes.

When you upgraded, what did you change in the config? One thing I look for specifically in this situation is the updateLog config. It is a good idea to turn this on in the new version, but depending on how you do your commits, it may make restarts take a very long time. Here's a wiki page that discusses the possible problem in some detail:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

Thanks, Shawn
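For context, the combination Shawn describes looks roughly like this in solrconfig.xml (a sketch of the stock 4.x settings, not the poster's actual config). Frequent hard commits that do not open a searcher keep each transaction log small, so startup does not have to replay a huge log:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard-commit often, without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>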
Aggregate functions in Solr entity Queries.
Hi, I'm new to Solr and would like to index my database. It is working fine for ordinary columns, but I have one field that should take an average value from the database, and the average is not being saved in Solr. Below is a sample of my data-config.xml:

<dataSource />
<document>
  <entity name="doctor" query="***">
    <!-- fields/columns -->
    <entity name="count" query="select count(*) from table">
      <field column="count" name="countValue" />
    </entity>
    ...
  </entity>
</document>

countValue always returns zero. Can anyone help me in this regard?
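No reply appears in this digest, but one common cause (an assumption here, since the real queries are elided as ***) is that the JDBC driver names the count(*) result column something other than count (e.g. "count(*)" or "COUNT(*)"), so it never matches column="count" and the field stays empty. Aliasing the aggregate usually fixes it:

<entity name="count" query="SELECT COUNT(*) AS count FROM table">
  <field column="count" name="countValue" />
</entity>

The same applies to an average, e.g. query="SELECT AVG(some_column) AS count FROM table", with some_column standing in for the real column.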
RE: running Post jar from different server
Yes, localhost is replaced with the right Solr URL; the pasted one is a test URL. After debugging, we found the actual problem was that the path to the XML files was not correct. Thanks for all the support. --Ravi

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Monday, June 23, 2014 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: running Post jar from different server

You said that the SQL DB and Solr are on different servers and that you are running post.jar from a network drive mapped to your SQL DB. If so, then why are you trying to post to localhost? That would resolve to the SQL DB host, where Solr is not running. Instead of using localhost in the -Durl part of your command line, use the full hostname or IP address of your Solr server.

On Mon, Jun 23, 2014 at 11:04 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

Hi, does anyone have any reference for this type of execution?

-----Original Message-----
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) [mailto:external.ravi.tamin...@us.bosch.com]
Sent: Friday, June 20, 2014 1:46 PM
To: solr-user@lucene.apache.org
Subject: RE: running Post jar from different server

Hi Sameer, thanks for looking at the post. Below are the two variables read from the XML file in my tool:

<add key="JavaPath" value="%JAVA_HOME%\bin\java.exe" />
<add key="JavaArgument" value=" -Xms128m -Xmx256m -Durl=http://localhost:8983/solr/{0}/update -jar F:/DataDump/Tools/post.jar" />

On the command line it is something like:

C:\DataImport\bin\java.exe -Xms128m -Xmx256m -Durl=http://localhost:8983/solr/DataCollection/update -jar F:/DataDump/Tools/post.jar F:/DatFiles/*.xml

F:\ is the network drive. Thanks, Ravi

-----Original Message-----
From: Sameer Maggon [mailto:sam...@measuredsearch.com]
Sent: Thursday, June 19, 2014 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: running Post jar from different server

Ravi, post.jar is a standalone utility that does not have to be on the same server. If you can share the command you are executing, there might be some pointers in there. Thanks,
--
*Sameer Maggon*
http://measuredsearch.com

On Thu, Jun 19, 2014 at 8:54 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

Hi, I have a situation where my SQL job initiates a console application in which I call post.jar to upload data to Solr. The SQL DB and Solr are on two different servers. I am calling post.jar from my SQL DB host, where the path is mapped to a network drive, and I am getting a "file not found" error. Is the above scenario possible? If anyone has experience with this, can you share it? Any direction would be really appreciated. Thanks, Ravi

--
Regards,
Shalin Shekhar Mangar.
Re: Get position of first occurrence in search result
How fast does it need to be? I've done this sort of thing for relevance evaluation with a driver in Python. Send the query, requesting 10 or 100 hits in JSON and only the URL field (the fl parameter). Iterate through them until the URL matches; if it doesn't match, request more. Print the number. Try it, I bet you would be surprised at how fast it is. You can run several copies of this script in parallel, maybe 10 or 20. Writing this in Solr seems like hitting a fly with a hammer. It is over-engineering. Build it in PHP first, even if you do want to do it in Solr. wunder
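A minimal sketch of the driver Walter describes, written here in SolrJ rather than Python; the endpoint, the url field name, and the page size of 100 are assumptions, not details from the thread:

[code]
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class PositionFinder {

    // Returns the 0-based rank of targetUrl in the results for q, or -1 if absent.
    static long findPosition(HttpSolrServer server, String q, String targetUrl)
            throws SolrServerException {
        final int rows = 100;
        for (int start = 0; ; start += rows) {
            SolrQuery query = new SolrQuery(q);
            query.setFields("url");  // fetch only the URL field, ignore everything else
            query.setStart(start);
            query.setRows(rows);
            SolrDocumentList docs = server.query(query).getResults();
            for (int i = 0; i < docs.size(); i++) {
                if (targetUrl.equals(docs.get(i).getFieldValue("url"))) {
                    return start + i;
                }
            }
            if (start + rows >= docs.getNumFound()) {
                return -1;  // walked past the end of the result list without a match
            }
        }
    }

    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        System.out.println(findPosition(server, "some query terms", "http://example.com/page"));
    }
}
[/code]

Several copies of this can run in parallel, as Walter suggests, since each paging loop is independent.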
RE: POST Vs GET
I have a few questions before that. Do you mean running Jetty in production is good enough? I.e., will all clustering and load balancing be taken care of? Can we run Jetty as a service on Windows Server? Won't security be a problem if we use Jetty? I was under the impression that Tomcat would be more robust at handling all of this. Maybe I can think about running Jetty in production. --Ravi -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Monday, June 23, 2014 2:30 PM To: solr-user@lucene.apache.org Subject: Re: POST Vs GET Why don't you just use the Jetty shipped with Solr? It has all the correct defaults. In future, we may not even support shipping a war file. On Mon, Jun 23, 2014 at 11:07 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, I am executing a Solr query that runs 10 to 12 lines with all the boosting and conditions. I changed the HTTP method from GET to POST, as POST doesn't have any restriction on size, but I am getting an error. I am using Tomcat 7; is there any place in Tomcat where we need to specify that it should accept POST? FYI, from my Jetty Solr version everything works fine. Thanks Ravi -- Regards, Shalin Shekhar Mangar.
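If Tomcat is in fact rejecting the request by size, the usual knob is maxPostSize on the HTTP connector in server.xml; whether that explains this particular error is only an assumption. A sketch, with the port and the 10 MB value purely illustrative (in Tomcat 7 a value of 0 or less disables the limit):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxPostSize="10485760"
           redirectPort="8443" />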
Re: Block Join Not Working - what am I doing wrong?
Okay, let me try again. 1. Here is some sample SolrJ code that creates a parent and child document (I hope): https://gist.github.com/anonymous/d03747661ef03923de74 2. I tried a block join query which didn't return any results (I tried the Block Join Parent Query Parser approach described in this link: https://cwiki.apache.org/confluence/display/solr/Other+Parsers). I expected to get back the parent doc of a child which has ATTRIBUTES.STATE:TX, which I did not; that is what I'm trying to figure out. Thanks

http://localhost:8088/solr/test_core/select?q={!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TX&wt=json&indent=true

(equivalent to http://localhost:8088/solr/test_core/select?q=%7b!parent+which%3d%22content_type%3aparentDocument%22%7dATTRIBUTES.STATE%3aTX%26wt%3djson%26indent%3dtrue)

Resulting in the response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">{!parent which="content_type:parentDocument"}ATTRIBUTES.STATE:TX&wt=json&indent=true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

On Mon, Jun 23, 2014 at 4:04 PM, Erick Erickson erickerick...@gmail.com wrote: Well, what do you mean by not working? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Jun 23, 2014 at 12:20 PM, Vinay B, vybe3...@gmail.com wrote: Hi, I've been trying to experiment with block joins and parent/child docs as described in this thread (input described in my first post of the thread, and block join in my second post, as per the suggestions given). What else am I missing? Thanks http://lucene.472066.n3.nabble.com/Why-aren-t-my-nested-documents-nesting-tt4142702.html#none
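One common cause of numFound=0 here is that the parent and child were added as two independent documents instead of one block. A minimal sketch of block indexing with SolrJ 4.x, using the field names from this thread; the core URL is an assumption, and this is not a claim about what the linked gist does:

[code]
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BlockIndexSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8088/solr/test_core");

        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "1");
        parent.addField("content_type", "parentDocument");

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "1-A");
        child.addField("ATTRIBUTES.STATE", "TX");

        // The child must be attached to the parent so both are indexed as one block;
        // adding them as two independent documents breaks {!parent} queries.
        parent.addChildDocument(child);

        server.add(parent);
        server.commit();
    }
}
[/code]

If the block is indexed this way, the {!parent} query above should return the parent with id 1.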
Re: RAMDirectoryFactory setting on replication slave
Please don't. At least not until you prove that this is where your bottleneck is. You haven't described what you're trying to fix by making such a change. Solr/Lucene already does a _lot_ of work to keep the relevant bits of the index in memory. Additionally, the defaults use MMapDirectory, which makes use of the OS cache to hold yet more of the index in memory; see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html This feels like an XY problem; what is the reason you're interested? Best, Erick On Tue, Jun 24, 2014 at 1:25 AM, Lee Chunki lck7...@coupang.com wrote: Hi Guys, As I understand it, the RAMDirectoryFactory setting does not work with replication (https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig). By the way, can I use it for replication slave nodes (not master) or for SolrCloud? Thanks, Chunki.
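For reference, the stock 4.x solrconfig.xml default that Erick's advice amounts to leaving in place; NRTCachingDirectoryFactory delegates to MMapDirectory on 64-bit platforms:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>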
Re: TokenFilter not working at index time
Hmmm. It would help if you posted a couple of other pieces of information. BTW, if this is new code, are you considering donating it back? If so, please open a JIRA so we can track it, see: http://wiki.apache.org/solr/HowToContribute But to your question, the first couple of things I'd do: 1> see what the admin/analysis page tells you happens. 2> attach &debug=query to your test case and see what the parsed query looks like. 3> use the admin/schema browser link for the field in question to see what actually makes it into the index (or use Luke or even the TermsComponent). My bet is that 2> or 3> will show something unexpected which may give you some clues. Best, Erick On Tue, Jun 24, 2014 at 5:00 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote: I'm trying to create a Norwegian lemmatizer based on a dictionary, but for some odd reason I don't get any search results even though the Analyzer in Solr Admin shows that it does the right thing. It works at query time if I have reindexed everything based on another stemmer, e.g. NorwegianMinimalStemmer. Here's a screenshot of how it lemmatizes the Norwegian word studenter (masculine indefinite noun, plural; English: students). The stem is student. So far so good: http://folk.uio.no/erlendfg/solr/lemmatizer.png But I get no/few results if I search for studenter compared to student. If I switch to solr.NorwegianMinimalStemFilterFactory in schema.xml at index time and reindex everything, it works as it should:

<analyzer type="index">
  <filter class="solr.NorwegianMinimalStemFilterFactory" variant="no"/>

What is wrong with my TokenFilter and/or how can I debug this further? I have tried a lot of different things without any luck, for example decoding everything explicitly to UTF-8 (the wordlist is in ISO-8859-1, but I'm reading it properly by setting the correct character set) and trimming all the words, without any help. The byte sequence also seems to be correct for the stemmed word. My lemmatizer shows [73 74 75 64 65 6e 74], exactly the same as when I have configured NorwegianMinimalStemFilterFactory in schema.xml. Here's the source code of my lemmatizer. Please note that it is not finished: http://folk.uio.no/erlendfg/solr/ Here's the line in my wordlist which contains the word studenter: 66235 student studenter subst mask appell fl ub normert 700 3 The following line returns the stem (input is studenter): final String[] values = stemmer.stem(termAtt.buffer()); The rest of the code is in NorwegianLemmatizerFilter. If several stems are returned, they are all added. Erlend
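To make Erick's 2> and 3> concrete, the requests would look something like the following; the core name and field are assumptions:

http://localhost:8983/solr/collection1/select?q=studenter&debug=query
http://localhost:8983/solr/collection1/terms?terms.fl=text&terms.prefix=student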
Re: No results for a wildcard query for text_general field in solr 4.1
Wildcards are a tough thing to get your head around. I think my first post on the users list was titled I just don't get wildcards at all or something like that... Right, wildcards aren't tokenized. So by getting your term through the query parsing as a single token, including the hyphen, when the analyzer sees that it's a wildcard it doesn't break on the hyphen. So it's looking for a single token, and since there is no single term like test-or123, you get no matches. I'm afraid this is just how it works. You can do something like replace the hyphen at the app layer, but I don't think there's a way to do what you want OOB. Best, Erick On Tue, Jun 24, 2014 at 1:55 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Sven, StandardTokenizerFactory splits it into two pieces. You can confirm this at the analysis page. If this is something you don't want, let us know. We can help you to create an analysis chain that suits your needs. Ahmet On Tuesday, June 24, 2014 10:39 AM, Sven Schönfeldt schoenfe...@subshell.com wrote: Hi Erick, that is what I did, tried that input on the analysis page. At index time the field splits the value into two words: „test" and „or123". Checking the query on the analysis page, the word is also split into „test" and „or123". But doing the query and looking into the debug result, I see that there is no splitting of words. That's what I expect…

<str name="rawquerystring">searchField_t:test\-or123*</str>
<str name="querystring">searchField_t:test\-or123*</str>
<str name="parsedquery">searchField_t:test-or123*</str>
<str name="parsedquery_toString">searchField_t:test-or123*</str>

Without the wildcard, the word is split into two parts:

<str name="rawquerystring">searchField_t:test\-or123</str>
<str name="querystring">searchField_t:test\-or123</str>
<str name="parsedquery">searchField_t:test searchField_t:or123</str>
<str name="parsedquery_toString">searchField_t:test searchField_t:or123</str>

Any idea which configuration is responsible for that behavior? Thanks! Am 23.06.2014 um 22:55 schrieb Erick Erickson erickerick...@gmail.com: Well, you can do more than guess by looking at the admin/analysis page and trying your input on the field in question. That'll show you what actual transformations are performed. You're probably right though. Try adding &debug=query to your URL to see what the actual parsed query looks like and compare with the admin/analysis page. But yeah, it's a matter of getting all the parts (query parser and analysis chains) to do the right thing. Best, Erick On Mon, Jun 23, 2014 at 7:30 AM, Sven Schönfeldt schoenfe...@subshell.com wrote: Hi Solr users, I am trying to do a wildcard query on a dynamic text field (_t), but don't get the right result. The configuration for the field type is „text_general", the default configuration:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Input for the text field is test-or123 and my query looks like test\-or*. It seems that the input is already split into two words, „test" and „or123", but that's just a guess. Can anyone help me, and explain why I don't find the document and what to do to make the query work? Regards!
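A hedged illustration of the app-layer rewrite Erick mentions: since the index holds the two tokens test and or123, splitting on the hyphen before building the query would find the document (the field name is from the thread; whether the rewritten semantics match the original intent is an assumption):

searchField_t:test AND searchField_t:or*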
Re: Solr alternates returning different versions of the same document
Thanks for letting us know. Erick On Tue, Jun 24, 2014 at 5:25 AM, yann yannick.lallem...@gmail.com wrote: Hi Erick, thanks; if it helps, I eventually fixed the problem by deleting the documents by id (via an HTTP request), which apparently deleted all the versions everywhere, then re-creating the documents via the admin interface (update, CSV). This seems to have left only one version of each document. Yann
OOM during indexing nested docs
Hi, I am getting an OOM while indexing 400 million docs (nested, 7-20 children each). The memory usage gets higher while indexing until it reaches 24g. Also, after the OOM, when indexing stops, the memory stays at 24g; *seems like a leak.* *Solr Collection Info:* Solr 4.8, 6 shards, 1 replica per shard, 24g for the JVM. Thanks
Re: TokenFilter not working at index time
Hi Erlend, After a quick look: I have implemented a similar TokenFilter that injects several tokens at the same position. Please see the source code of Zemberek2DeasciifyFilter in https://github.com/iorixxx/lucene-solr-analysis-turkish You can insert your line final String[] values = stemmer.stem(termAtt.buffer()); into it. Another note: you can use o.a.l.analysis.util.CharArrayMap<String> instead of Map<String,String> for the wordlist, for efficiency. Please see TurkishDeasciifyFilter for an example usage. Let us know if that works for you. Ahmet
Re: Block Join Not Working - what am I doing wrong?
Did you run the underlying query ATTRIBUTES.STATE:TX? Does it return anything? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Does one need to perform an optimize soon after doing a batch indexing using SolrJ ?
Your indexing process looks fine; there's no reason to change it. Optimizing is _probably_ unnecessary at all. In fact, in the 4.x world it was renamed to forceMerge to make it seem less attractive (I mean, who wouldn't want an optimized index?). That said, the batch indexing process has nothing at all to do with optimization. Nothing in the process of adding docs to a server will trigger an optimize. In your case, since your index only changes once a week, it will help your performance a little (but perhaps so little you won't notice) to optimize after the batch index is done. In short, your process seems fine. Indexes are never optimized unless you explicitly do it. After all, how would Solr know that you are done with your batch indexing? Best, Erick On Tue, Jun 24, 2014 at 5:32 AM, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote: I am using Solr 4.5.1. I have two collections: Collection 1 - 2 shards, 3 replicas (size of shard 1 - 115 MB, size of shard 2 - 55 MB) Collection 2 - 2 shards, 3 replicas (size of shard 1 - 3.5 GB, size of shard 2 - 1 GB) I have a batch process that performs indexing (full refresh) once a week on the same index. Here is some information on how I index: a) I use SolrJ's bulk ADD API for indexing - CloudSolrServer.add(Collection<SolrInputDocument> docs). b) I have an autoCommit (hard commit) setting for both my collections (solrconfig.xml):

<autoCommit>
  <maxDocs>10</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

c) I do a programmatic hard commit at the end of the indexing cycle, with openSearcher=true, so that the documents show up in the search results. d) I neither programmatically soft commit (nor have any autoSoftCommit settings) during the batch indexing process. e) When I re-index all my data again (the following week) into the same index, I don't delete existing docs; rather, I just re-index into the same collection. f) I am using the default merge factor of 10 in my solrconfig.xml:

<mergeFactor>10</mergeFactor>

Here is what I am observing: 1) After a batch indexing cycle, the segment counts for each shard/core are pretty high. The Solr Dashboard reports segment counts between 8 and 30 segments on the various cores. 2) Sometimes the Solr Dashboard shows the status of my core as NOT OPTIMIZED. This I find unusual, since I have just finished a batch indexing cycle and would assume that the index should already be optimized. Is this happening because I don't delete my docs before re-indexing all my data? 3) After I run an optimize on my collections, the segment count does reduce significantly - to 1 segment. Am I indexing the right way? Is there a better strategy? Is it necessary to perform an optimize after every batch indexing cycle? The outcome I am looking for is that I need an optimized index after every major batch indexing cycle. Thanks!!
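A small sketch of that explicit end-of-batch step in SolrJ; the ZooKeeper address and collection name are placeholders:

[code]
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class BatchFinish {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");
        server.commit();    // opens a searcher so the freshly indexed docs become visible
        server.optimize();  // the explicit forceMerge; Solr never triggers this on its own
        server.shutdown();
    }
}
[/code]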
Re: Does one need to perform an optimize soon after doing a batch indexing using SolrJ ?
Hi, You don't need to optimize just based on segment counts. Solr doesn't optimize automatically because often it doesn't improve things enough to justify the computational cost of optimizing. You shouldn't optimize unless you do a benchmark and discover that optimizing improves performance. If you're just worried about the segment count, you can tune that in solrconfig.xml and Solr will merge down your index on the fly as it indexes. Michael Della Bitta Applications Developer appinions inc. http://www.appinions.com/
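If the segment count itself is what you want to tune, a sketch of the solrconfig.xml merge settings Michael refers to; the values are illustrative only, and a lower segmentsPerTier means fewer, larger segments at a higher merge cost:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">4</int>
  </mergePolicy>
</indexConfig>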
Re: No results for a wildcard query for text_general field in solr 4.1
I think I am officially tired of having to explain why Solr doesn't do what users expect for this query. I mean, I can accept that low-level Lucene should work strictly on the decomposed terms of test-or*, but it is very reasonable for users (even EXPERT users) to expect that the Solr query parser will generate what the complex phrase query parser generates. See: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser Having to use a separate query parser for this obvious, common case is... absurd. (What does Elasticsearch do for this case??) -- Jack Krupansky
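Against Sven's field, a query through that parser might look like the following; note that {!complexphrase} only shipped with Solr 4.8, so its availability here is an assumption to verify, and the hyphenated term is pre-split because phrase terms are analyzed individually:

q={!complexphrase inOrder=true}searchField_t:"test or*"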
Re: Slow QTimes - 5 seconds for Small sized Collections
That is strange indeed. The usual culprit is that there is a commit in there and no autowarming, so you see pauses when the first query hits after a commit. But you say you only build the index once, which would seem to rule that out. I'd be interested in what is in your Solr logs around the time in question. Say 10,000 lines leading up to a slow query (10,000 lines is completely arbitrary, hopefully it's enough to see something interesting). Best, Erick On Tue, Jun 24, 2014 at 5:26 AM, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote: I am running Solr 4.5.1. Here is how my setup looks: I have 2 modest-sized collections. Collection 1 - 2 shards, 3 replicas (size of shard 1 - 115 MB, size of shard 2 - 55 MB) Collection 2 - 2 shards, 3 replicas (size of shard 1 - 3.5 GB, size of shard 2 - 1 GB) These two collections are distributed across 6 Tomcat nodes set up on 3 VMs (2 nodes per VM). Each of the 6 Tomcat nodes has an Xms/Xmx setting of 2 GB. Each of the 3 VMs has 32 GB of physical memory (RAM). As you can see, my collections are pretty small. This is actually a test environment (and NOT production), but my users (only a handful of testers) are complaining of sporadic performance issues on the search. Here are my observations from the application logs: 1) Out of 200 sample searches across both collections, 13 requests are slow (3 slow responses on Collection 1 and 10 slow responses on Collection 2). 2) When things run fast, they are really fast (QTimes of 25 - 100 milliseconds), but when things are slow, I can see that the QTime consistently hovers around the 5 second (or 5000 millisecond) mark. I am seeing responses of the order of 5024, 5094, 5035 ms, as though something just hung for 5 seconds. I am observing this 5 second delay on both collections, which I feel is unusual because both contain very different data sets. I am unable to figure out what's causing the QTime to be so consistent around the 5 second mark. 3) I build my index only once. I did try running an optimize on both Collection 1 and Collection 2 after the users complained. I did notice that post-optimize the segment count on each of the four shards did come down, but that still didn't resolve the slowness of the searches (I was hoping it would). 4) I am looking at the Solr Dashboard for more clues. My Tomcat nodes are definitely NOT running out of memory; the 6 nodes are consuming anywhere between 500 MB - 1 GB RAM. 5) The file descriptor counts are under control; I can only see a maximum of 100 file descriptors being used of a total of 4096. 6) The Solr dashboard is however showing that 0.2% (or 9.8 MB) of swap space is being consumed on one of the 3 VMs. Is this a concern? 7) I also looked at the Plugin/Stats for every core on the Solr Dashboard. I can't see any evictions happening in any of the caches - it's always ZERO. Has anyone encountered such an issue? What else should I be looking for to debug my problem? Thanks
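If a commit with no autowarming does turn out to be the culprit, these are the solrconfig.xml knobs involved; the cache sizes and the warming query are illustrative assumptions only:

<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str></lst>
    </arr>
  </listener>
</query>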
Re: TokenFilter not working at index time
By quickly looking at it, I think you have unreachable code in the NorwegianLemmatizerFilter class (certainly, attaching a debugger would be your best bet):

@Override
public boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
        if (!keywordAttr.isKeyword()) {
            final String[] values = stemmer.stem(termAtt.buffer());
            if (values == null || values.length == 0) {
                return false;
            } else {
                termAtt.setEmpty().append(values[0]);
                if (values.length > 1) {
                    for (int i = 1; i < values.length; i++) {
                        terms.add(values[i]);
                    }
                }
                return true;
            }
        }
        return false;
    } else if (!terms.isEmpty()) {
        termAtt.setEmpty().append(terms.poll());
        // I don't think this will exhaust the terms queue fully for this token,
        // because on the next call to incrementToken()
        // input.incrementToken() is called
        return true;
    } else {
        return false;
    }
}

Instead I would do something like this:

[code]
private Iterator<String> iterator;

@Override
public boolean incrementToken() throws IOException {
    String nextStem = next();
    if (nextStem == null)
        return false;
    // chain the stems;
    // if this is undesired, you can put them into the same position by restoring previous state
    termAtt.setEmpty();
    termAtt.append(nextStem);
    termAtt.setLength(nextStem.length());
    return true;
}

public String next() throws IOException {
    if ((iterator == null) || (!iterator.hasNext())) {
        if (!input.incrementToken())
            return null;
        char[] buffer = termAtt.buffer();
        if (buffer == null || buffer.length == 0)
            return null;
        final String tokenTerm = new String(buffer, 0, termAtt.length());
        final String lcTokenTerm = tokenTerm.toLowerCase();
        Collection<String> stems = new ArrayList<String>();
        Collections.addAll(stems, stemmer.stem(lcTokenTerm));
        iterator = stems.iterator();
    }
    if (iterator.hasNext()) {
        String next = iterator.next();
        if (next != null) {
            return next;
        }
    }
    return null;
}
[/code]
-- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
limit solr results before join
Is there any way to limit the results of a query on the from index before it gets joined? The SQL analogy might be:

SELECT * FROM toIndex
JOIN (SELECT * FROM fromIndex WHERE <some query> LIMIT 1000) fromIndex
  ON fromIndex.from = toIndex.to

Example: _query_:"{!join fromIndex=expressionData from=anatomyID to=anatomyID v='(anatomy:\"brain\")'}" Say I have an index representing data for gene expression (we work with genetics), and you query it by anatomy term. So the above would query for all data that shows gene expression in brain. Now I want to get a set of related data for each anatomy term via the join. Is there any way to get the related data for only the anatomy terms in the first 1000 expression data documents (fromIndex)? The reason is that there could be millions of data documents (fromIndex), and we process them in batches to load a visualization of the query results. Doing the join on all the results for each batch I process is becoming a bottleneck for large sets of data. Thanks, -Kevin
Re: Does one need to perform an optimize soon after doing a batch indexing using SolrJ ?
The one exception that we should always note is that if your batch includes deletion of existing documents, an optimize can be appropriate, because the term frequencies stored by Lucene may be off while the deleted documents still count as existing terms. Is this exception noted in the Solr ref guide? -- Jack Krupansky
Re: Evaluate function only on subset of documents
: Let's take this query sample: : XXX OR AAA AND {!frange ...} : : For my use case: : AAA returns a subset of 100k documents. : frange returns 5k documents, all part of these 100k documents. : : Therefore, frange skips the most documents. From what you are saying, : frange is going to be applied on all documents (since it skips the most : documents) and AAA is going to be applied on the subset. This is kind of : what I've originally noticed. My goal is to have this in reverse order, That's not exactly it ... there's no way for the query to know in advance how many documents it matches -- what BooleanQuery asks each clause is: "looking at the index, tell me the (internal) lucene docid of the first doc you match." It then looks at the lowest matching docid of each clause, and the Occur property of the clause (MUST, MUST_NOT, SHOULD), to be able to tell if/when it can say things like "clause AAA is mandatory but the lowest id it matches is doc# 8675" -- so it doesn't matter that clause XXX's lowest match is doc# 10 or that clause {!frange}'s lowest match is doc# 100. It can then ask XXX and {!frange} to both skip ahead and find the lowest docid they each match that is no less than 8675, etc... From the perspective of {!frange} in particular, this means that on the first call it will evaluate itself against docid #0, #1, #2, etc... until it finds a match, and on the second call it will evaluate itself against docid #8675, #8676, etc... until it finds a match... : since frange is much more expensive than AAA. : I was hoping to do so by specifying the cost, saying that Hey, frange has There is no support for specifying cost on individual clauses inside a BooleanQuery. But I really want to re-iterate that even with the example you posted above you *still* don't need to nest your {!frange} inside a boolean query -- what you have is this: XXX OR AAA AND {!frange ...} in which the {!frange ...} clause is completely mandatory -- so my previous point #2 still applies: : 2) based on the example you give, what you're trying to do here doesn't : really depend on using SHOULD (ie: OR) type logic against the frange: : the only disjunction you have is in a sub-query of a top level : conjunction (ie: all required) ... the frange itself is still mandatory. : : so you could still use it as a non-cached postfilter just like in your : previous example: q=XXX OR AAA&fq={!frange cost=150 cache=false ...} -Hoss http://www.lucidworks.com/
SolrCloud copy the index to another cluster.
Hello, We have a running SolrCloud cluster, a simple setup of 4 nodes (2 shards and 2 replicas) with an ≈140GB index. Now we have to move to another server and need to somehow copy the existing index without downtime (if applicable). The new config is exactly the same: the same 4 nodes, the same collections, and their own ZooKeeper. What options do we have? What I was thinking is to add 2 nodes (from the new cluster, those that are supposed to be shards) as replicas for the existing old cluster, and when the replication is done, simply switch the app to use those new replicas. Then reconfigure these replicas and run them as shards with their own ZooKeeper. So there will be minimal downtime, just to restart the new cluster. My concerns are: * Will those new replicas be automatically populated with the index from the old cluster? * Will I then be able to disconnect them from the old cluster, run them as primary shards with their own ZooKeeper, and then add their own replicas from the new cluster? Thank you, Alex
Re: SolrCloud copy the index to another cluster.
I've just realized that the old and new clusters use different installations, configs and lib paths. So the nodes from the new cluster will probably simply refuse to start using configs from the old ZooKeeper. The only way would be if they could run with their own ZooKeeper and then be manually added as replicas to the old cluster, so the old and new clusters keep using their own ZooKeepers.
Re: Block Join Not Working - what am I doing wrong?
Hi, Yes, the query ATTRIBUTES.STATE:TX returns the child doc (see response below). Is there something else that I'm missing to link the parent and the child? I followed your advice from my last thread and used a block join in this attempt, but still don't see how the parent and child realize their association. We're using Solr 4.8.1. Thanks

Query response:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "indent":"true",
      "q":"ATTRIBUTES.STATE:TX",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1-A",
        "ATTRIBUTES.STATE":["LA","TX"]}]
  }}

Raw doc dump:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "indent":"true",
      "q":"*:*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"1-A",
        "ATTRIBUTES.STATE":["LA","TX"]},
      {
        "id":"1",
        "content_type":"parentDocument",
        "_version_":1471814208097091584}]
  }}
Re: Slow QTimes - 5 seconds for Small sized Collections
Two ideas: 1) Monitor the GC activity with jvisualvm (comes with the Oracle JDK); install the VisualGC plugin, it is quite helpful. The idea is to try to find GC stop-the-world activity. If any is found, look at tweaking the GC parameters. Some insight: http://wiki.apache.org/solr/ShawnHeisey Some more on tools for GC monitoring: http://architects.dzone.com/articles/how-monitor-java-garbage 2) Monitor the network latency. Any possibility of the network being periodically congested? Can you plot a graph of the number of concurrent (per second) queries versus their QTimes? -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: Block Join Not Working - what am I doing wrong?
Vinay, pls upload your index dir somewhere, I can try to check what's wrong with it. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: limit solr results before join
Hello Kevin, You can only apply some restriction clauses (with +) to the from side query. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: OOM during indexing nested docs
enable heap dump on OOME, and build the histogram by jhat. Did you try to reduce MaxRamBuffer or max buffered docs? or enable autocommit? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
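A sketch of the knobs Mikhail refers to; every value below is an illustrative assumption, not a recommendation. The JVM flag produces a dump that jhat can read:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/solr

and in solrconfig.xml:

<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>100000</maxBufferedDocs>
</indexConfig>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>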
RE: limit solr results before join
I don't know what that means. Is that a no?
Re: Block Join Not Working - what am I doing wrong?
Michael, try this, Thanks https://www.dropbox.com/s/074p0wpjz916d78/test_core.tar.gz On Tue, Jun 24, 2014 at 1:16 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Vinay, pls upload your index dir somewhere, I can try to check what's wrong with it. On Tue, Jun 24, 2014 at 9:43 PM, Vinay B, vybe3...@gmail.com wrote: Hi, Yes, the query ATTRIBUTES.STATE:TX returns the child doc (see response below) . Is there something else that I'm missing to link the parent and the child? I followed your advice from my last thread and used a block join in this attempt, but still don't see how the parent and child realize their association. We're using solr 4.8.1 Thanks Query Response { responseHeader:{ status:0, QTime:0, params:{ indent:true, q:ATTRIBUTES.STATE:TX, wt:json}}, response:{numFound:1,start:0,docs:[ { id:1-A, ATTRIBUTES.STATE:[LA, TX]}] }} Raw doc dump { responseHeader:{ status:0, QTime:0, params:{ indent:true, q:*:*, wt:json}}, response:{numFound:2,start:0,docs:[ { id:1-A, ATTRIBUTES.STATE:[LA, TX]}, { id:1, content_type:parentDocument, _version_:1471814208097091584}] }} On Tue, Jun 24, 2014 at 10:45 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: did you run the underneath query ATTRIBUTES. STATE:TX. does it return anything? On Tue, Jun 24, 2014 at 6:59 PM, Vinay B, vybe3...@gmail.com wrote: Okay, Let me try again. 1. Here is some sample SolrJ code that creates a parent and child document (I hope) https://gist.github.com/anonymous/d03747661ef03923de74 2. I tried a block join query which didn't return any results (I tried the Block Join Parent Query Parser approach described in this link https://cwiki.apache.org/confluence/display/solr/Other+Parsers). I expected to get back the parent doc of a child which has ATTRIBUTES.STATE:TX, which I did not , That is what I'm trying to figure out. Thanks http://localhost:8088/solr/test_core/select?q={!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TXwt=jsonindent=true ( equivalent to http://localhost:8088/solr/test_core/select?q=%7b!parent+which%3d%22content_type%3aparentDocument%22%7dATTRIBUTES.STATE%3aTX%26wt%3djson%26indent%3dtrue ) Resulting in response lst name=responseHeader int name=status0/int int name=QTime1/int lst name=params str name=q {!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TXwt=jsonindent=true /str /lst /lst result name=response numFound=0 start=0/ /response On Mon, Jun 23, 2014 at 4:04 PM, Erick Erickson erickerick...@gmail.com wrote: Well, what do you mean by not working? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Jun 23, 2014 at 12:20 PM, Vinay B, vybe3...@gmail.com wrote: Hi, I've been trying to experiment with block joins and parent / child docs as described in this thread (input described in my first post of the thread, .. and block join in my second post, as per the suggestions given). What else am I missing? Thanks http://lucene.472066.n3.nabble.com/Why-aren-t-my-nested-documents-nesting-tt4142702.html#none -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: limit solr results before join
_query_:{!join fromIndex=expressionData from=anatomyID to=anatomyID v='(anatomy:\brain\) +id:[1 TO 1]'} On Tue, Jun 24, 2014 at 10:24 PM, Kevin Stone kevin.st...@jax.org wrote: I don't know what that means. Is that a no? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
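Unpacking that compressed answer: there is no direct analogue of SQL's LIMIT on the from side, so the only pre-join restriction available is another mandatory clause inside the join's v parameter, as the +id range above illustrates. If the batches carried a marker field (batch_id here is hypothetical), the request could look like q=_query_:"{!join fromIndex=expressionData from=anatomyID to=anatomyID v='(anatomy:brain) +batch_id:1'}" so the from-side set shrinks before the join runs.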
Re: SolrCloud copy the index to another cluster.
I'm currently playing around with Solr Cloud migration strategies, too. I'm wondering... when you say zero downtime, do you mean zero *read* downtime, or zero downtime altogether? Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Tue, Jun 24, 2014 at 1:43 PM, heaven aheave...@gmail.com wrote: I've just realized that the old and new clusters use different installations, configs and lib paths, so the nodes from the new cluster will probably simply refuse to start using configs from the old zookeeper. Unless there is a way to run them with their own zookeeper and then manually add them as replicas to the old cluster, so old and new clusters keep using their own zookeepers. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-copy-the-index-to-another-cluster-tp4143759p4143769.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud copy the index to another cluster.
Zero read downtime would be enough, we can safely stop index updates for a while. But we have some API endpoints where read downtime is very undesirable. Best, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-copy-the-index-to-another-cluster-tp4143759p4143795.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Block Join Not Working - what am I doing wrong?
I wonder what can be wrong there.. it works for me absolutely fine, proof pic http://postimg.org/image/51qrsm48p/ query http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%22content_type%3AparentDocument%22%7DATTRIBUTES.STATE%3ATXwt=jsonindent=truedebugQuery=true gives response:{numFound:1,start:0,docs:[ { id:1, content_type:[parentDocument], _version_:1471814208097091584}] }, so beyond that I can just wish you good luck. Two minor notes: I declared the field explicitly in schema.xml, which might help if you haven't done it yet: <field name="ATTRIBUTES.STATE" type="string" indexed="true" stored="true" required="false" multiValued="true"/> Just a hint: to debug block join, use wt=csv, which shows the block alignment pretty well. On Tue, Jun 24, 2014 at 10:38 PM, Vinay B, vybe3...@gmail.com wrote: Michael, try this, Thanks https://www.dropbox.com/s/074p0wpjz916d78/test_core.tar.gz -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
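To illustrate that csv hint with a hypothetical request: http://localhost:8983/solr/collection1/select?q=*:*&wt=csv&fl=id,content_type -- with a correctly indexed block the child rows come out directly above their parent row, so a child that has drifted outside its parent's block is easy to spot.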
Re: SolrCloud copy the index to another cluster.
So what I'm playing with now is creating a new collection on the target cluster, turning off the target cluster, wiping the indexes, and manually just copying the indexes over to the correct directories and starting again. In the middle, you can run an optimize or use the Lucene index upgrader tool to bring yourself up to the new version. Part of this for me is a migration to HDFSDirectory so there's an added level of complication there. I would assume that since you only need to preserve reads, you could cut over once your collections were created on the new cloud? Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Tue, Jun 24, 2014 at 3:25 PM, heaven aheave...@gmail.com wrote: Zero read would be enough, we can safely stop index updates for a while. But have some API endpoints, where read downtime is very undesirable. Best, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-copy-the-index-to-another-cluster-tp4143759p4143795.html Sent from the Solr - User mailing list archive at Nabble.com.
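If you take that route, the upgrade step can also be done offline with the standalone Lucene tool (a sketch; the jar name and index path are illustrative): java -cp lucene-core-4.8.1.jar org.apache.lucene.index.IndexUpgrader /path/to/core/data/index -- it rewrites every segment in the current format, so the new cluster never has to read old-format segments.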
Re: Block Join Not Working - what am I doing wrong?
Thanks, I figured it out based on your last response. I mistakenly URL-encoded the wt=json and indent=true parameters when constructing the request: %26wt%3djson%26indent%3dtrue Incidentally, that translates to {!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TXwt=jsonindent=true and returns a (malformed) xml response. For your correct request (expressed as xml), the query looks like this: {!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TX In any case, I'll write up a practical HOW TO using SolrJ for the benefit of the community. On Tue, Jun 24, 2014 at 2:29 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: I wonder what can be wrong there.. it works for me absolutely fine
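For later readers, a minimal SolrJ sketch of the pattern that ended up working here (4.x-era API; the URL, core, and field names mirror this thread, everything else is an illustrative assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BlockJoinHowTo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8088/solr/test_core");

    // Parent and children must be indexed together as one block.
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "1");
    parent.addField("content_type", "parentDocument");

    SolrInputDocument child = new SolrInputDocument();
    child.addField("id", "1-A");
    child.addField("ATTRIBUTES.STATE", "TX");
    parent.addChildDocument(child);

    solr.add(parent);
    solr.commit();

    // SolrJ escapes the q parameter itself -- no hand-rolled URL encoding needed.
    SolrQuery q = new SolrQuery("{!parent which=\"content_type:parentDocument\"}ATTRIBUTES.STATE:TX");
    System.out.println(solr.query(q).getResults()); // expect the parent doc, id:1
    solr.shutdown();
  }
}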
Clubbing queries with different criteria together?
Hi, I have a number of documents in a single core getting indexed from different sources, with common properties but different values. The problem is that while fetching one set of documents, I need to use raw query parameters as below. http://solrserver/solr/collection1/select?q=*%3A*wt=jsonindent=true_query_=%22AuthenticatedUserName=lalit%22 But for the second set of documents, I need to use filter queries. http://solrserver/solr/collection1/select?q=*%3A*fq=alf_acls%3AGROUP_EVERYONEwt=jsonindent=true One way of getting all documents is to make two different queries and combine their results, but I want to avoid two queries for performance reasons, as it would double the load on the system. Is there any way I can use a single query and get results from both sets simultaneously? Thanks for the help! -- View this message in context: http://lucene.472066.n3.nabble.com/Clubbing-queries-with-different-criterias-together-tp4143829.html Sent from the Solr - User mailing list archive at Nabble.com.
Questions about solr.SuggestComponent
Hi, I'm testing SuggestComponent and came up with some questions. 1. How can I set the term frequency as the weightField? 2. Why does it only work with stored fields? How can I return the value resulting from my index-time filter transformations? Thanks! -- Sergio Roberto Charpinel Jr.
Re: Clubbing queries with different criteria together?
Hi Lalit, _query_ is a magic field name. Please see: http://searchhub.org/2009/03/31/nested-queries-in-solr/ Why do you pass _query_=AuthenticatedUserName=lalit as a raw request parameter? It is simply ignored. Ahmet On Tuesday, June 24, 2014 11:34 PM, lalitjangra lalit.j.jan...@gmail.com wrote: Hi, I have a number of documents in a single core getting indexed from different sources, with common properties but different values.
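For what it's worth, one way to get both sets in a single request (a sketch; it assumes AuthenticatedUserName and alf_acls are both real indexed fields, which only Lalit can confirm) is to OR two nested queries inside q, in the style of the article above: q=_query_:"{!lucene}AuthenticatedUserName:lalit" OR _query_:"{!lucene}alf_acls:GROUP_EVERYONE" -- each clause can use its own parser, and both document sets come back in one ranked list.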
How to extend the behavior of a common text field (such as text_general) to recognize regex
This is easy if I only need to define a custom field to identify the desired patterns (numbers, in my case). For example, I could define a field thus: <!-- A text field that identifies numerical entities --> <fieldType name="text_num" class="solr.TextField"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/> </analyzer> </fieldType> Input: hello, world bye 123-45 abcd sdfssdf --- aaa Output: 123-45 , However, I also want to retain the behavio
Re: How to extend the behavior of a common text field (such as text_general) to recognize regex
Sorry, previous post got sent prematurely. Here is the complete post: This is easy if I only need to define a custom field to identify the desired patterns (numbers, in my case). For example, I could define a field thus: <!-- A text field that identifies numerical entities --> <fieldType name="text_num" class="solr.TextField"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/> </analyzer> </fieldType> Input: hello, world bye 123-45 abcd sdfssdf --- aaa Output: 123-45 , However, I also want to retain the behavior of the default text_general field, that is, recognize the usual text tokens (hello, world, bye etc ...). What is the best way to achieve this? I've looked at PatternCaptureGroupFilterFactory ( http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html ) but I suspect that it too is subject to the behavior of the prior tokenizer (which for text_general is StandardTokenizerFactory). Thanks
Re: Evaluate function only on subset of documents
Hi Chris, Thanks for your patience, I've now got a better picture of how things work. I don't believe however that the two queries (the one with the post filter and the one without one) are equivalent. Suppose out of the whole document set: XXX returns documents 1,2,3. AAA returns documents 6,7,8. {!frange}customfunction returns documents 7,8. Running this query: XXX OR AAA AND {!frange ...} Matched documents are: (1,2,3) OR (6,7,8) AND (7,8) = (1,2,3) OR (7,8) = 1,2,3,7,8 With the post filter: q=XXX OR AAA fq={!frange cost=150 cache=false ...} Matched documents are: (1,2,3) OR (6,7,8) = (1,2,3,6,7,8) with post filter (7,8) = (7,8) I was hoping that the evaluation process would short-circuit. Document set: 1,2,3,4,5,6,7,8 Document id 1: Does it match XXX? Yes. Document matches query. Skip the second clause (AAA AND {!frange ...}) and evaluate next doc. Document id 2: Does it match XXX? Yes. Document matches query. Skip second clause and evaluate next doc. Document id 3: Does it match XXX? Yes. Document matches query. Skip second clause and evaluate next doc. Document id 4: Does it match XXX? No. Does it match AAA? No. Document does not match query. Skip frange and evaluate next doc. Document id 5: Does it match XXX? No. Does it match AAA? No. Document does not match query. Skip frange and evaluate next doc. Document id 6: Does it match XXX? No. Does it match AAA? Yes. Does it match frange? No. Document does not match query. [Only here the custom function would be evaluated first.] Document id 7: Does it match XXX? No. Does it match AAA? Yes. Does it match frange? Yes. Document matches query. Document id 8: Does it match XXX? No. Does it match AAA? Yes. Does it match frange? Yes. Document matches query. Returned documents: 1,2,3,7,8. So with this logic the custom function would be evaluated on documents 6,7,8 rather than on the whole set to see the smallest doc index, like you've described in your last email. I hope I'm not rambling. :-) Does it make sense? Costi On Tue, Jun 24, 2014 at 7:26 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Let's take this query sample: : XXX OR AAA AND {!frange ...} : : For my use case: : AAA returns a subset of 100k documents. : frange returns 5k documents, all part of these 100k documents. : : Therefore, frange skips the most documents. From what you are saying, : frange is going to be applied on all documents (since it skips the most : documents) and AAA is going to be applied on the subset. This is kind of : what I've originally noticed. My goal is to have this in reverse order, That's not exactly it ... there's no way for the query to know in advance how many documents it matches -- what BooleanQuery asks each clause is looking at the index, tell me the (internal) lucene docid of the first doc you match. it then looks at the lowest matching docid of each clause, and the Occur property of the clause (MUST, MUST_NOT, SHOULD) to be able to tell if/when it can say things like clause AAA is mandatory but the lowest id it matches is doc# 8675 -- so it doesn't matter that clause XXX's lowest match is doc# 10 or that clause {!frange}'s lowest match is doc# 100 it can then ask XXX and {!frange} to both skip ahead, and find the lowest docid they each match that is no less than 8675, etc... from the perspective of {!frange} in particular, this means that on the first call it will evaluate itself against docid #0, #1, #2, etc... until it finds a match. and on the second call it will evaluate itself against docid #8675, 8676, etc... until it finds a match...
: since frange is much more expensive than AAA. : I was hoping to do so by specifying the cost, saying that Hey, frange has There is no support for specifying cost on individual clauses inside a BooleanQuery. But i really want to re-iterate, that even with the example you posted above you *still* don't need to nest your {!frange} inside a boolean query -- what you have is this: XXX OR AAA AND {!frange ...} in which the {!frange ...} clause is completely mandatory -- so my previous point #2 still applies... : 2) based on the example you give, what you're trying to do here doesn't : really depend on using SHOULD (ie: OR) type logic against the frange: : the only disjunction you have is in a sub-query of a top level : conjunction (ie: all required) ... the frange itself is still mandatory. : : so you could still use it as a non-cached postfilter just like in your : previous example: q=XXX OR AAA fq={!frange cost=150 cache=false ...} -Hoss http://www.lucidworks.com/
Re: Evaluate function only on subset of documents
: I don't believe however that the two queries (the one with the post filter : and the one without one) are equivalent. : : Suppose out of the whole document set: : XXX returns documents 1,2,3. : AAA returns documents 6,7,8. : {!frange}customfunction returns documents 7,8. : : Running this query: : XXX OR AAA AND {!frange ...} : Matched documents are: : (1,2,3) OR (6,7,8) AND (7,8) = (1,2,3) OR (7,8) = 1,2,3,7,8 Did you actually test out that specific example, because those results don't make sense to me given how the parser deals with multiple AND and OR keywords in a single BooleanQuery (which is why i hate AND and OR and advise anyone who will listen to never use them)... http://searchhub.org//2011/12/28/why-not-and-or-and-not/ $ curl -sS 'http://localhost:8983/solr/select?q=xxx%20OR%20AaA%20AND%20zZzdebug=querywt=jsonindent=true' | grep xxx q:xxx OR AaA AND zZz, rawquerystring:xxx OR AaA AND zZz, querystring:xxx OR AaA AND zZz, parsedquery:text:xxx +text:aaa +text:zzz, parsedquery_toString:text:xxx +text:aaa +text:zzz, Based on your walk through of the logic you'd like to have, it seems like the query you meant to write is something like this... XXX (+AAA +{!frange ...}) ...aka... XXX OR (AAA AND {!frange ...}) ...in which case i'm afraid i don't have many good suggestions for you on how to minimize the number of times the function is called to eliminate any doc that already matches XXX (or to force it to check AAA first) Looking at one of your specific examples... : Document id 1: : Does it match XXX? Yes. Document matches query. Skip the second clause (AAA : AND {!frange ...}) and evaluate next doc. ...this type of skipping fundamentally can't happen with a BooleanQuery because of the way scoring works in lucene -- even if it matches the XXX clause, the other clauses will still be consulted to determine what the total score will be -- all SHOULD and MUST clauses that match contribute to the final score. : I hope I'm not rambling. :-) : Does it make sense? You're not rambling -- there's just no general way to force the kind of short-circuit check you're hoping for in this last optimization, and even if there was, it wouldn't help you as much as you might think because of the scoring. -Hoss http://www.lucidworks.com/
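Spelled out as request parameters, the contrast Hoss draws (the frange bounds and function here are placeholders): nested as a mandatory clause it is q=XXX (+AAA +_query_:"{!frange l=0 u=100 v='customfunction()'}") where the frange is evaluated like any other required clause and no cost/cache control is available; the post-filter form from his earlier reply is q=XXX OR AAA with fq={!frange cache=false cost=150 l=0 u=100 v='customfunction()'} where cache=false together with cost >= 100 makes the frange run only against documents that already matched everything else.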
Re: Trouble with TrieDateFields
: I am upgrading an index from Solr 3.6 to 4.2.0. : Everything has been picked up except for the old DateFields. Just to be crystal clear: 1) 4.2 is already over a year old. the current release of Solr is 4.8, and 4.9 will most likely be available within a day or two 2) Even in 4.9, solr.DateField still exists -- it has been deprecated and removed from the example schema, and will not be supported in 5.0, but just because you are upgrading to 4.x doesn't mean you have to stop using solr.DateField if it currently works for you. : I read some posts that due to the extra functionality of the : TrieDateField you would need to re-index for those fields. It's not a question of extra functionality -- the internal representation of the dates in the index is completely different. : To avoid re-indexing I was trying to do a Partial Update : (http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/), you can't use partial updates to work around a problem like this -- partial updates only work if the stored values in the index can be read by solr, then modified by the update command, and then written back out. but upgrading from 3.6, and trying to trick solr by changing a solr.DateField to a solr.TrieDateField just fundamentally won't work. Solr won't be able to correctly read the stored date fields to return them, let alone modify them and write them back to the index. If you really can't re-index from scratch, and all of your fields are in fact stored in your 3.6 index, and you really want to switch to using TrieDateField, then your best option is to fetch every doc from your 3.6 solr instance (like you were doing with your partial updates approach but pull back every field) and then push each doc to a *new* 4.x instance you've set up with the updated schema.xml using TrieDateField. -Hoss http://www.lucidworks.com/
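A rough SolrJ sketch of that fetch-everything-and-repush approach (hosts, core names, page size, and the "id" unique key are illustrative assumptions; it also assumes every field is stored, as Hoss notes it must be):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class DateFieldReindexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer oldSolr = new HttpSolrServer("http://old-host:8983/solr/collection1");
    HttpSolrServer newSolr = new HttpSolrServer("http://new-host:8983/solr/collection1");
    final int rows = 1000;
    for (int start = 0; ; start += rows) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);
      q.setRows(rows);
      q.setSort("id", SolrQuery.ORDER.asc); // stable order while paging
      SolrDocumentList page = oldSolr.query(q).getResults();
      if (page.isEmpty()) break;
      for (SolrDocument doc : page) {
        SolrInputDocument in = new SolrInputDocument();
        for (String field : doc.getFieldNames()) {
          if (!"_version_".equals(field)) { // let the new index assign its own versions
            in.addField(field, doc.getFieldValue(field));
          }
        }
        newSolr.add(in);
      }
    }
    newSolr.commit();
    oldSolr.shutdown();
    newSolr.shutdown();
  }
}

Plain start-based paging slows down deep into a large index; on 4.7+ the cursorMark API is the better fit, though the loop shape is the same.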
Re: How to extend the behavior of a common text field (such as text_general) to recognize regex
What about copyField'ing the content into a second field where you apply the alternative processing? Then use eDismax to search both. You don't have to store the other field, just index it. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Wed, Jun 25, 2014 at 5:55 AM, Vinay B, vybe3...@gmail.com wrote: Sorry, previous post got sent prematurely. Here is the complete post: This is easy if I only need to define a custom field to identify the desired patterns (numbers, in my case).
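In schema.xml terms, that suggestion looks roughly like this (field names here are illustrative; text_num is the pattern type Vinay defined above): <field name="content" type="text_general" indexed="true" stored="true"/> <field name="content_num" type="text_num" indexed="true" stored="false"/> <copyField source="content" dest="content_num"/> and then query with something like defType=edismax&qf=content%20content_num so a term can match through either analysis chain.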
Re: DIH on Solr
Check out the HBase Indexer http://ngdata.github.io/hbase-indexer/ Wolfgang. On Jun 24, 2014, at 3:55 AM, Ahmet Arslan iori...@yahoo.com.INVALID wrote: Hi, There is no DataSource or EntityProcessor for HBase, I think. May be http://www.lilyproject.org/lily/index.html works for you? Ahmet On Tuesday, June 24, 2014 1:27 PM, atp annamalai...@hcl.com wrote: Hi experts, We have a requirement to import data from HBase tables into Solr. We tried with the help of DataImportHandler, but we couldn't find the configuration steps or documentation for a DataImportHandler for HBase. Can anybody please share the steps to configure it? We tried a basic configuration, but running a full import throws the error below. Please share any docs or links on configuring DIH for an HBase table. 6/24/2014 3:44:00 PM WARN ZKPropertiesWriter Could not read DIH properties from /configs/collection1/dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException 6/24/2014 3:44:00 PM ERROR DataImporter Full Import failed:java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:msg Processing Document # 1 Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-on-Solr-tp4143669.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Double cast exception with grouping and sort function
: I recently tried upgrading our setup from 4.5.1 to 4.7+, and I'm : seeing an exception when I use (1) a function to sort and (2) result : grouping. The same query works fine with either (1) or (2) alone. : Example below. Did you modify your schema in any way when upgrading? Can you provide some sample data to demonstrate the problem? (ideally using the 4.x example configs - but if you can't reproduce with that then providing your own configs would be helpful) I was unable to reproduce doing a quick sanity check using the example with a shard param to force a distrib query... http://localhost:8983/solr/select?q=*:*shards=localhost:8983/solrsort=sum%281,1%29%20descgroup=truegroup.field=inStock It's possible that the distributed grouping code has a bug in it related to the marshalling of sort values and i'm just not tickling that bug with my quick check ... but if i remember correctly work was done to fix grouped sorting to correctly deal with this when FieldType.marshalSortValue was introduced. : Example (v4.8.1): : { : responseHeader: { : status: 500, : QTime: 14, : params: { : sort: sum(1,1) desc, : indent: true, : q: title:solr, : _: 1403586036335, : group.field: type, : group: true, : wt: json : } : }, : error: { : msg: java.lang.Double cannot be cast to org.apache.lucene.util.BytesRef, : trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: : java.lang.Double cannot be cast to org.apache.lucene.util.BytesRef : code: 500 : } : } : : From the log: : : org.apache.solr.common.SolrException; : null:java.lang.ClassCastException: java.lang.Double cannot be cast to : org.apache.lucene.util.BytesRef : at org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:981) : at org.apache.solr.schema.TextField.marshalSortValue(TextField.java:176) : at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.serializeSearchGroup(SearchGroupsResultTransformer.java:125) : at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:65) : at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:43) : at org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:193) -Hoss http://www.lucidworks.com/
Does updating a child document destroy the parent - child relationship
When I edit a child document, a block join query for the parent no longer returns any hits. I thought I read that this was the way things worked but needed to know for sure. If so, is there any other way to achieve this functionality (I can deal with creating the child doc with the parent, but would like to edit it separately). My rough prototype code is at https://github.com/balamuru/SolrChildDocs and the code in question is commented out in https://github.com/balamuru/SolrChildDocs/blob/master/src/main/java/com/vgb/solr/SolrApp.java Thanks
Re: fq= more than one?
good. -- View this message in context: http://lucene.472066.n3.nabble.com/fq-more-then-one-tp959849p4143943.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Does updating a child document destroy the parent - child relationship
Block join is a very specialized feature of Solr - it requires that creation and update of the parent and all children be done as a single update operation for all of the documents. So... you cannot update a child document by itself, but need to update the entire block. Unfortunately, this limitation does not appear to be documented in the Solr ref guide. -- Jack Krupansky -Original Message- From: Vinay B, Sent: Tuesday, June 24, 2014 10:40 PM To: solr-user Subject: Does updating a child document destroy the parent - child relationship
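In SolrJ terms the "edit" is therefore a whole-block re-add; a sketch reusing the hypothetical ids and field names from the block-join thread earlier in this digest (this relies on the schema defining the _root_ field, which block join requires, so that re-adding the parent replaces the old block):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BlockUpdateExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8088/solr/test_core");
    // Rebuild the parent with its FULL set of children, including the edited one.
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "1");
    parent.addField("content_type", "parentDocument");
    SolrInputDocument child = new SolrInputDocument();
    child.addField("id", "1-A");
    child.addField("ATTRIBUTES.STATE", "CA"); // the edited value
    parent.addChildDocument(child);
    solr.add(parent); // replaces the previous block for this parent
    solr.commit();
    solr.shutdown();
  }
}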
Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….
Hi Solr ! I got this working . Here's how : With the example jetty runner, you can extract the tarball, and go to the examples/ directory, where you can launch an embedded core. Then, find the solrconfig.xml file. Edit it to contain the following xml: <directoryFactory name="DirectoryFactory" class="org.apache.solr.core.HdfsDirectoryFactory"> <str name="solr.hdfs.home">myhcfs:///solr</str> <str name="solr.hdfs.confdir">/etc/hadoop/conf</str> </directoryFactory> the confdir is important: That is where you will have something like a core-site.xml that defines all the parameters for your filesystem (fs.defaultFS, fs.mycfs.impl…. and so on). This tells solr, when launched, to use myhcfs as the underlying file store. You also should make sure that the jar for your plugin (in our case glusterfs, but hadoop will reference it by looking up the dynamically generated parameters that come from the base uri myhcfs…) and its classes are on the class path, and that the hadoop-common jar is also there (Some HCFS shims will need FilterFileSystem to run correctly, which is only in hadoop-common.jar). So - how to modify the running solr core's class path? To do so – you can update the solrconfig.xml jar directives. There are a bunch of regular expression templates you can modify in the examples/.../solrconfig.xml file. You can also copy the jars in at runtime, to be really safe. Once your example core with gluster configuration is set up, launch it with the following properties: java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=glusterfs:///solr -Dsolr.updatelog=glusterfs:///solr -Dlog4j.configuration=file:/opt/solr-4.4.0-cdh5.0.2/example/etc/logging.properties -jar start.jar This starts a basic SOLR server on port 8983. If you are running from the simple jetty based examples which I've used to describe this above, then you should see the collection1 core up and running, and you should see its index sitting inside the /solr directory of your file system. Hope this helps those interested in expanding the use of SolrCloud outside of a single FS. On Jun 23, 2014, at 6:16 PM, Jay Vyas jayunit100.apa...@gmail.com wrote: Hi folks. Does anyone deploy solr indices on other HCFS implementations (S3FileSystem, for example) regularly ? If so I'm wondering 1) Where are the docs for doing this - or examples? Seems like everything, including parameter names for dfs setup, are based around hdfs. Maybe I should file a JIRA similar to https://issues.apache.org/jira/browse/FLUME-2410 (to make the generic deployment of SOLR on any file system explicit / obvious). 2) if there are any interesting requirements (i.e. createNonRecursive, Atomic mkdirs, sharing, blocking expectations etc etc) which need to be implemented
Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….
I've always been under the impression that file-system access speed is crucial for Lucene-based storage and have always advocated not using NFS for that (with which we saw a slowdown of a factor of 5, approximately). Has there been any performance measurement made for such a setting? Is FS-caching suddenly getting so much better that it is not a problem? Also, as far as I know S3 bills by the amount of (giga-)bytes exchanged…. this gives plenty of room, but if each start needs to pull a big part of the index from the storage to the solr server to fill the cache, it looks like it won't be that cheap. Thanks for any experience reports. paul On June 25, 2014, at 07:16, Jay Vyas jayunit100.apa...@gmail.com wrote: Hi Solr ! I got this working .