Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...
Hi All,

I'm trying to run a simple piece of code to get SolrTestCaseJ4 to work. Here's my code:

    import org.apache.solr.SolrTestCaseJ4;
    import org.junit.BeforeClass;
    import org.junit.Test;

    public class MyTest extends SolrTestCaseJ4 {

        // Load the test core once for the whole class
        @BeforeClass
        public static void init() throws Exception {
            initCore("solrconfig.xml", "schema.xml");
            lrf = h.getRequestFactory("standard", 0, 20);
        }

        @Test
        public void testNothing() {
        }
    }

I have the required solrconfig.xml and schema.xml inside *./src/test/resources/solr/collection1/conf*

However, when I run the testNothing() method, I get the following error:

    java.lang.RuntimeException: java.io.IOException: Can't find resource 'rs_A_count_gte300k.txt' in classpath or '/Users/nishantkelkar/IdeaProjects/k2/solr-component/target/test-classes/solr/collection1/conf'
        at __randomizedtesting.SeedInfo.seed([BB606BF344F3401]:0)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:169)
        at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
        at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
        at org.apache.solr.util.TestHarness.init(TestHarness.java:98)
        at org.apache.solr.SolrTestCaseJ4.createCore(SolrTestCaseJ4.java:472)
        at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:464)
        at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:273)
        at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:260)

What could the issue be here?

*P.S:* This is my first question on this mailing list. Pardon me if I haven't stuck to some convention everyone follows here!

Best Regards,
Nishant Kelkar
Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...
As an additional issue related to the one above, I sometimes also get this error (and the times that I get it are pretty random):

    java.lang.AssertionError: fix your classpath to have tests-framework.jar before lucene-core.jar
        at __randomizedtesting.SeedInfo.seed([50225DA1F52F32BB]:0)
        at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:189)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at java.lang.Thread.run(Thread.java:745)

I found a related post here:
http://stackoverflow.com/questions/25721320/assertion-error-when-running-solrtestcasej4-tests-fix-your-classpath-to-have
Unfortunately, the accepted answer isn't clear enough to me. Any pointers as to how to fix this would be helpful too. Thank you!
Best Regards,
Nishant Kelkar
Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...
Seems like I've resolved these issues:

1. A text search for rs_A_count_gte300k.txt throughout my IntelliJ project revealed that my schema.xml expected a file by that name (thank you, blind copy/pasting). After removing the offending fields, plus a few other fields for which I didn't have data files, I got the test to work.

2. For the second issue, I've updated this post with my answer, to better explain the solution:
http://stackoverflow.com/questions/25721320/assertion-error-when-running-solrtestcasej4-tests-fix-your-classpath-to-have/27166848#27166848

Best Regards,
Nishant Kelkar
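
For anyone hitting the same "Can't find resource" error, the text search described in the first fix can be roughly automated. The following is only a sketch under assumptions: the class and method names are hypothetical (they are not part of Solr or its test framework), and it assumes resource files show up in schema.xml as quoted attribute values ending in ".txt", each naming a single file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Solr or the Lucene/Solr test framework.
class MissingSchemaResources {

    // Assumption: resources are referenced as quoted attribute values ending in ".txt".
    private static final Pattern TXT_REFERENCE = Pattern.compile("\"([^\"]+\\.txt)\"");

    /** Returns the distinct .txt names referenced in the given schema text, in order. */
    static List<String> referencedFiles(String schemaXml) {
        List<String> names = new ArrayList<>();
        Matcher matcher = TXT_REFERENCE.matcher(schemaXml);
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!names.contains(name)) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the referenced .txt files that do not exist inside confDir. */
    static List<String> findMissing(Path confDir) throws IOException {
        byte[] bytes = Files.readAllBytes(confDir.resolve("schema.xml"));
        List<String> missing = new ArrayList<>();
        for (String name : referencedFiles(new String(bytes, StandardCharsets.UTF_8))) {
            if (!Files.exists(confDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the original post; pass another conf dir as args[0].
        Path confDir = Paths.get(args.length > 0 ? args[0] : "src/test/resources/solr/collection1/conf");
        System.out.println("Missing resources: " + findMissing(confDir));
    }
}
```

Running it against the conf directory before calling initCore() should list files such as rs_A_count_gte300k.txt that the schema expects but the test resources do not provide.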
Solr mlt doesn't return documents with exactly the same contents
I have two documents with ids aaa and bbb, and both documents have the title "a black fox jumps over a red flower". I imported both documents, along with several other test documents, into a core named test. I want Solr to return documents similar to document aaa, so I submitted the following:

http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title

Solr returned some similar documents. However, document bbb, which should be the document most similar to aaa, was not in the returned mlt list. Any ideas how this could happen? Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-mlt-doesn-t-return-documents-with-exactly-the-same-contents-tp4171284.html Sent from the Solr - User mailing list archive at Nabble.com.
Terms vector for multiple documents
I'm working with social media data. We have blog posts in our index: text + authors_id. Now we need to cluster authors by their texts. We need a term vector not per document, but one vector per author (covering all of that author's documents).

We can't fetch all the documents and then merge them, because it would take ages. And we can't just concatenate all posts into one mega-post per author (to have one document per author), because our data grows every day and we keep receiving new posts for existing authors.

Can you suggest any solution?

-- View this message in context: http://lucene.472066.n3.nabble.com/Terms-vector-for-multiple-documents-tp4171297.html
Re: Terms vector for multiple documents
Presumably, requesting pivot facets returns what you are asking for. However, it takes time. Overall, the problem seems more suitable for Mahout or (really sorry for mentioning it) Hadoop.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Terms vector for multiple documents
Thanks, I'll read up on facets. Actually, we do want to use Mahout, but it needs term vectors, which is how we ran into the problem of building a term vector for an author from a set of documents. Anyway, the main reason for my question was to find out whether I was missing some simple solution. So, thank you again.
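
To make "one vector per author" concrete, here is a minimal sketch that folds each incoming post into a running term-count vector for its author, so new posts never require re-reading old ones. This is not a Solr or Mahout feature and was not proposed in the thread; the class name, the whitespace/punctuation tokenizer, and the in-memory maps are all assumptions made purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a real setup would use a proper analyzer and persistent storage.
class AuthorTermVectors {

    // author id -> (term -> total count across all of that author's posts)
    private final Map<String, Map<String, Integer>> vectors = new HashMap<>();

    /** Folds one post's term counts into its author's running vector. */
    void addPost(String authorId, String text) {
        Map<String, Integer> vector = vectors.computeIfAbsent(authorId, id -> new HashMap<>());
        // Assumption: naive tokenization on non-word characters, lowercased.
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                vector.merge(term, 1, Integer::sum);
            }
        }
    }

    /** Returns the accumulated vector for an author, or an empty map if none exists. */
    Map<String, Integer> vectorFor(String authorId) {
        return vectors.getOrDefault(authorId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        AuthorTermVectors vectors = new AuthorTermVectors();
        vectors.addPost("author-1", "Red fox");
        vectors.addPost("author-1", "fox jumps");
        System.out.println(vectors.vectorFor("author-1"));
    }
}
```

The design choice is simply that the per-author vector is additive, so it can be updated incrementally as new posts arrive instead of being recomputed from all documents.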
Re: TrieLongField not store large longs correctly
On Wed, Nov 26, 2014 at 10:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> Looks like one of these:
> http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript

Yeah, that's what Brendan pointed to earlier in this thread. In the UI code, we just seem to be using the JSON object's native functions. Oh, the irony that one can't use JavaScript Object Notation in JavaScript without losing information!

-Yonik
http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
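
The rounding discussed here comes from JavaScript numbers being IEEE 754 doubles, which cannot represent every integer above 2^53. The effect can be reproduced in Java by pushing a long through a double; the sketch below uses a hypothetical class name and is only meant to illustrate the loss, not how Solr or the admin UI handle values.

```java
import java.math.BigDecimal;

// Hypothetical illustration of why large long values get rounded when treated as doubles.
class LongPrecisionSketch {

    /** Returns true if the exact value of (double) value differs from value. */
    static boolean losesPrecisionAsDouble(long value) {
        // new BigDecimal(double) yields the exact decimal value of the double.
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) != 0;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;      // 9007199254740992, still representable
        long rounded = exact + 1;   // 9007199254740993, not representable as a double
        System.out.println(rounded + " as double -> " + new BigDecimal((double) rounded).toPlainString());
        System.out.println(losesPrecisionAsDouble(exact));
        System.out.println(losesPrecisionAsDouble(rounded));
    }
}
```

The stored value in the index is unaffected; only a client that parses the number as a double sees the rounded value.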
Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...
Thanks for closing this off _and_ providing info to others!

Best,
Erick
confused about how to set a solr query timeout when using tomcat
I inherited a set of old Solr 1.4.x instances running under Tomcat 6 / Java 6. While I will eventually upgrade them to a more recent Solr/Tomcat/Java, I am unable to do so in the near term.

One of my priority fixes, though, is to implement some sort of timeout for Solr queries that exceed 1000 ms (or so); i.e. if a query takes longer than that, I want to abort it (returning nothing, or an error, or whatever) so that Solr can process other queries. While we have optimized our queries for an average 50 ms response time, we do occasionally see some that run between 10 and 100 seconds.

I know that this version of Solr itself doesn't have a built-in timeout mechanism, which leaves me with figuring out what to do (it seems to me that I have to figure out how to get Tomcat to time out the queries somehow).

Note that I DID google until my fingers hurt and have not been able to find clear (at least not clear to me) instructions on how to do so.

Details:

1. The setup uses the DataImportHandler to update Solr, and updates occur often and can be quite large; we use batchSize=1 and autoCommit=true, with doc size being around 1400 to 1600 bytes. I don't want the timeout to kill the imports, of course.

2. I tried adding a timeout param to the Tomcat configuration, but it doesn't work:

    <Connector port="8086" protocol="HTTP/1.1" connectionTimeout="2" protocol="HTTP/1.1" timeout="1" />

Any thoughts? Can anyone point me in the right direction on how to implement this? Any help appreciated. Thanks in advance.

-- View this message in context: http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363.html
Re: confused about how to set a solr query timeout when using tomcat
How big is the index (document count, gigabytes)? How much RAM is on the servers? How big is your Java heap? How are the servers hosted? AWS?

Long queries are often caused by long-tail queries fetched from disk. There are several ways to speed these up, but they all use RAM or SSD.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
Re: confused about how to set a solr query timeout when using tomcat
Millions of documents per shard, with a number of shards.
~40 GB index folder size.
12 GB of heap on a 16 GB machine (this old Solr doesn't use OS memory space like 4.x does).
Servers are hosted internally, and are powerful.

Understood. As mentioned, we tuned the bulk of our queries to run very quickly (50 ms or less), but we do occasionally see queries (i.e. internal ones for statistics/tests) that can be excessively long-running.

Basically, we want to be able to enforce how long those long-running queries are allowed to run.
Re: Solr mlt doesn't return documents with exactly the same contents
Hey hhc,

I am new to Solr, so pardon me if this throws you off. But I think the following piece of code from MoreLikeThisHandler#handleRequestBody() is relevant to your problem:

        // Find documents MoreLikeThis - either with a reader or a query
        if (reader != null) {
          mltDocs = mlt.getMoreLikeThis(reader, start, rows, filters, interesting, flags);
        } else if (q != null) {
          // Matching options
          boolean includeMatch = params.getBool(MoreLikeThisParams.MATCH_INCLUDE, true);
          int matchOffset = params.getInt(MoreLikeThisParams.MATCH_OFFSET, 0);
          // Find the base match
          DocList match = searcher.getDocList(query, null, null, matchOffset, 1, flags); // (*) only get the first one...
          if (includeMatch) {
            rsp.add("match", match);
          }
          // This is an iterator, but we only handle the first match
          DocIterator iterator = match.iterator();             // (*)
          if (iterator.hasNext()) {
            // do a MoreLikeThis query for each document in results
            int id = iterator.nextDoc();                        // (*)
            mltDocs = mlt.getMoreLikeThis(id, start, rows, filters, interesting, flags); // (*)
          }
        } else {
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              "MoreLikeThis requires either a query (?q=) or text to find similar documents.");
        }
      } finally {
        if (reader != null) {
          reader.close();
        }
      }

From the lines marked (*), it seems like it pulls the first document from the top 10 list (which is most likely your duplicate document, as it seems to be ranked by score), and issues an mlt query on that.

As an experiment to verify this, you can try the following:

1. Add a *third* document, similar to aaa; let's say it's called ccc.
2. Issue the same query that you posted above: http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title
3. If you see document ccc in the results list, that confirms the above notion of mine.

Let us know how it goes!

Best Regards,
Nishant Kelkar
RE: confused about how to set a solr query timeout when using tomcat
solr-user [solr-u...@hotmail.com] wrote:
> while we have optimized our queries for an average 50ms response time,
> we do occasionally see some that can run between 10 and 100 seconds.

That sounds suspicious. Response times so far from your average indicate that there is special processing going on, such as uninverting facet fields after an index update, or garbage collection with a very large heap. In both of these cases, termination (if it were possible) would be undesirable, as those jobs need to be done.

> I know that this version of Solr itself doesn't have a built in timeout mechanism,
> which leaves me with figuring out what to do (it seems to me that I have to figure
> out how to get Tomcat to timeout the queries somehow)

You can get Tomcat to time out, but that only breaks the connection with the client: the request is still processed in Solr and will end with an error entry in the log, as it cannot deliver the result back to the client.

Even if you were able to somehow kill the started thread from the outside, which Java does not support, it might leave the Solr structures in a problematic state. The only technically sane way to do time-based termination of a request is to build it into the application, i.e. Solr.

- Toke Eskildsen
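
To illustrate what "building it into the application" means in general terms, here is a minimal sketch of cooperative, time-based termination: the work loop itself checks a deadline between units of work and stops cleanly, instead of being killed from outside. Everything here is hypothetical (class, method, and exception names, and the stand-in "documents" loop); it is not how Solr 1.4 behaves and not a Solr API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a cooperative deadline check; not Solr code.
class DeadlineSketch {

    /** Thrown by the worker itself when its time budget is used up. */
    static class TimeExceededException extends RuntimeException {
        final int collectedSoFar;

        TimeExceededException(int collectedSoFar) {
            super("time budget exceeded after " + collectedSoFar + " documents");
            this.collectedSoFar = collectedSoFar;
        }
    }

    /** Visits maxDoc stand-in documents, checking the deadline before each one. */
    static int collect(int maxDoc, long budgetNanos) {
        long deadline = System.nanoTime() + budgetNanos;
        int collected = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            // The check happens at a safe point, so shared state is never left half-updated.
            if (System.nanoTime() - deadline > 0) {
                throw new TimeExceededException(collected);
            }
            collected++; // stand-in for the real per-document work
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(collect(1000, TimeUnit.SECONDS.toNanos(1)));
        try {
            collect(1000, -1L); // a budget that has already expired
        } catch (TimeExceededException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the loop decides when to stop, the caller gets a well-defined outcome (a result or a timeout exception), which is the property an external kill or a dropped HTTP connection cannot provide.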
RE: confused about how to set a solr query timeout when using tomcat
Yes, my understanding (and concern) was also that Solr continues to run the query on the server even after the connection is broken. I was hoping I had overlooked or missed something in the Solr or Tomcat documentation that might do the job. It is unfortunate.

If anyone else can think of something, let me know.
Re: Exception in unit tests for distributed search component
Is that the complete stack trace? There are multiple indexDoc methods in that class. Some of them assert that the response from the control collection and the default collection are the same. However, in this case, it seems that an AssertionError is being sent from the server itself as a RemoteSolrException. Without more details about the test case and the server response, I can't say much. Maybe you should try printing out the response from the server to see what is being returned.

On Wed, Nov 26, 2014 at 5:11 AM, Suchi Amalapurapu su...@bloomreach.com wrote:
> Hi
> I am trying to test a custom distributed component with solr 4.6.1 which extends
> BaseDistributedSearchTestCase but end up with the following error. There are lot of
> tests in the solr code base which extend BaseDistributedSearchTestCase. Not sure
> what is wrong here.
> Suchi
>
> testDistribSearch(com.test.DistributedTest) Time elapsed: 2.288 sec ERROR!
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: java.lang.AssertionError
>     at __randomizedtesting.SeedInfo.seed([EB2AD095C59CFFE7:6ACC5E8DB2C39FDB]:0)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
>     at org.apache.solr.BaseDistributedSearchTestCase.indexDoc(BaseDistributedSearchTestCase.java:436)

--
Regards,
Shalin Shekhar Mangar.
Re: Solr mlt doesn't return documents with exactly the same contents
Hi Nishant,

Thank you for the reply. I believe that Solr removes the first document from the mlt list because a document is most similar to itself and thus should be removed. In my case, aaa and bbb are two different documents. When searching for documents similar to aaa, the document aaa should be removed from the list, but bbb should be kept.

I did the experiment you suggested. Unfortunately, the document ccc was not in the mlt list. I then changed the title of ccc to a somewhat different sentence, "a black fox jumps over a yellow flower", but the document ccc was not in the list either. :-(

Does anyone have any clues on this? Thanks!
Trying to get ALL scores from a previous search in a custom search component (last-components)
Hello,

I am trying to implement a Rollup Search component on a version of SOLR that predates the parent/child additions, so I am trying to implement my own. The searches will be executed exclusively against the child documents, and I want to “rollup” those child documents into the parent documents. The interface is going to allow the user to add the following parameters to the SOLR query:

rollup=true&rollup.parentField=id&rollup.childField=parentId

My code so far is below. What I have works so far, except that my second (parent) query loses the order. I would like to be able to sort my parent query by the score of the previous child search. Perhaps I would take the highest score from all children (haven't decided yet). My problem, however, is that I don't know how to get the score for all the hits in the original search, rather than just for what is returned. If my child query gets 10,000 hits but only returns 100 records, I can't get all the scores I need. Does anyone have any recommendations?

Thanks!!
Darin

    // Loop through all the records and look for the parent reference field
    Set<String> parentRefs = new HashSet<String>();
    DocIterator docSetIterator = rb.getResults().docSet.iterator();
    while (docSetIterator.hasNext()) {
        int docInt = docSetIterator.next();
        String[] fieldValues = rb.req.getSearcher().doc(docInt).getValues(childFieldName);
        for (String fieldValue : fieldValues) {
            if (fieldValue != null && fieldValue.length() > 0 && !parentRefs.contains(fieldValue)) {
                parentRefs.add(fieldValue);
            }
        }
    }

    // Build a boolean query of term queries
    BooleanQuery parentQuery = new BooleanQuery();
    Iterator<String> parentIdIterator = parentRefs.iterator();
    while (parentIdIterator.hasNext()) {
        String parentId = parentIdIterator.next();
        TermQuery termQuery = new TermQuery(new Term(parentFieldName, parentId));
        parentQuery.add(termQuery, BooleanClause.Occur.SHOULD);
    }
    DocList parentList = searcher.getDocList(parentQuery, new ArrayList<Query>(), null, 0, 100, 1); // TODO: use correct start/end/flags later...

    // Add parent results
    ResultContext resultContext = new ResultContext();
    resultContext.docs = parentList;
    resultContext.query = parentQuery;
    rb.rsp.add("parents", resultContext);
    rb.rsp.getToLog().add("hits", parentList.matches());
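
The "highest score from all children" idea can be sketched independently of Solr. The example below assumes that a (parent id, score) pair has somehow been gathered for every child hit, which is exactly the open question in this message and is not solved here; the class and method names are hypothetical and no Solr API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical roll-up of child scores; obtaining the full score list from Solr is out of scope.
class RollupScoreSketch {

    /** Keeps the highest child score seen for each parent id. */
    static Map<String, Float> maxScorePerParent(List<String> parentIds, List<Float> scores) {
        Map<String, Float> best = new HashMap<>();
        for (int i = 0; i < parentIds.size(); i++) {
            best.merge(parentIds.get(i), scores.get(i), (a, b) -> Math.max(a, b));
        }
        return best;
    }

    /** Returns parent ids ordered by descending rolled-up score. */
    static List<String> rankParents(Map<String, Float> best) {
        List<String> ids = new ArrayList<>(best.keySet());
        ids.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        List<String> parentIds = List.of("p1", "p2", "p1", "p3");
        List<Float> scores = List.of(0.4f, 0.9f, 1.5f, 0.1f);
        Map<String, Float> best = maxScorePerParent(parentIds, scores);
        System.out.println(rankParents(best));
    }
}
```

Swapping the merge function (for example to a sum) would give a different roll-up policy without changing the ranking step.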
Re: Solr mlt doesn't return documents with exactly the same contents
After carefully reading the mlt parameters here:
https://wiki.apache.org/solr/MoreLikeThis

I found that I can specify the following parameters to get bbb returned when searching for documents similar to aaa:

mlt.mintf=1
mlt.mindf=2

Details:

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. DEFAULT_MIN_TERM_FREQ = 2

mlt.mindf: Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. DEFAULT_MIN_DOC_FREQ = 5

Hope this is helpful to those who are confused about the mlt results.
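
A toy model may help show why the defaults filtered everything out. The sketch below applies only the two thresholds quoted above to whitespace-split titles; it is a simplification under assumptions (hypothetical class name, a two-document corpus, no analyzer, stop words, or the other MoreLikeThis settings) and is not Lucene's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the mintf/mindf filters; not Lucene's MoreLikeThis code.
class MltTermFilterSketch {

    private static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    /** Returns the source terms whose term frequency and document frequency pass both minimums. */
    static Set<String> interestingTerms(String source, List<String> corpus, int minTf, int minDf) {
        // Term frequency inside the source document
        Map<String, Integer> tf = new HashMap<>();
        for (String term : source.toLowerCase().split("\\s+")) {
            tf.merge(term, 1, Integer::sum);
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : tf.entrySet()) {
            // Document frequency across the (toy) corpus
            int df = 0;
            for (String doc : corpus) {
                if (tokens(doc).contains(entry.getKey())) {
                    df++;
                }
            }
            if (entry.getValue() >= minTf && df >= minDf) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String title = "a black fox jumps over a red flower";
        List<String> corpus = Arrays.asList(title, title); // stand-ins for documents aaa and bbb
        System.out.println(interestingTerms(title, corpus, 2, 5)); // thresholds matching the defaults
        System.out.println(interestingTerms(title, corpus, 1, 2)); // thresholds from this message
    }
}
```

In this toy corpus, the default thresholds leave no terms to build a similarity query from, while mintf=1 and mindf=2 keep every term of the title, which is consistent with bbb showing up only after the parameters were lowered.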