Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
Hi All,

I'm trying to run a simple piece of code to get SolrTestCaseJ4 to work.
Here's my code:

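import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyTest extends SolrTestCaseJ4 {

  @BeforeClass
  public static void init() throws Exception {
    // Create the test core from the given config and schema files
    initCore("solrconfig.xml", "schema.xml");
    lrf = h.getRequestFactory("standard", 0, 20);
  }

  @Test
  public void testNothing() {
  }
}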

I have the required solrconfig.xml and schema.xml in
./src/test/resources/solr/collection1/conf

However, when I run the testNothing() method, I get the following
error:

java.lang.RuntimeException: java.io.IOException: Can't find resource
'rs_A_count_gte300k.txt' in classpath or
'/Users/nishantkelkar/IdeaProjects/k2/solr-component/target/test-classes/solr/collection1/conf'
  at __randomizedtesting.SeedInfo.seed([BB606BF344F3401]:0)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:169)
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
  at org.apache.solr.util.TestHarness.init(TestHarness.java:98)
  at org.apache.solr.SolrTestCaseJ4.createCore(SolrTestCaseJ4.java:472)
  at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:464)
  at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:273)
  at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:260)

What could the issue be over here?

P.S.: This is my first question on this mailing list, so pardon me if I
haven't followed some convention everyone here uses!

Best Regards,
Nishant Kelkar


Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
Related to the issue above, I also sometimes get this error (seemingly
at random):

java.lang.AssertionError: fix your classpath to have tests-framework.jar
before lucene-core.jar
  at __randomizedtesting.SeedInfo.seed([50225DA1F52F32BB]:0)
  at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:189)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
  at java.lang.Thread.run(Thread.java:745)

I found a related post at
http://stackoverflow.com/questions/25721320/assertion-error-when-running-solrtestcasej4-tests-fix-your-classpath-to-have,
but unfortunately the accepted answer isn't clear enough to me. Any
pointers on how to fix this would be helpful too.
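The assertion message says the test-framework jar has to come before lucene-core on the classpath. One way to see the order the test runner actually uses is to inspect the JVM's java.class.path property. The sketch below is a hypothetical diagnostic, not part of Lucene or Solr; the marker strings assume Maven-style jar names (lucene-test-framework-x.y.z.jar, lucene-core-x.y.z.jar), and IDE runners or custom classloaders may order things differently from what this property shows.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical check: does a lucene-test-framework jar precede lucene-core on the classpath? */
final class ClasspathOrderCheck {

    /** Index of the first classpath entry whose file name contains the marker, or -1. */
    static int indexOf(String classpath, String marker) {
        String[] entries = classpath.split(Pattern.quote(File.pathSeparator));
        for (int i = 0; i < entries.length; i++) {
            if (new File(entries[i]).getName().contains(marker)) {
                return i;
            }
        }
        return -1;
    }

    /** True if the test framework is present and comes before lucene-core (or lucene-core is absent). */
    static boolean testFrameworkFirst(String classpath) {
        int framework = indexOf(classpath, "lucene-test-framework");
        int core = indexOf(classpath, "lucene-core");
        return framework >= 0 && (core < 0 || framework < core);
    }

    public static void main(String[] args) {
        // The classpath this JVM (e.g. the IDE's test runner) was started with.
        String classpath = System.getProperty("java.class.path");
        System.out.println("lucene-test-framework before lucene-core: " + testFrameworkFirst(classpath));
    }
}
```

Running the main method from the same run configuration as the failing test shows whether the order there matches what the assertion expects.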

Thank you!

Best Regards,
Nishant Kelkar




Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
Seems like I've resolved these issues:
1. A text search for rs_A_count_gte300k.txt throughout my IntelliJ
project revealed that my schema.xml expected a file by that name
(thank you, blind copy/pasting). After removing the conflicting
fields, plus a few other fields for which I didn't have data files, I got
the test to work.

2. For the second issue, I've added my answer to this Stack Overflow post
to better explain the solution:
http://stackoverflow.com/questions/25721320/assertion-error-when-running-solrtestcasej4-tests-fix-your-classpath-to-have/27166848#27166848
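The text search described in item 1 can also be automated. The following is a minimal sketch, assuming that the resources schema.xml depends on appear as *.txt names inside attribute values (such as words="stopwords.txt"); SchemaResourceCheck is a hypothetical helper, not a Solr API, and it only checks a single conf directory, not the whole classpath Solr searches.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists *.txt files referenced by schema.xml that are missing from confDir. */
final class SchemaResourceCheck {

    /** Distinct *.txt names found in any attribute value, in the order encountered. */
    static List<String> referencedTxtFiles(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("schema.xml could not be parsed", e);
        }
        List<String> files = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("*"); // every element in the document
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                // Some attributes may hold a comma-separated list of files.
                for (String part : attrs.item(j).getNodeValue().split(",")) {
                    String name = part.trim();
                    if (name.endsWith(".txt") && !files.contains(name)) {
                        files.add(name);
                    }
                }
            }
        }
        return files;
    }

    /** Referenced *.txt files that do not exist as regular files in confDir. */
    static List<String> missingFiles(String schemaXml, File confDir) {
        List<String> missing = new ArrayList<>();
        for (String name : referencedTxtFiles(schemaXml)) {
            if (!new File(confDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Pointing missingFiles at the directory named in the error (target/test-classes/solr/collection1/conf) would list files such as rs_A_count_gte300k.txt before the test harness trips over them.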

Best Regards,
Nishant Kelkar






Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
I have two documents with ids aaa and bbb, and the titles of both
documents are "a black fox jumps over a red flower". I imported both
documents, along with several other test documents, into a core named test.

I want Solr to return documents similar to document aaa, so I submitted the
following:

http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title

Solr returned some similar documents. However, document bbb, which should
be the document most similar to aaa, was not in the returned MLT list.
Any ideas how this could happen? Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-mlt-doesn-t-return-documents-with-exactly-the-same-contents-tp4171284.html
Sent from the Solr - User mailing list archive at Nabble.com.


Terms vector for multiple documents

2014-11-27 Thread Norgorn
I'm working with social media data.
We have blog posts in our index: text + authors_id.
Now we need to cluster authors by their texts. Rather than a term vector per
document, we need one vector per author (covering all of that author's
documents).

We can't fetch all the documents and then merge them, because it would take ages.

And we can't just concatenate all posts into one mega-post per author (to have one
document per author), because our data grows every day and we keep receiving new
posts for authors.

Can you suggest any solution?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-vector-for-multiple-documents-tp4171297.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Terms vector for multiple documents

2014-11-27 Thread Mikhail Khludnev
Presumably requesting pivot facets returns what you are asking for.
However, it takes time. Overall, the problem seems more suitable for
Mahout or (really sorry for mentioning it) Hadoop.





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Terms vector for multiple documents

2014-11-27 Thread Norgorn
Thanks, I'll read up on facets.

Actually, we want to use Mahout, but it needs term vectors, so we ran into the
problem of getting a term vector for an author from a set of documents.

Anyway, the main reason for my question was to find out whether I'm
missing some simple solution or not.
So, thank you again.
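Building one vector per author from a set of documents amounts to summing the per-document term counts. The sketch below assumes those per-post counts are already available from somewhere (for example each post's term vector); AuthorTermVectors is a hypothetical in-memory class, not a Solr or Mahout API. Because the sum is updated one post at a time, new posts can be folded in without re-concatenating anything.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one summed term-frequency vector per author, updated incrementally. */
final class AuthorTermVectors {
    // author id -> (term -> total frequency across that author's posts)
    private final Map<String, Map<String, Integer>> byAuthor = new HashMap<>();

    /** Adds one post's term counts to its author's running totals. */
    void addPost(String authorId, Map<String, Integer> postTermFreqs) {
        Map<String, Integer> vector = byAuthor.computeIfAbsent(authorId, id -> new HashMap<>());
        for (Map.Entry<String, Integer> e : postTermFreqs.entrySet()) {
            vector.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    /** Read-only view of an author's vector; empty if the author has no posts yet. */
    Map<String, Integer> vectorFor(String authorId) {
        return Collections.unmodifiableMap(byAuthor.getOrDefault(authorId, Collections.emptyMap()));
    }
}
```

At real data volumes the same summation would normally be done as a batch job keyed by author rather than in a single in-memory map.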



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-vector-for-multiple-documents-tp4171297p4171312.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TrieLongField not store large longs correctly

2014-11-27 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 10:38 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
 Looks like one of these:
 http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript

Yeah, that's what Brendan pointed to earlier in this thread.

 In the UI code, we just seem to be using JSON object's native functions.

OH, the irony that one can't use JavaScript Object Notation in
JavaScript w/o losing information!
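The rounding behind this is that a JavaScript number is an IEEE 754 double, the same format as Java's double, so a long above 2^53 may not survive native JSON parsing. The effect can be reproduced in Java; JsLongPrecision is only an illustration, not code from Solr or its admin UI.

```java
import java.math.BigDecimal;

/** Illustration: which longs survive conversion to an IEEE 754 double (a JavaScript number). */
final class JsLongPrecision {

    /** Exact value of the double nearest to the given long, i.e. what JSON.parse would store. */
    static BigDecimal asDouble(long value) {
        return new BigDecimal((double) value);
    }

    /** True if converting the long to a double loses no information. */
    static boolean survivesAsDouble(long value) {
        return asDouble(value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long big = 9_007_199_254_740_993L; // 2^53 + 1
        // prints "9007199254740993 -> 9007199254740992"
        System.out.println(big + " -> " + asDouble(big).toPlainString());
    }
}
```

2^53 itself is exact, but 2^53 + 1 lies halfway between two representable doubles and rounds to 2^53, which is the kind of silent change seen in the UI.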

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Erick Erickson
Thanks for closing this off _and_ providing info to others!

Best,
Erick






confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
I inherited a set of old Solr 1.4.x instances running under Tomcat 6/Java 6.

While I will eventually upgrade them to a more recent Solr/Tomcat/Java, I am
unable to do so in the near term.

One of my priority fixes, though, is to implement some sort of timeout for Solr
queries that exceed 1000 ms (or so); i.e., if a query takes longer than that,
I want to abort it (returning nothing, an error, or whatever) so that Solr can
process other queries. While we have optimized our queries for an average
response time of 50 ms, we occasionally see some that run for 10 to 100 seconds.

I know that this version of Solr doesn't have a built-in timeout mechanism,
which leaves me figuring out what to do (it seems to me that I have to get
Tomcat to time out the queries somehow).

Note that I DID google until my fingers hurt and have not been able to find
clear (at least not clear to me) instructions on how to do so.

Details:

1. The setup uses the DataImportHandler to update Solr; updates occur often
and can be quite large. We use batchSize=1 and autoCommit=true, with a document
size of around 1400 to 1600 bytes. I don't want the timeout to kill the
imports, of course.

2. I tried adding a timeout parameter to the Tomcat configuration, but it doesn't
work: <Connector port="8086" protocol="HTTP/1.1"
connectionTimeout="2" protocol="HTTP/1.1"
timeout="1" />

Any thoughts? Can anyone point me in the right direction on how to
implement this?

Any help appreciated. Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread Walter Underwood
How big is the index (document count, gigabytes)?

How much RAM is on the servers?

How big is your Java heap?

How are the servers hosted? AWS?

Long queries are often caused by long-tail queries fetched from disk. There are 
several ways to speed these up, but they all use RAM or SSD.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/





Re: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
Millions of documents per shard, with a number of shards
~40 GB index folder size
12 GB of heap on a 16 GB machine (this old Solr doesn't use OS memory the way
4.x does)
Servers are hosted internally and are powerful

Understood. As mentioned, we tuned the bulk of our queries to run very
quickly (50 ms or less), but we do occasionally see queries (i.e., internal ones
for statistics/tests) that run excessively long.

Basically, we want to be able to limit how long those long-running queries
are allowed to run.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363p4171368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread Nishant Kelkar
Hey hhc,

I am new to Solr, so pardon me if this throws you off. But I think the
following piece of code from MoreLikeThisHandler#handleRequestBody() is
relevant to your problem:

  // Find documents MoreLikeThis - either with a reader or a query
  //
  if (reader != null) {
    mltDocs = mlt.getMoreLikeThis(reader, start, rows, filters,
        interesting, flags);
  } else if (q != null) {
    // Matching options
    boolean includeMatch = params.getBool(MoreLikeThisParams.MATCH_INCLUDE,
        true);
    int matchOffset = params.getInt(MoreLikeThisParams.MATCH_OFFSET, 0);
    // Find the base match
    DocList match = searcher.getDocList(query, null, null, matchOffset, 1,
        flags); // only get the first one...
    if (includeMatch) {
      rsp.add("match", match);
    }

    // This is an iterator, but we only handle the first match
    DocIterator iterator = match.iterator();
    if (iterator.hasNext()) {
      // do a MoreLikeThis query for each document in results
      int id = iterator.nextDoc();
      mltDocs = mlt.getMoreLikeThis(id, start, rows, filters, interesting,
          flags);
    }
  } else {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "MoreLikeThis requires either a query (?q=) or text to find similar documents.");
  }

} finally {
  if (reader != null) {
    reader.close();
  }
}

From the lines that fetch the base match and call getMoreLikeThis on its
first document, it seems like the handler pulls the first document from the
top 10 list (which is most likely your duplicate document, as results seem to
be ranked by score) and issues an MLT query on that.

As an experiment to verify this, you can try the following:
1. Add a *third* document, similar to aaa, let's say it's called ccc.
2. Issue the same query that you posted above:
http://localhost:8983/solr/test/select?q=id:aaa&mlt=true&mlt.fl=title
3. If you see document ccc in the results list, that confirms the above
notion of mine.

Let us know how it goes!

Best Regards,
Nishant Kelkar




RE: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread Toke Eskildsen
solr-user [solr-u...@hotmail.com] wrote:
 while we have optimized our queries for an average 50ms response time,
 we do occasionally see some that can run between 10 and 100 seconds.

That sounds suspicious. Response times that far from your average indicate that
there is special processing going on, such as uninverting facet fields after an
index update, or garbage collection with a very large heap. In both of these
cases, termination (if it were possible) would be undesirable, as those jobs
need to be done.

 I know that this version of Solr itself doesn't have a built in timeout
 mechanism, which leaves me with figuring out what to do (it seems to me that
 I have to figure out how to get Tomcat to timeout the queries somehow)

You can get Tomcat to time out, but that only breaks the connection with the
client: the request is still processed in Solr and ends with an error entry
in the log, as it cannot deliver the result back to the client. Even if you were
able to somehow kill the started thread from the outside, which Java does not
support, it might leave the Solr structures in a problematic state. The only
technically sane way to do time-based termination of a request is to build it
into the application, i.e. Solr.
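Building termination into the application usually means the code doing the work checks a deadline itself and stops at a safe point. The sketch below shows that generic cooperative pattern; it is not Solr's code, DeadlineLoop and its doubling "work" are hypothetical, and the injectable clock exists only to make the behaviour deterministic to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch of cooperative, time-bounded processing. */
final class DeadlineLoop {

    /** Items processed so far, and whether the time budget ran out before the end. */
    static final class Result {
        final List<Integer> processed;
        final boolean partial;

        Result(List<Integer> processed, boolean partial) {
            this.processed = processed;
            this.partial = partial;
        }
    }

    /**
     * Processes items until done or until the clock passes the deadline. The check sits
     * inside the work loop, so the worker stops between items instead of being killed mid-item.
     */
    static Result run(List<Integer> items, long budgetNanos, LongSupplier clock) {
        long deadline = clock.getAsLong() + budgetNanos;
        List<Integer> done = new ArrayList<>();
        for (int item : items) {
            if (clock.getAsLong() > deadline) {
                return new Result(done, true); // budget exhausted: return partial results
            }
            done.add(item * 2); // stand-in for real per-item work
        }
        return new Result(done, false);
    }
}
```

In production the clock would simply be System::nanoTime; the caller then decides whether a partial result is acceptable or should be reported as an error.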

- Toke Eskildsen


RE: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
Yes, my understanding (and concern) was also that Solr continues to run the
query on the server even after the connection is broken.

I was hoping I had overlooked something in the Solr or Tomcat
documentation that might do the job.

It is unfortunate.

If anyone else can think of something, let me know.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363p4171379.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exception in unit tests for distributed search component

2014-11-27 Thread Shalin Shekhar Mangar
Is that the complete stack trace? There are multiple indexDoc methods in
that class. Some of them assert that the responses from the control collection
and the default collection are the same. However, in this case, it seems
that an AssertionError is being sent from the server itself as a
RemoteSolrException.

Without more details about the test case and the server response, I can't
say much. Maybe you should try printing out the response from the server to
see what is being returned.

On Wed, Nov 26, 2014 at 5:11 AM, Suchi Amalapurapu su...@bloomreach.com
wrote:

 Hi,
 I am trying to test a custom distributed component with Solr 4.6.1, using a
 test that extends BaseDistributedSearchTestCase, but I end up with the
 following error.

 There are a lot of tests in the Solr code base which extend
 BaseDistributedSearchTestCase. Not sure what is wrong here.
 Suchi

 testDistribSearch(com.test.DistributedTest)  Time elapsed: 2.288 sec  ERROR!
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([EB2AD095C59CFFE7:6ACC5E8DB2C39FDB]:0)
   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
   at org.apache.solr.BaseDistributedSearchTestCase.indexDoc(BaseDistributedSearchTestCase.java:436)




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
Hi Nishant,

Thank you for the reply.  

I believe that Solr removes the first document from the MLT list because a
document is most similar to itself and should therefore be removed. In my
case, aaa and bbb are two different documents. When searching for
documents similar to aaa, document aaa should be removed from the
list, but bbb should be kept.

I did the experiment you suggested. Unfortunately, document ccc was
not in the MLT list. I changed the title of ccc to a somewhat different
sentence, "a black fox jumps over a yellow flower", but ccc
was not in the list either.  :-(

Does anyone have any clues on this? Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-mlt-doesn-t-return-documents-with-exactly-the-same-contents-tp4171284p4171382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-27 Thread Darin Amos
Hello,

I am trying to implement a Rollup Search component on a version of Solr that
predates the parent/child additions, so I am implementing my own. The searches
will be executed exclusively against the child documents, and I want to
“rollup” those child documents into the parent documents.

The interface will allow the user to add the following parameters to the
Solr query:

rollup=true&rollup.parentField=id&rollup.childField=parentId

My code so far is below. It works, except that my second (parent) query loses
the order. I would like to be able to sort my parent query by the scores of
the previous child search, perhaps taking the highest score from all children
(haven’t decided yet). My problem, however, is that I don’t know how to get
the scores of all the hits in the original search, only of those that are
returned. If my child query gets 10,000 hits but returns only 100 records, I
can’t get all the scores I need.

Does anyone have any recommendations?

Thanks!!
Darin


// Loop through all the child records and collect the parent reference field
Set<String> parentRefs = new HashSet<String>();
DocIterator docSetIterator = rb.getResults().docSet.iterator();
while (docSetIterator.hasNext()) {
    int docInt = docSetIterator.nextDoc();
    String[] fieldValues = rb.req.getSearcher().doc(docInt).getValues(childFieldName);

    for (String fieldValue : fieldValues) {
        if (fieldValue != null && fieldValue.length() > 0) {
            parentRefs.add(fieldValue); // the Set ignores duplicates
        }
    }
}

// Build a boolean query of term queries, one SHOULD clause per parent id.
// Note: BooleanQuery throws TooManyClauses above its max clause count (1024 by default).
BooleanQuery parentQuery = new BooleanQuery();
for (String parentId : parentRefs) {
    TermQuery termQuery = new TermQuery(new Term(parentFieldName, parentId));
    parentQuery.add(termQuery, BooleanClause.Occur.SHOULD);
}

DocList parentList = searcher.getDocList(parentQuery,
        new ArrayList<Query>(), null, 0, 100, 1); //TODO: use correct start/end/flags later...

// Add parent results
ResultContext resultContext = new ResultContext();
resultContext.docs = parentList;
resultContext.query = parentQuery;
rb.rsp.add("parents", resultContext);
rb.rsp.getToLog().add("hits", parentList.matches());
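Once child scores are available, ordering parents by their highest-scoring child (one of the options mentioned above) is a small aggregation step. This sketch covers only that step; ParentRollup and ChildHit are hypothetical names, and how to collect the score of every matching child in Solr, rather than just the returned rows, is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical aggregation: order parents by the best score among their children. */
final class ParentRollup {

    /** One scored child hit: the parent it references and the child's score. */
    static final class ChildHit {
        final String parentId;
        final float score;

        ChildHit(String parentId, float score) {
            this.parentId = parentId;
            this.score = score;
        }
    }

    /** Parent ids ordered by their highest child score, best first; ties broken by id. */
    static List<String> rankParents(List<ChildHit> hits) {
        Map<String, Float> best = new HashMap<>();
        for (ChildHit hit : hits) {
            best.merge(hit.parentId, hit.score, Float::max); // keep the maximum per parent
        }
        List<String> parents = new ArrayList<>(best.keySet());
        parents.sort((a, b) -> {
            int byScore = Float.compare(best.get(b), best.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return parents;
    }
}
```

Swapping Float::max for Float::sum would rank parents by total child score instead, if that turns out to be the better choice.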

Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
After carefully reading the MLT parameters at
https://wiki.apache.org/solr/MoreLikeThis

I found that I can specify the following parameters to return bbb when
searching for documents similar to aaa:

mlt.mintf=1
mlt.mindf=2
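With those two parameters added, the request from earlier in the thread needs its parameters joined with '&' and URL-encoded. A minimal sketch using only the JDK (URLEncoder.encode(String, Charset), available since Java 10); MltUrl is a hypothetical helper, and the base URL is the one used in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a select URL with encoded, '&'-separated parameters. */
final class MltUrl {

    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "id:aaa");
        params.put("mlt", "true");
        params.put("mlt.fl", "title");
        params.put("mlt.mintf", "1");
        params.put("mlt.mindf", "2");
        // prints http://localhost:8983/solr/test/select?q=id%3Aaaa&mlt=true&mlt.fl=title&mlt.mintf=1&mlt.mindf=2
        System.out.println(build("http://localhost:8983/solr/test/select", params));
    }
}
```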

Details:
mlt.mintf: Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc.
DEFAULT_MIN_TERM_FREQ = 2
mlt.mindf: Minimum Document Frequency - words that do not occur in at least
this many docs will be ignored.
DEFAULT_MIN_DOC_FREQ = 5
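A simplified model of these two rules (not Lucene's actual MoreLikeThis code) shows why the defaults can leave almost nothing to match on for a short title: a term is kept only if its frequency in the source document is at least mintf and it occurs in at least mindf documents. The counts in the usage below are hypothetical: no stopword removal, each word of "a black fox jumps over a red flower" appearing once except "a" (twice), every word occurring in exactly two documents (aaa and bbb) except "a", assumed to occur in six.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Simplified model of the mlt.mintf / mlt.mindf term filtering described above. */
final class MltTermFilter {

    /** Source-document terms whose in-document frequency >= minTf and document frequency >= minDf. */
    static List<String> interestingTerms(Map<String, Integer> termFreqInSource,
                                         Map<String, Integer> docFreq,
                                         int minTf, int minDf) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqInSource.entrySet()) {
            int tf = e.getValue();
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (tf >= minTf && df >= minDf) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

Under those assumed counts, the defaults (2 and 5) keep only "a", while mintf=1 and mindf=2 keep all seven terms, so bbb's shared words can contribute to the similarity query.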

Hope this is helpful to those who are confused about what MLT returns.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-mlt-doesn-t-return-documents-with-exactly-the-same-contents-tp4171284p4171399.html
Sent from the Solr - User mailing list archive at Nabble.com.