RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Thanks.  I'll look at that as well.

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Tuesday, March 14, 2017 1:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Facet? Search problem

Scott

Depending on what you're looking for
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
might be worth a look as well.

-Stefan
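
A rough sketch of what the collapse approach on that page could look like for this thread's use case. It only builds request parameters in Python; it assumes a single-valued field named category and a Solr release that ships the collapsing query parser. The helper name collapse_params is made up for this example.

```python
from urllib.parse import urlencode


def collapse_params(query, field, rows=50):
    """Build parameters that keep only the top-scoring document per field value."""
    return [
        ("q", query),
        # The collapse filter reduces the result set to one document per value.
        ("fq", "{!collapse field=%s}" % field),
        # Optionally ask Solr to return the collapsed documents as well.
        ("expand", "true"),
        ("rows", str(rows)),
    ]


params = collapse_params("dog", "category")
print(urlencode(params))
```

Unlike grouping, the collapsed response is a flat document list, which may be easier to page through.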



RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Grouping appears to be exactly what I'm looking for.  I added 
"group=true&group.field=category" to my search and it appears that I get a 
list of groups, one document in each group that matches the search, along with 
(bonus) the number of documents in the category that match that search. 
Perfect.  Thank you very much.
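
For reference, a minimal sketch of the grouped request described above, written in Python and limited to building the parameters. The field name category comes from this thread; the helper name grouping_params and the choice of JSON output are assumptions made for illustration.

```python
from urllib.parse import urlencode


def grouping_params(query, field, rows=50):
    """Build parameters for a grouped search: one top document per field value."""
    return {
        "q": query,
        "group": "true",
        "group.field": field,
        "group.limit": "1",  # documents returned per group (1 is also the default)
        "rows": str(rows),   # with grouping enabled, rows counts groups
        "wt": "json",
    }


print(urlencode(grouping_params("dog", "category")))
```

Each group in the response carries its own numFound, which is the per-category match count mentioned above.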

-Original Message-
From: Dave [mailto:hastings.recurs...@gmail.com] 
Sent: Monday, March 13, 2017 7:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet? Search problem

Perhaps look into grouping on that field. 



Facet? Search problem

2017-03-13 Thread Scott Smith
I'm trying to solve a search problem and wondering if facets (or something 
else) might solve the problem.

Let's assume I have a bunch of documents (100 million+).  Each document has a 
category (keyword) assigned to it.  A single document may only have one 
category, but there may be multiple documents with the same category (1 to a 
few hundred documents may be in any one category).  There are several million 
categories.

Suppose I'm doing a search with a page size of 50.  What I want to do is do a 
search (e.g., "dog") and get back the top 50 documents that contain the word 
"dog" and are all in different categories.  So, there needs to be one document 
from each of 50 different categories.

If that's not possible, then is it possible to do it if I know the 50 
categories up-front and hand that off as part of the search (so "find 50 
documents that match the term 'dog' and there is one document from each of 50 
specified categories").
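
One way the "categories known up-front" variant might be expressed is a filter query listing the categories, combined with grouping on the same field as suggested in the replies. This is only a sketch in Python; the helper names and the quoting scheme are assumptions, and the category values are invented.

```python
from urllib.parse import urlencode


def category_filter(field, categories):
    """Build a filter such as category:("a" OR "b") from a list of values."""
    quoted = []
    for value in categories:
        # Escape backslashes and double quotes so each value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "%s:(%s)" % (field, " OR ".join(quoted))


def known_category_params(query, field, categories):
    """Restrict the search to the given categories, one document per category."""
    return [
        ("q", query),
        ("fq", category_filter(field, categories)),
        ("group", "true"),
        ("group.field", field),
        ("rows", str(len(categories))),
    ]


print(urlencode(known_category_params("dog", "category", ["pets", "big dogs"])))
```

A category with no matching document simply produces no group, so fewer than 50 results can come back.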

Is there a way to do this?

I'm not extremely knowledgeable about facets, but thought that might be a 
solution.  But, it doesn't have to be facets.

Thanks for any help

Scott




Accessing document stored fields in a custom function

2014-09-23 Thread Scott Smith
I'm creating a custom function (extends ValueSource).  I'm generating a value 
that will both be returned as a value in the hit for each doc and also be used 
to sort.  As I read the documentation, this is not difficult.

To determine the value for a document, I need to access the stored fields for 
that document (i.e., the value that the function will generate partially 
depends on stored information in the document).  How do I access them from the 
getValues() method?  Is this via the FieldCache.DEFAULT?  I'm using solr 4.8 if 
that makes a difference (which I think it does since older examples seem to 
have been deprecated).  For example, if I have a field called Fred, how do I 
access that field from the document?

Is accessing the stored data going to have a big impact on the time to return 
results?

Thanks

Scott


RE: Help on custom sort

2014-09-22 Thread Scott Smith
I'll take a look at that.  Thanks

-Original Message-
From: Apoorva Gaurav [mailto:apoorva.gau...@myntra.com] 
Sent: Sunday, September 21, 2014 11:32 PM
To: solr-user
Subject: Re: Help on custom sort

Try using a custom value source parser and pass the formula of computing the 
price to solr; something like this 
http://java.dzone.com/articles/connecting-redis-solr-boosting

On Mon, Sep 22, 2014 at 1:38 AM, Scott Smith <ssm...@mainstreamdata.com>
wrote:

 There are likely several hundred groups.  Also, new groups will be 
 added and some groups will be deleted.  So, I don't think putting a 
 field in the docs works.  Having to add a new group price into 100 
 million+ documents doesn't seem reasonable.

 Right now I'm looking at
 http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
 This references a much older version of solr (the blog is from 2011) 
 and so I will need to update the classes referenced.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, September 20, 2014 11:58 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Help on custom sort

 How many different groups are there? And can user A ever be part of 
 more than one group?
 If
 1. there are a reasonably small number of groups (100 or so as a
 place to start)
 and
 2. a user is always part of a single group

 then you could store separate prices in each document by group, thus 
 you'd have some fields like
 price_group_a: $100
 price_group_b: $101

 then sorting becomes trivial, you just specify a sort on price_group_a for 
 users in group A etc. If the number of groups is unknown-but-not-huge 
 dynamic fields could be used.
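
 A small sketch of the per-group price field idea, in Python. It assumes fields named price_group_<group> as in the example above; the helper name and the validation rule are made up for illustration.

```python
import re


def price_sort_param(group, descending=False):
    """Return a Solr sort clause on the price field for the user's group."""
    # Only allow simple identifiers so the group id cannot alter the sort clause.
    if not re.fullmatch(r"[A-Za-z0-9_]+", group):
        raise ValueError("unexpected group id: %r" % group)
    direction = "desc" if descending else "asc"
    return "price_group_%s %s" % (group, direction)


print(price_sort_param("a"))
print(price_sort_param("b", descending=True))
```

 The application would look up the user's group and pass the resulting string as the sort parameter.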

 If that's not the case, then you might be able to get clever with 
 sorting by function, here's a place to start:
 https://cwiki.apache.org/confluence/display/solr/Function+Queries

 These can be arbitrarily complex, but I'm thinking something where the 
 price returned by the function respects the group the user is in, 
 perhaps even the min/max of all the groups the user is in. I admit I 
 haven't really thought that through well though...

 Best,
 Erick





--
Thanks & Regards,
Apoorva


Help on custom sort

2014-09-20 Thread Scott Smith
I need to provide a custom sort option for sorting by price and I would like 
some suggestions.  It's not the straightforward "just sort by a price field in 
the document" scenario or I wouldn't be asking for help.  Here's the scenario 
I'm dealing with.

I have 100 million+ documents (so multi-sharded).  Users search for documents 
they are interested in using a standard keyword search.  They then purchase 
documents they are interested in.  So far, nothing hard.

Here's where things get interesting.  The documents come from multiple 
suppliers.  Each supplier sets a price for his documents and different 
suppliers will provide different pricing.

That wouldn't be difficult except that *users* are divided up into different 
groups and depending on which group they are in, the supplier will charge the 
user a different price.  So, user A may pay one price for a document and user B 
may pay a different price for the same document just because user A and user B 
are in different groups.  I don't even know if the relative order of pricing is 
the same between different groups (e.g., if document X is more expensive than 
document Y for a user in group M, it may not be more expensive for a user in 
group N).  The one thing that may make this doable is that a given supplier 
will likely have the same price for all of his documents for each of the user 
groups.  So, a user in group A will pay the same price regardless of which 
document he buys from supplier 1.  A user in group B will also pay the same 
price for any document from supplier 1; it's just that a user in group B will 
likely pay a different price than a user in group A.  So, within a supplier, 
the price varies based on user group, not the document.

To summarize, one of the requirements for the system is that we provide the 
ability to sort search results based on price.  This would be easy except that 
the price a user pays depends not only on what he wants to buy, but on what 
group he is in.

I suspect there is some kind of custom solr module I'm going to have to write.  
I'm thinking that the user group gets passed in as a custom solr parameter (I'm 
assuming that's possible??).  Then I'm thinking that there has to be some kind 
of in-memory database that tracks pricing based on user group and document 
supplier.
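
To make the lookup idea concrete, here is a toy sketch in Python of a price table keyed by (supplier, user group) and used as a sort key. The suppliers, groups, and prices are invented, and the sort runs over an in-memory list of hits; it illustrates the data structure only, not a Solr plugin.

```python
# Hypothetical price table: (supplier, user group) -> price.
PRICES = {
    ("supplier1", "A"): 100.0,
    ("supplier1", "B"): 101.0,
    ("supplier2", "A"): 90.0,
    ("supplier2", "B"): 120.0,
}


def sort_by_price(docs, group, prices):
    """Order documents by the price their supplier charges the given group."""
    return sorted(docs, key=lambda doc: prices[(doc["supplier"], group)])


hits = [
    {"id": 1, "supplier": "supplier1"},
    {"id": 2, "supplier": "supplier2"},
]

# The same two documents can come back in a different order for each group.
print([d["id"] for d in sort_by_price(hits, "A", PRICES)])
print([d["id"] for d in sort_by_price(hits, "B", PRICES)])
```

Because the table has one entry per supplier and group rather than per document, it stays small even with 100 million+ documents.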

I'm happy to go read code, documents, links, etc if someone can point me in the 
right direction.  What kind of solr module am I likely going to write (extend) 
and are there some examples somewhere?  Maybe there's a way to do this without 
having to extend a solr module??

Hope this makes sense.  Any help is appreciated.

Scott




Tie breakers when sorting equal items

2014-01-26 Thread Scott Smith
I promised to ask this on the forum just to confirm what I assume is true.

Suppose you're returning results using a sort order based on some field (so, 
not relevancy). For example, suppose it's a date field which indicates when the 
document was loaded into the solr index.   Suppose two items have exactly the 
same date/time in the field.  Would solr return the two items in the order in 
which they were inserted?  I would assume that the answer is "not necessarily."

I know that you can have secondary sort fields if something exists that would 
provide the desired functionality.  I know that I could set up some kind of 
numbering scheme that would provide the same result (the customer doesn't want 
to pay for that).

So, I'm really just asking if Solr has any guarantees that when you sort on a 
field and two items have the same value, they will be sorted in the order they 
were inserted into the index.  Again, I assume the answer is no, but I said I 
would ask.
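
For the secondary-sort option mentioned above, a sketch of how the sort parameter could be assembled in Python. The field names load_date and id are assumptions; any unique, sortable field could serve as the tie-breaker.

```python
def sort_with_tiebreaker(primary, tiebreaker="id asc"):
    """Join a primary sort clause with a tie-breaking clause on a unique field."""
    return "%s,%s" % (primary, tiebreaker)


print(sort_with_tiebreaker("load_date desc"))
```

With a unique field as the last clause, documents sharing the same date get a stable relative order across requests, although that order is by id, not by insertion time.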


Solr query processing

2013-09-23 Thread Scott Smith
I just want to state a couple of things and hear someone say, "that's right."

1.   In a solr query you can have multiple fq's, but only a single q.  And 
yes, I can simply AND the multiple q's together.  Just want to avoid that if 
I'm wrong.

2.   A subtler issue is that when a full query is executed, Solr must look 
at the schema to see how each field was tokenized (or not) and the various 
other filters applied to a field so that it can properly transform the field's 
data (e.g., tokenize the text, but not keywords).  As an aside, it would be 
nice if the queryparser could do the same thing in Lucene (I know, wrong 
forum :)).

Scott
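
To illustrate point 1, a short Python sketch of a request with a single q and several fq parameters. The filter values are examples only, and build_query is a made-up helper name.

```python
from urllib.parse import urlencode


def build_query(q, filters):
    """Encode one main query plus any number of repeated fq parameters."""
    params = [("q", q)]
    # Each filter becomes its own fq parameter; Solr intersects them.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


print(build_query("dog", ["Language:en", "category:pets"]))
```

A list of pairs is used instead of a dict because the fq key repeats.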


RE: Custom Solr indexer/searcher

2012-11-16 Thread Scott Smith
Thanks for the suggestions.  I'll take a look at these things.

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Thursday, November 15, 2012 11:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Custom Solr indexer/searcher

Scott,
It sounds like you need to look into a few samples of similar things in Lucene. 
Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the 
given one in an FST for query expansion. Generic query expansion is done via 
MultiTermQuery. Index-time term expansion is shown in TrieField and, btw, 
NumericRangeQuery (it should match your goal a lot). All these are 
single-dimension samples, but AFAIK a KD-tree is multidimensional, so look into 
GeoHashField, which puts two-dimensional points into single terms with the 
ability to build ranges on them; see GeoHashField.createSpatialQuery().

Happy hacking!


On Fri, Nov 16, 2012 at 10:34 AM, John Whelan <whelanl...@gmail.com> wrote:

 Scott,

 I probably have no idea as to what I'm saying, but if you're looking 
 for finding results in a N-dimensional space, you might look at 
 creating a field of type 'point'. Point-type fields have a dimension 
 attribute; I believe that it can be set to a large integer value.

 Barring that, there is also a 'dist()' function that can be used to 
 work with multiple numeric fields in order to sort results based on 
 closeness to a desired coordinate. The 'dist()' function takes a 
 parameter to specify the means of calculating the distance. (For example, 
 2 means 'Euclidean distance'.  I don't know the other options.)
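
 A sketch of how a sort on dist() might be assembled, in Python. The field names x and y and the helper name dist_sort are assumptions; the argument order (power, then the document's fields, then the target coordinates) follows the function query documentation as I understand it, where a power of 1 gives Manhattan distance.

```python
def dist_sort(power, fields, point, direction="asc"):
    """Build a sort clause ordering documents by distance from a point."""
    if len(fields) != len(point):
        raise ValueError("need one coordinate per field")
    # Arguments: the power, the per-document fields, then the target point.
    args = [str(power)] + list(fields) + [str(c) for c in point]
    return "dist(%s) %s" % (",".join(args), direction)


print(dist_sort(2, ["x", "y"], [0, 0]))
```

 The same expression can be requested in the field list if the computed distance should be returned with each hit.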

 In the worst case, my response is worthless, but pops your question 
 back up in the e-mails...

 Regards,
 John




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Custom Solr indexer/searcher

2012-11-13 Thread Scott Smith
Suppose I have a special data search type (something different than a string or 
numeric value) that I want to integrate into the Solr server.  For example, 
suppose I wanted to implement a KD-tree as a filter that would integrate with 
standard Solr filters and queries.  I might want to say "find all of the 
documents in the index with the word 'tree' in them that are within a certain 
distance of a particular document in the KD-tree."  Let me add that I'm not 
really looking for a KD-Tree implementation for Solr; I just assume that a fair 
number of people will know what a KD-tree is and so, have some idea that I'm 
talking about adding a new data type (different than string, long, etc.) that 
Solr will need to be able to index and search with.  It's important that the 
new data type should integrate with the existing standard Solr data types for 
searching purposes.

First, is there a way to build and specify a plugin that provides Solr both the 
indexer and search interfaces and therefore hides the internal details of 
what's going on in the search from Solr so it just thinks it's another search 
type?  Or, would I have to hack Solr in a lot of places to add my custom data 
type in?

Second, if the interface(s) exists to add in a new data type, is there 
documentation (tutorial, examples, etc.) anywhere on how to do this.  Or, is my 
only option to dig into the Solr code?

Mostly, I'm looking for some links or suggestions on where to start looking.  I 
doubt this subject is simple enough to fit into an email post (though I'd be 
happy to be surprised :) ).  You can assume Solr 4.0 if that makes things 
easier.  You can also assume that I have some familiarity with Lucene (though I 
haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking 
for.

Cheers

Scott



Exception in Solr server on more like this

2011-12-22 Thread Scott Smith
I've been trying to get "More like this" running under solr 3.5.  I get the 
Exception below. The http request is also highlighted below.

I've looked at the FieldType code and I don't understand what's going on there.  
So, while I know what a null pointer exception means, it isn't telling me what 
I did or didn't do.

FYI - the Body field has termVectors set to true which I thought was 
sufficient for MLT.

What I'm trying to do is submit the phrase "country now is the time country" to 
MLT to determine the "interesting words" (which I want returned) and then 
return the top most relevant documents.

Any help on what might be wrong would be appreciated.

Scott

6975 [main] INFO com.mainstreamdata.MediasIndexer.mediasBrowser.SearchFactory  - SearchFactory:SearchFactory: Search Factory initialized
SolrQuery:: (country now is the time country)
Filter:: (Language:en)
15274 [main] ERROR com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch  - SolrSearch:getDocTier: Unable to do search:
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getTier(SolrSearch.java:309)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getFirstTier(SolrSearch.java:93)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getNextOlderTier(SolrSearch.java:175)
    at com.mainstreamdata.MediasIndexer.SolrMgrTest.testMoreLikeThis(SolrMgrTest.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:230)
    at junit.framework.TestSuite.run(TestSuite.java:225)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.common.SolrException: null  java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:374)
    at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:320)
    at org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:82)
    at org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:57)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 

RE: Exception in Solr server on more like this

2011-12-22 Thread Scott Smith
This turned out to be SOLR-2986.


MoreLikeThis questions

2011-12-09 Thread Scott Smith
I'm implementing a MoreLikeThis  search.  I have a couple of questions.  I'm 
implementing this with solrj so I would appreciate it if any code snippets 
reflect that.

First, I want to provide the text that Solr should check for interesting 
words and do the search on.  This means I don't want to specify a document in 
the collection.  I think the documentation implies I can do this.  However, it 
seems like using the q parameter would be the wrong thing since I think it 
would just take doc 0 of the result of searching the default field with those 
words.  However, I don't see any other parameter that looks like it's the 
correct one.

Second, I need to access the interesting terms.  It's not clear to me how to 
get these.  I see the parameter I need to set to have the interesting terms 
included in the response.  I'm just not sure how to get at them with solrj once 
the response comes back.

Can someone point me to examples of how to do this?
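
A hedged sketch of one possible approach, in Python: the MoreLikeThis request handler can take the text as a content stream instead of a q that selects a document. The handler path /mlt, the host and port, and the field name Body are assumptions (the handler has to be registered in solrconfig.xml), and whether stream.body is accepted depends on the Solr version and configuration.

```python
from urllib.parse import urlencode


def mlt_params(text, fields, rows=10):
    """Build MoreLikeThis handler parameters for free text supplied by the caller."""
    return {
        "stream.body": text,                # the text to analyze, in place of q
        "mlt.fl": ",".join(fields),         # fields used to pick interesting terms
        "mlt.interestingTerms": "details",  # return the terms with their boosts
        "mlt.mintf": "1",                   # short text: keep terms that occur once
        "mlt.mindf": "1",
        "rows": str(rows),
        "wt": "json",
    }


params = mlt_params("country now is the time country", ["Body"])
# Hypothetical local URL; adjust host, port, and handler path as needed.
print("http://localhost:8983/solr/mlt?" + urlencode(params))
```

With solrj, the same parameter names could presumably be set on a SolrQuery via set(name, value), and the terms read from the interestingTerms entry of the raw response.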


RE: MoreLikeThis questions

2011-12-09 Thread Scott Smith
I realized I probably should have said Solr 3.5 in case that makes a difference.



RE: MoreLikeThis questions

2011-12-09 Thread Scott Smith
OK.  I just found Juan Grande's 7/1/2011 post.  It seems like that gives me 
some ideas on the second question.

I still don't know what to do about the first question.  Maybe if I saw the 
Request xml, it would give me a hint what to do with the solrj stuff.

Anybody have any thoughts?

Scott

-Original Message-
From: Scott Smith [mailto:ssm...@mainstreamdata.com] 
Sent: Friday, December 09, 2011 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis questions

I realized I probably should have said Solr 3.5 in case that makes a difference.

-Original Message-
From: Scott Smith [mailto:ssm...@mainstreamdata.com] 
Sent: Friday, December 09, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis questions

I'm implementing a MoreLikeThis  search.  I have a couple of questions.  I'm 
implementing this with solrj so I would appreciate it if any code snippets 
reflect that.

First, I want to provide the text that Solr should check for interesting 
words and do the search on.  This means I don't want to specify a document in 
the collection.  I think the documentation implies I can do this.  However, it 
seems like using the q parameter would be the wrong thing since I think it 
would just take doc 0 of the result of searching the default field with those 
words.  However, I don't see any other parameter that looks like it's the 
correct one.

Second, I need to access the interesting terms.  It's not clear to me how to 
get these.  I see the parameter I need to set to have the interesting terms 
included in the response.  I'm just not sure how to get at them with solrj once 
the response comes back.

Can someone point me to examples of how to do this?
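
For the first question, one option is to send the text to a MoreLikeThisHandler as a content stream instead of using q.  The sketch below only builds the request parameters with the standard library so the shape of the request is visible.  It assumes a MoreLikeThisHandler is registered at /mlt in solrconfig.xml, that Solr runs at localhost:8983, and that the fields are called title and body; all of those are placeholders, not details from this thread.  stream.body, mlt.fl, mlt.interestingTerms, mlt.mintf and mlt.mindf are parameters of the handler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class MltRequestSketch {

    /** Builds the query string for an MLT request that supplies raw text rather than a document id. */
    public static String buildQueryString(String text, String fields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("stream.body", text);             // the text to analyze for interesting terms
        params.put("mlt.fl", fields);                // fields used for similarity (assumed names)
        params.put("mlt.interestingTerms", "list");  // ask for the terms in the response
        params.put("mlt.mintf", "1");                // minimum term frequency in the supplied text
        params.put("mlt.mindf", "1");                // minimum document frequency in the index

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    /** URL-encodes one parameter name or value as UTF-8. */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Host, port and handler path are assumptions for illustration only.
        System.out.println("http://localhost:8983/solr/mlt?"
                + buildQueryString("dogs and cats", "title,body"));
    }
}
```

With solrj the same parameter names could presumably be set on a SolrQuery, and the interesting terms read from the raw response under the "interestingTerms" key.  Treat that as something to verify against your solrj version rather than as a confirmed recipe.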


To optimize or not - Solr vs Lucene

2011-12-06 Thread Scott Smith
Wasn't sure which mailing list to send this to.  I'm writing an application 
that can be configured to run directly with lucene or with solr and I'm trying 
to figure out whether optimization of the index should be totally eliminated, 
eliminated in the lucene case only or what.

If I read the 3.5 lucene javadocs, optimize() has been deprecated because it is 
rarely justified with the current lucene index implementation (I started with 
lucene in the 1.42 days when I think it was pretty much a necessity).  However, 
if I read the lucid imagination 3.4 manual (page 176), it talks about how 
optimizing will merge a lot of small blocks together making the index more 
efficient, which is exactly what I thought optimize did.  Since solr is based on 
lucene, I'm wondering if the 3.4 manual is simply out-of-date on this point or 
whether there is something else going on.

Our application is indexing content in real time and so the index changes 
frequently during the day.  Some of our indexes only contain a few hundred 
thousand documents.  However, in one of our applications there are over 50 
million documents (using Solr with multiple shards).  I thought optimization 
was a way to keep the index segments merged and thus make the searching more 
efficient.  I thought it was especially needed if the index was being updated 
frequently.

When should I optimize?

Thanks in advance for any feedback.

Scott


RE: Lucene-SOLR transition

2011-09-19 Thread Scott Smith
OK.  Thanks for all of the suggestions.

Cheers

Scott

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Monday, September 19, 2011 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene-SOLR transition


On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:

> On 9/15/2011 8:30 PM, Scott Smith wrote:
>>
>> 2.   Assuming that the answer to 1 is correct, then is there an easy 
>> way to take a lucene query (with nested Boolean queries, filter queries, 
>> etc.) and generate a SOLR query string with q and fq components?
>>
>
> I believe that Query.toString() will probably get you back something that can 
> be parsed in turn by the traditional lucene QueryParser, thus completing the 
> circle and returning your original Query.  But why would you want to do that?

No, you can't rely on Query.toString() roundtripping (think stemming, for 
example - but many other examples that won't work that way too).

What you can do, since you know Lucene's API well, is write a QParser(Plugin) 
that takes request parameters as strings and generates the Query from that like 
you are now with your Lucene app.

Erik



Lucene-SOLR transition

2011-09-15 Thread Scott Smith
I've been using lucene for a number of years.  We've now decided to move to 
SOLR.  I have a couple of questions.


1.   I'm used to creating Boolean queries, filter queries, term queries, 
etc. for lucene.  Am I right in thinking that for SOLR my only option is 
creating string queries (with q and fq components) for solrj?

2.   Assuming that the answer to 1 is correct, then is there an easy way 
to take a lucene query (with nested Boolean queries, filter queries, etc.) and 
generate a SOLR query string with q and fq components?

Thanks

Scott
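
As a rough illustration of question 2, nested Boolean clauses can be composed directly as query-parser strings and then split into q and fq parameters.  The sketch below uses only the standard library; the class, its helper names (term, and, or, toParams) and the field names are hypothetical, and the escaping is a hand-written approximation in the spirit of solrj's ClientUtils.escapeQueryChars, not a copy of it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryStringSketch {

    /** Backslash-escapes characters that are special to the Lucene query parser. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("\\+-!():^[]\"{}~*?|&;".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** A single field:value clause with the value escaped. */
    public static String term(String field, String value) {
        return field + ":" + escape(value);
    }

    /** All clauses must match. */
    public static String and(String... clauses) {
        return join("AND", clauses);
    }

    /** At least one clause must match. */
    public static String or(String... clauses) {
        return join("OR", clauses);
    }

    private static String join(String op, String... clauses) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < clauses.length; i++) {
            if (i > 0) {
                sb.append(' ').append(op).append(' ');
            }
            sb.append(clauses[i]);
        }
        return sb.append(')').toString();
    }

    /** Builds "q=...&fq=...&fq=..." with every value URL-encoded as UTF-8. */
    public static String toParams(String q, String... filters) {
        StringBuilder sb = new StringBuilder("q=").append(encode(q));
        for (String f : filters) {
            sb.append("&fq=").append(encode(f));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Scoring part of the search goes in q; the restriction goes in fq.
        String q = and(term("body", "dog"),
                       or(term("category", "pets"), term("category", "animals")));
        System.out.println(q);
        System.out.println(toParams(q, term("lang", "en")));
    }
}
```

Clauses that should influence ranking belong in q, while pure restrictions fit better in fq, since filter queries do not affect the score and are cached separately.  This only covers building strings by hand; it does not round-trip an existing Lucene Query object.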


ANTLR SOLR query/filter parser

2011-08-01 Thread Scott Smith
I'm looking for an ANTLR parser that consumes solr queries and filters.  Before 
I write my own, thought I'd ask if anyone has one they are willing to share or 
can point me to one?

Thanks

Scott