RE: Facet? Search problem

2017-03-14 Thread Scott Smith

Thanks.  I'll look at that as well.

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Tuesday, March 14, 2017 1:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Facet? Search problem

Scott

Depending on what you're looking for
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
might be worth a look as well.

-Stefan
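
For reference, a collapsed search along the lines Stefan suggests might be parameterised as sketched below. This is only a sketch: it assumes the `category` field from the original question and a Solr version that ships the collapsing query parser and expand component, and `CollapseParams` is a hypothetical helper, not part of SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): collects the request parameters
// for a search collapsed to one document per value of a field.
final class CollapseParams {

    static Map<String, String> build(String userQuery, String field, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        // Collapse the result set to a single document per distinct field value.
        params.put("fq", "{!collapse field=" + field + "}");
        // Ask the expand component to also return the other members of each group.
        params.put("expand", "true");
        params.put("rows", Integer.toString(rows));
        return params;
    }

    public static void main(String[] args) {
        build("dog", "category", 50)
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The values still need URL encoding before they go on the wire; a client library normally takes care of that.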


RE: Facet? Search problem

2017-03-14 Thread Scott Smith

Grouping appears to be exactly what I'm looking for.  I added 
"group=true&group.field=category" to my search and it appears that I get a list 
of groups, one document in each group that matches the search along with 
(bonus) the number of documents in the category that match that search. 
Perfect.  Thank you very much.
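
A small illustration of what the grouped result described above contains: one top document per category plus a count of matching documents in that category. This is a plain-Java model of the behaviour, not Solr code; `GroupingSketch`, `Hit`, and `Group` are hypothetical names, and the hits are assumed to arrive already ranked best-first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of grouped search results; not Solr code.
final class GroupingSketch {

    // A search hit: document id plus its single category.
    static final class Hit {
        final String id;
        final String category;
        Hit(String id, String category) { this.id = id; this.category = category; }
    }

    // One group: the best-ranked hit and how many hits share its category.
    static final class Group {
        final Hit top;
        int matches;
        Group(Hit top) { this.top = top; this.matches = 1; }
    }

    // rankedHits must be sorted best-first. Returns at most pageSize groups,
    // ordered by the rank of each group's top hit.
    static List<Group> firstPage(List<Hit> rankedHits, int pageSize) {
        Map<String, Group> byCategory = new LinkedHashMap<>();
        for (Hit h : rankedHits) {
            Group g = byCategory.get(h.category);
            if (g == null) {
                byCategory.put(h.category, new Group(h)); // first hit seen is the group's top document
            } else {
                g.matches++;                               // later hits only add to the count
            }
        }
        List<Group> groups = new ArrayList<>(byCategory.values());
        return groups.subList(0, Math.min(pageSize, groups.size()));
    }
}
```

With a page size of 50 this yields up to 50 documents, each from a different category, which is the shape of result the original question asked for.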

-Original Message-
From: Dave [mailto:hastings.recurs...@gmail.com] 
Sent: Monday, March 13, 2017 7:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet? Search problem

Perhaps look into grouping on that field.

Facet? Search problem

2017-03-13 Thread Scott Smith

I'm trying to solve a search problem and wondering if facets (or something 
else) might solve the problem.

Let's assume I have a bunch of documents (100 million+).  Each document has a 
category (keyword) assigned to it.  A single document may only have one 
category, but there may be multiple documents with the same category (1 to a 
few hundred documents may be in any one category).  There are several million 
categories.

Suppose I'm doing a search with a page size of 50.  What I want to do is do a 
search (e.g., "dog") and get back the top 50 documents that contain 
the word "dog" and are all in different categories.  So, there needs to be one 
document from each of 50 different categories.

If that's not possible, then is it possible to do it if I know the 50 
categories up-front and hand that off as part of the search (so "find 50 
documents that match the term 'dog' and there is one document from each of 50 
specified categories").

Is there a way to do this?

I'm not extremely knowledgeable about facets, but thought that might be a 
solution.  But, it doesn't have to be facets.

Thanks for any help

Scott

Accessing document stored fields in a custom function

2014-09-23 Thread Scott Smith

I'm creating a custom function (extends ValueSource).  I'm generating a value 
that will both be returned as a value in the hit for each doc and also be used 
to sort.  As I read the documentation, this is not difficult.

To determine the value for a document, I need to access the "stored" fields for 
that document (i.e., the value that the function will generate partially 
depends on stored information in the document).  How do I access them from the 
getValues() method?  Is this via the FieldCache.DEFAULT?  I'm using solr 4.8 if 
that makes a difference (which I think it does since older examples seem to 
have been deprecated).  For example, if I have a field called "Fred", how do I 
access that field from the document?

Is accessing the stored data going to have a big impact on the time to return 
results?

Thanks

Scott

RE: Help on custom sort

2014-09-22 Thread Scott Smith

I'll take a look at that.  Thanks

-Original Message-
From: Apoorva Gaurav [mailto:apoorva.gau...@myntra.com] 
Sent: Sunday, September 21, 2014 11:32 PM
To: solr-user
Subject: Re: Help on custom sort

Try using a custom value source parser and pass the "formula" of computing the 
price to solr; something like this 
http://java.dzone.com/articles/connecting-redis-solr-boosting


RE: Help on custom sort

2014-09-21 Thread Scott Smith

There are likely several hundred groups.  Also, new groups will be added and 
some groups will be deleted.  So, I don't think putting a field in the docs 
works.  Having to add a new group price into 100 million+ documents doesn't 
seem reasonable.

Right now I'm looking at 
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
This references a much older version of solr (the blog is from 2011) and so I 
will need to update the classes referenced.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, September 20, 2014 11:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Help on custom sort

How many different groups are there? And can user A ever be part of more than 
one group?
If
1> there are a reasonably small number of groups (< 100 or so as a
place to start)
and
2> a user is always part of a single group

then you could store separate prices in each document by group, thus you'd have 
some fields like
price_group_a: $100
price_group_b: $101

then sorting becomes trivial: you just sort on price_group_a for users in 
group A, etc. If the number of groups is unknown but not huge, dynamic fields 
could be used.
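
A minimal sketch of the per-group sort Erick describes, assuming the sort targets the `price_group_*` fields from his example and that group ids are simple alphanumeric strings. `GroupPriceSort` is a hypothetical helper name.

```java
import java.util.Locale;

// Hypothetical helper: derives the Solr sort parameter value for a user's group.
final class GroupPriceSort {

    // Field naming follows the price_group_a / price_group_b example in the message.
    static String sortParam(String groupId, boolean ascending) {
        // Reject anything that could not be part of a plain field name.
        if (!groupId.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("unexpected group id: " + groupId);
        }
        String field = "price_group_" + groupId.toLowerCase(Locale.ROOT);
        return field + (ascending ? " asc" : " desc");
    }
}
```

For a user in group A, `GroupPriceSort.sortParam("A", true)` gives `price_group_a asc`, which would be passed as the `sort` parameter.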

If that's not the case, then you might be able to get clever with sorting by 
function, here's a place to start:
https://cwiki.apache.org/confluence/display/solr/Function+Queries

These can be arbitrarily complex, but I'm thinking something where the price 
returned by the function respects the group the user is in, perhaps even the 
min/max of all the groups the user is in. I admit I haven't really thought that 
through well though...

Best,
Erick


Help on custom sort

2014-09-20 Thread Scott Smith

I need to provide a custom sort option for sorting by price and I would like 
some suggestions.  It's not the straightforward "just sort by a price field in 
the document" scenario or I wouldn't be asking for help.  Here's the scenario 
I'm dealing with.

I have 100 million+ documents (so multi-sharded).  Users search for documents 
they are interested in using a standard keyword search.  They then purchase 
documents they are interested in.  So far, nothing hard.

Here's where things get "interesting".  The documents come from multiple 
suppliers.  Each supplier sets a price for his documents and different 
suppliers will provide different pricing.

That wouldn't be difficult except that *users* are divided up into different 
groups and depending on which group they are in, the supplier will charge the 
user a different price.  So, user A may pay one price for a document and user B 
may pay a different price for the same document just because user A and user B 
are in different groups.  I don't even know if the relative order of pricing is 
the same between different groups (e.g., if document X is more expensive than 
document Y for a user in group M, it may not be more expensive for a user in 
group N).  The one thing that may make this doable is that supplier A will 
likely have the same price for all of his documents for each of the user 
groups.  So, a user in group A will pay the same price regardless of which 
document he buys from supplier 1.  A user in group B will also pay the same 
price for any document from supplier 1; it's just that a user in group B will 
likely pay a different price than a user in group A.  So, within a supplier, 
the price varies based on user group, not the document.

To summarize, one of the requirements for the system is that we provide the 
ability to sort search results based on price.  This would be easy except that 
the price a user pays not only depends on what he wants to buy, but on what 
group he is in.

I suspect there is some kind of custom solr module I'm going to have to write.  
I'm thinking that the user group gets passed in as a custom solr parameter (I'm 
assuming that's possible??).  Then I'm thinking that there has to be some kind 
of in-memory database that tracks pricing based on user group and document 
supplier.
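
To make the idea concrete, here is a rough sketch of such an in-memory price table keyed by supplier and user group, with a sort over a page of hits. It is plain Java rather than a Solr plugin; `GroupPricing`, `Doc`, and the use of cents are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory price table; plain Java, not a Solr plugin.
final class GroupPricing {

    // A search hit, reduced to the fields needed for pricing.
    static final class Doc {
        final String id;
        final String supplier;
        Doc(String id, String supplier) { this.id = id; this.supplier = supplier; }
    }

    // supplier -> (user group -> price in cents)
    private final Map<String, Map<String, Long>> prices = new HashMap<>();

    void setPrice(String supplier, String group, long cents) {
        prices.computeIfAbsent(supplier, s -> new HashMap<>()).put(group, cents);
    }

    // Unknown supplier/group combinations get the largest value so they sort last.
    long priceFor(Doc doc, String group) {
        Map<String, Long> byGroup = prices.get(doc.supplier);
        Long cents = (byGroup == null) ? null : byGroup.get(group);
        return (cents == null) ? Long.MAX_VALUE : cents;
    }

    // Returns a new list ordered by the price the given group pays, cheapest first;
    // ties fall back to document id so the order is repeatable.
    List<Doc> sortByPrice(List<Doc> hits, String group) {
        List<Doc> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Doc d) -> priceFor(d, group))
                .thenComparing((Doc d) -> d.id));
        return sorted;
    }
}
```

Because the price depends only on supplier and group, the table stays small (suppliers times groups) even with 100 million+ documents, and adding or deleting a group never touches the index.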

I'm happy to go read code, documents, links, etc if someone can point me in the 
right direction.  What kind of solr module am I likely going to write (extend) 
and are there some examples somewhere?  Maybe there's a way to do this without 
having to extend a solr module??

Hope this makes sense.  Any help is appreciated.

Scott

Tie breakers when sorting equal items

2014-01-26 Thread Scott Smith

I promised to ask this on the forum just to confirm what I assume is true.

Suppose you're returning results using a sort order based on some field (so, 
not relevancy). For example, suppose it's a date field which indicates when the 
document was loaded into the solr index.   Suppose two items have exactly the 
same date/time in the field.  Would solr return the two items in the order in 
which they were inserted?  I would assume that the answer is "not necessarily".

I know that you can have secondary sort fields if something exists that would 
provide the desired functionality.  I know that I could set up some kind of 
numbering scheme that would provide the same result (the customer doesn't want 
to pay for that).

So, I'm really just asking if Solr has any guarantees that when you sort on a 
field and two items have the same value, they will be sorted in the order they 
were inserted into the index.  Again, I assume the answer is "no", but I said I 
would ask.
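
The secondary-sort approach mentioned above can be illustrated with a plain-Java comparator. In Solr terms this would correspond to a sort such as `loaded_at desc, id asc`, where both field names are assumptions and `id` stands for any unique field. `TieBreakSketch` and `Item` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of a primary sort with a unique-key tie breaker.
final class TieBreakSketch {

    static final class Item {
        final String id;            // unique key
        final long loadedAtMillis;  // primary sort value; may repeat
        Item(String id, long loadedAtMillis) {
            this.id = id;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    // Newest first; items with equal timestamps are ordered by id.
    static final Comparator<Item> NEWEST_FIRST_THEN_ID =
            Comparator.comparingLong((Item i) -> i.loadedAtMillis)
                    .reversed()
                    .thenComparing((Item i) -> i.id);

    static List<Item> sorted(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(NEWEST_FIRST_THEN_ID);
        return copy;
    }
}
```

Since the tie breaker is unique, the result no longer depends on the order in which the items arrived.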

Solr query processing

2013-09-23 Thread Scott Smith

I just want to state a couple of things and hear someone say, "that's right".


1. In a Solr query you can have multiple fq's, but only a single q.  And 
yes, I can simply AND the multiple "q"s together.  Just want to avoid that if 
I'm wrong.

2. A subtler issue is that when a full query is executed, Solr must look 
at the schema to see how each field was tokenized (or not) and the various 
other filters applied to a field so that it can properly transform the field's 
data (e.g., tokenize the text, but not keywords).  As an aside, it would be 
nice if the queryparser could do the same thing in Lucene (I know, wrong 
forum :)).

Scott
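
Point 1 can be sketched as a request with one `q` and a repeated `fq` parameter. The helper below is hypothetical (plain Java, no SolrJ), and the filter values are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a query string with a single q and one fq per filter.
final class FilterQueryParams {

    static String build(String q, List<String> filters) {
        List<String> parts = new ArrayList<>();
        parts.add("q=" + encode(q));
        for (String filter : filters) {
            parts.add("fq=" + encode(filter)); // fq may be repeated; q may not
        }
        return String.join("&", parts);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Each `fq` restricts the result set without affecting scoring, so filters do not have to be ANDed into `q`.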

RE: Custom Solr indexer/searcher

2012-11-16 Thread Scott Smith

Thanks for the suggestions.  I'll take a look at these things.

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Thursday, November 15, 2012 11:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Custom Solr indexer/searcher

Scott,
It sounds like you need to look into a few samples of similar things in Lucene. 
Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the 
given one in an FST for query expansion. Generic query expansion is done via 
MultiTermQuery. Index-time term expansion is shown in TrieField and, by the 
way, NumericRangeQuery (it should match your goal closely). All of these are 
single-dimension samples, but AFAIK a KD-tree is multidimensional, so look into 
GeoHashField, which puts two-dimensional points into single terms with the 
ability to build ranges on them; see GeoHashField.createSpatialQuery().

Happy hacking!

On Fri, Nov 16, 2012 at 10:34 AM, John Whelan  wrote:

> Scott,
>
> I probably have no idea as to what I'm saying, but if you're looking 
> for results in an N-dimensional space, you might look at 
> creating a field of type 'point'. Point-type fields have a dimension 
> attribute; I believe that it can be set to a large integer value.
>
> Barring that, there is also a 'dist()' function that can be used to 
> work with multiple numeric fields in order to sort results based on 
> closeness to a desired coordinate. The 'dist()' function takes a 
> parameter to specify the means of calculating the distance. (For 
> example, 2 -> 'Euclidean distance'. I don't know the other options.)
>
> In the worst case, my response is worthless, but pops your question 
> back up in the e-mails...
>
> Regards,
> John
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
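
As a worked illustration of the distance parameter John mentions for dist(): the first argument acts as the power of a p-norm, so a request might sort with something like `dist(2, x, y, 0, 0) asc`, where `x` and `y` are hypothetical numeric fields and (0, 0) is the target point. The sketch below computes the same kind of distance in plain Java for finite powers; `MinkowskiDistance` is a made-up name and this is not Solr's implementation.

```java
// Hypothetical illustration of a p-norm (Minkowski) distance; not Solr code.
final class MinkowskiDistance {

    // p = 1 gives Manhattan distance, p = 2 gives Euclidean distance.
    static double distance(double p, double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        if (p <= 0 || Double.isInfinite(p) || Double.isNaN(p)) {
            throw new IllegalArgumentException("this sketch only handles finite p > 0");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), p);
        }
        return Math.pow(sum, 1.0 / p);
    }
}
```

The same function works for any number of dimensions, which is the property the N-dimensional use case needs.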

Custom Solr indexer/searcher

2012-11-13 Thread Scott Smith

Suppose I have a special data search type (something different than a string or 
numeric value) that I want to integrate into the Solr server.  For example, 
suppose I wanted to implement a KD-tree as a filter that would integrate with 
standard Solr filters and queries.  I might want to say "find all of the 
documents in the index with the word 'tree' in them that are within a certain 
distance of a particular document in the KD-tree".  Let me add that I'm not 
really looking for a KD-Tree implementation for Solr; I just assume that a fair 
number of people will know what a KD-tree is and so, have some idea that I'm 
talking about adding a new data type (different than string, long, etc.) that 
Solr will need to be able to index and search with.  It's important that the 
new data type should integrate with the existing standard Solr data types for 
searching purposes.

First, is there a way to build and specify a plugin that provides Solr both the 
indexer and search interfaces and therefore hides the internal details of 
what's going on in the search from Solr so it just thinks it's another search 
type?  Or, would I have to hack Solr in a lot of places to add my custom data 
type in?

Second, if the interface(s) exist to add in a new data type, is there 
documentation (tutorial, examples, etc.) anywhere on how to do this?  Or, is my 
only option to dig into the Solr code?

Mostly, I'm looking for some links or suggestions on where to start looking.  I 
doubt this subject is simple enough to fit into an email post (though I'd be 
happy to be surprised :) ).  You can assume Solr 4.0 if that makes things 
easier.  You can also assume that I have some familiarity with Lucene (though I 
haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking 
for.

Cheers

Scott

RE: Exception in Solr server on "more like this"

2011-12-22 Thread Scott Smith

This turned out to be SOLR-2986.


Exception in Solr server on "more like this"

2011-12-22 Thread Scott Smith

I've been trying to get "More like this" running under solr 3.5.  I get the 
Exception below. The http request is also highlighted below.

I've looked at the FieldType code and I don't understand what's going on there. 
 So, while I know what a null pointer exception means, it isn't telling me what 
I did or didn't do.

FYI - the "Body" field has termVectors set to "true" which I thought was 
sufficient for MLT.

What I'm trying to do is submit the phrase "country now is the time country" to 
MLT to determine the "interesting words" (which I want returned) and then 
return the top most relevant documents.

Any help on what might be wrong would be appreciated.

Scott

6975 [main] INFO com.mainstreamdata.MediasIndexer.mediasBrowser.SearchFactory - SearchFactory:SearchFactory: Search Factory initialized
SolrQuery:: (country now is the time country)
Filter:: (Language:en)
15274 [main] ERROR com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch - SolrSearch:getDocTier: Unable to do search:
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getTier(SolrSearch.java:309)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getFirstTier(SolrSearch.java:93)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getNextOlderTier(SolrSearch.java:175)
    at com.mainstreamdata.MediasIndexer.SolrMgrTest.testMoreLikeThis(SolrMgrTest.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:230)
    at junit.framework.TestSuite.run(TestSuite.java:225)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.common.SolrException: null  java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:374)
    at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:320)
    at org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:82)
    at org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:57)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler

RE: MoreLikeThis questions

2011-12-09 Thread Scott Smith

OK.  I just found Juan Grande's 7/1/2011 post.  It seems like that gives me 
some ideas on the second question.

I still don't know what to do about the first question.  Maybe if I saw the 
Request xml, it would give me a hint what to do with the solrj stuff.

Anybody have any thoughts?

Scott

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com] 
Sent: Friday, December 09, 2011 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis questions

I realized I probably should have said Solr 3.5 in case that makes a difference.

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com] 
Sent: Friday, December 09, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis questions

I'm implementing a MoreLikeThis search.  I have a couple of questions.  I'm 
implementing this with solrj so I would appreciate it if any code snippets 
reflect that.

First, I want to provide the text that Solr should check for "interesting 
words" and do the search on.  This means I don't want to specify a document in 
the collection.  I think the documentation implies I can do this.  However, it 
seems like using the "q" parameter would be the wrong thing since I think it 
would just take doc 0 of the result of searching the default field with those 
words.  However, I don't see any other parameter that looks like it's the 
correct one.

Second, I need to access the "interesting terms".  It's not clear to me how to 
get these.  I see the parameter I need to set to have the interesting terms 
included in the response.  I'm just not sure how to get at them with solrj once 
the response comes back.

Can someone point me to examples of how to do this?

RE: MoreLikeThis questions

2011-12-09 Thread Scott Smith

I realized I probably should have said Solr 3.5 in case that makes a difference.

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com] 
Sent: Friday, December 09, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis questions

I'm implementing a MoreLikeThis search.  I have a couple of questions.  I'm 
implementing this with solrj so I would appreciate it if any code snippets 
reflect that.

First, I want to provide the text that Solr should check for "interesting 
words" and do the search on.  This means I don't want to specify a document in 
the collection.  I think the documentation implies I can do this.  However, it 
seems like using the "q" parameter would be the wrong thing since I think it 
would just take doc 0 of the result of searching the default field with those 
words.  However, I don't see any other parameter that looks like it's the 
correct one.

Second, I need to access the "interesting terms".  It's not clear to me how to 
get these.  I see the parameter I need to set to have the interesting terms 
included in the response.  I'm just not sure how to get at them with solrj once 
the response comes back.

Can someone point me to examples of how to do this?

MoreLikeThis questions

2011-12-09 Thread Scott Smith

I'm implementing a MoreLikeThis search.  I have a couple of questions.  I'm 
implementing this with solrj so I would appreciate it if any code snippets 
reflect that.

First, I want to provide the text that Solr should check for "interesting 
words" and do the search on.  This means I don't want to specify a document in 
the collection.  I think the documentation implies I can do this.  However, it 
seems like using the "q" parameter would be the wrong thing since I think it 
would just take doc 0 of the result of searching the default field with those 
words.  However, I don't see any other parameter that looks like it's the 
correct one.

Second, I need to access the "interesting terms".  It's not clear to me how to 
get these.  I see the parameter I need to set to have the interesting terms 
included in the response.  I'm just not sure how to get at them with solrj once 
the response comes back.

Can someone point me to examples of how to do this?
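
One possible way to approach both questions is sketched below. It is not a 
confirmed answer from the thread.  It assumes a MoreLikeThisHandler is 
registered at a path such as /mlt, that the text is passed as a content stream 
through the stream.body parameter, and that mlt.interestingTerms=details asks 
for the terms to be returned.  The class MltParamsSketch, its method names, and 
the field name "body" are hypothetical and exist only for illustration.  The 
sketch uses only the JDK so it can run without Solr on the classpath.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: collects the request parameters
// for a MoreLikeThis request that supplies its own text instead of a
// document already in the index.
final class MltParamsSketch {

    // "text" is the content Solr should mine for interesting terms;
    // "fields" is a comma-separated list of fields (an assumption about
    // your schema) used for the similarity search.
    static Map<String, String> buildMltParams(String text, String fields) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.body", text);               // text sent as a content stream
        params.put("mlt.fl", fields);                  // fields to compare against
        params.put("mlt.interestingTerms", "details"); // ask for the terms and their boosts
        return params;
    }

    // URL-encodes the parameters into a query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: stream.body=dog+training+tips&mlt.fl=body&mlt.interestingTerms=details
        System.out.println(toQueryString(buildMltParams("dog training tips", "body")));
    }
}
```

With solrj, the same name/value pairs would presumably be set on a SolrQuery 
(which accepts arbitrary parameters through set) and the request sent to the 
MoreLikeThis handler, preferably as a POST if the text is long.  For the second 
question, the terms would then be expected under the "interestingTerms" key of 
the raw response returned by QueryResponse.getResponse().  Both points are 
worth verifying against the Solr version in use.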

To optimize or not - Solr vs Lucene

2011-12-06 Thread Scott Smith

Wasn't sure which mailing list to send this to.  I'm writing an application 
that can be configured to run directly with lucene or with solr and I'm trying 
to figure out whether optimization of the index should be totally eliminated, 
eliminated in the lucene case only or what.

According to the 3.5 lucene javadocs, optimize() has been deprecated because it 
is "rarely justified" with the current lucene index implementation (I started 
with lucene in the 1.42 days when I think it was pretty much a necessity).  
However, the lucid imagination 3.4 manual (page 176) talks about how 
optimizing will merge a lot of small blocks together, making the index more 
efficient, which is exactly what I thought optimize did.  Since solr is based on 
lucene, I'm wondering if the 3.4 manual is simply out-of-date on this point or 
whether there is something else going on.

Our application is indexing content in "real time" and so the index changes 
frequently during the day.  Some of our indexes only contain a few hundred 
thousand documents.  However, in one of our applications there are over 50 
million documents (using Solr with multiple shards).  I thought optimization 
was a way to keep the index segments merged and thus make the searching more 
efficient.  I thought it was especially needed if the index was being updated 
frequently.

When should I optimize?

Thanks in advance for any feedback.

Scott

RE: Lucene->SOLR transition

2011-09-19 Thread Scott Smith

OK.  Thanks for all of the suggestions.

Cheers

Scott

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Monday, September 19, 2011 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene->SOLR transition

On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:

> On 9/15/2011 8:30 PM, Scott Smith wrote:
>> 
>> 2.   Assuming that the answer to 1 is "correct", then is there an easy 
>> way to take a lucene query (with nested Boolean queries, filter queries, 
>> etc.) and generate a SOLR query string with q and fq components?
>> 
>> 
> I believe that Query.toString() will probably get you back something that can 
> be parsed in turn by the traditional lucene QueryParser, thus completing the 
> circle and returning your original Query.  But why would you want to do that?

No, you can't rely on Query.toString() roundtripping (think stemming, for 
example, though there are many other cases that won't work that way either).

What you can do, since you know Lucene's API well, is write a QParser(Plugin) 
that takes request parameters as strings and generates the Query from them, as 
you do now in your Lucene app.

Erik
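
For the client-side route raised in question 2, a minimal sketch of composing 
the q and fq strings by hand, rather than through Query.toString(), might look 
like the following.  Everything here is an assumption for illustration: the 
class SolrQueryStringSketch and its helpers are hypothetical, the escaping is a 
hand-rolled approximation of the classic query syntax's special characters, 
and the field names are made up.  Analysis such as stemming still happens on 
the Solr side when the string is parsed.

```java
// Hypothetical helpers, not part of SolrJ or Lucene: build a query string
// in the classic Lucene/Solr syntax from nested clauses.
final class SolrQueryStringSketch {

    // Characters given special meaning by the classic query syntax
    // (an approximation; check the parser you actually use).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Backslash-escapes special characters and whitespace in a raw value.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // A single field:value term clause.
    static String term(String field, String value) {
        return field + ":" + escape(value);
    }

    // Marks a clause as required (BooleanClause.Occur.MUST in Lucene terms).
    static String must(String clause) {
        return "+" + clause;
    }

    // Marks a clause as prohibited (BooleanClause.Occur.MUST_NOT).
    static String mustNot(String clause) {
        return "-" + clause;
    }

    // Wraps clauses in parentheses, the string form of a nested Boolean query.
    static String group(String... clauses) {
        return "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        String q = group(
                must(term("body", "dog")),
                must(group(term("title", "puppy"), term("title", "kennel"))),
                mustNot(term("status", "draft")));
        String fq = term("category", "pets");
        System.out.println("q=" + q);   // q=(+body:dog +(title:puppy title:kennel) -status:draft)
        System.out.println("fq=" + fq); // fq=category:pets
    }
}
```

The resulting strings would then be passed as the q and fq parameters of the 
request.  Erik's QParser(Plugin) suggestion avoids this string round trip 
altogether by building the Query on the server.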

Lucene->SOLR transition

2011-09-15 Thread Scott Smith

I've been using lucene for a number of years.  We've now decided to move to 
SOLR.  I have a couple of questions.


1.   I'm used to creating Boolean queries, filter queries, term queries, 
etc. for lucene.  Am I right in thinking that for SOLR my only option is 
creating string queries (with q and fq components) for solrj?

2.   Assuming that the answer to 1 is "correct", then is there an easy way 
to take a lucene query (with nested Boolean queries, filter queries, etc.) and 
generate a SOLR query string with q and fq components?

Thanks

Scott

ANTLR SOLR query/filter parser

2011-08-01 Thread Scott Smith

I'm looking for an ANTLR parser that consumes solr queries and filters.  Before 
I write my own, I thought I'd ask whether anyone has one they are willing to 
share or can point me to one.

Thanks

Scott
