Re: Parallelizing queries without Custom Component

2018-01-15 Thread Max Bridgewater
Thanks Emir. Looks indeed like what I need.

On Mon, Jan 15, 2018 at 11:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Max,
> It seems to me that you are looking for the grouping
> (https://lucene.apache.org/solr/guide/6_6/result-grouping.html) or field
> collapsing (https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html)
> feature.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 15 Jan 2018, at 17:27, Max Bridgewater <max.bridgewa...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > My index is composed of product reviews. Each review contains the id of
> the
> > product it refers to. But it also contains a rating for this product and
> > the amount of negative feedback provided on this product.
> >
> > {
> >   id: solr doc id,
> >   rating: number between 0 and 5,
> >   product_id: the product that is being reviewed,
> >   negative_feedback: how many negative feedbacks on this product
> > }
> >
> > The query below returns the "worst" review for the given product
> 7453632.
> > Worst is defined as  rated 1 to 3 and having the highest number of
> negative
> > feedback.
> >
> > /select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
> > desc&rows=1
> >
> > The query works as intended. Now the challenging part is to extend this
> > query to support many product_ids. If executed with many product IDs, the
> > result should be the list of worst reviews for all the provided products.
> >
> > A query of the following form would return the list of worst reviews for
> > products: 7453632,645454,534664.
> >
> > /select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
> > 3]&sort=negative_feedback desc
> >
> > Is there a way to do this in Solr without a custom component?
> >
> > Thanks.
> > Max
>
>
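For illustration, a field-collapsing version of the original query could look
like this sketch (assuming negative_feedback is a single-valued numeric field;
{!collapse} keeps, per product_id group, the document with the highest
negative_feedback):

/select?q=product_id:(7453632 OR 645454 OR 534664)
    &fq=rating:[1 TO 3]
    &fq={!collapse field=product_id max=negative_feedback}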


Parallelizing queries without Custom Component

2018-01-15 Thread Max Bridgewater
Hi,

My index is composed of product reviews. Each review contains the id of the
product it refers to. But it also contains a rating for this product and
the amount of negative feedback provided on this product.

{
   id: solr doc id,
   rating: number between 0 and 5,
   product_id: the product that is being reviewed,
   negative_feedback: how many negative feedbacks on this product
}

The query below returns the "worst" review for the given product  7453632.
Worst is defined as  rated 1 to 3 and having the highest number of negative
feedback.

/select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
desc&rows=1

The query works as intended. Now the challenging part is to extend this
query to support many product_ids. If executed with many product IDs, the
result should be the list of worst reviews for all the provided products.

A query of the following form would return the list of worst reviews for
products: 7453632,645454,534664.

/select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
3]&sort=negative_feedback desc

Is there a way to do this in Solr without a custom component?

Thanks.
Max


Do I need to declare TermVectorComponent for best MoreLikeThis results?

2017-07-12 Thread Max Bridgewater
Hi,

The MLT documentation says that for best results, the fields should have
stored term vectors in schema.xml, with:

<field name="..." termVectors="true" ... />

My question: should I also create the TermVectorComponent and declare it in
the search handler?

In other terms, do I have to do this in my solrconfig.xml for best results?

<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>


I am seeing continuously increasing MLT response times and I am wondering
if I am doing something wrong.

Thanks.
Max.
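For reference, a minimal MoreLikeThis handler declaration looks roughly like
this (field names are illustrative placeholders, not from the original
message; the MLT code reads term vectors straight from the index itself, so
this sketch wires in no tvComponent):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,body</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>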


MoreLikeThis Clarifications

2017-06-22 Thread Max Bridgewater
I am trying to confirm my understanding of MLT after going through the
following page:
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis.

Three approaches are mentioned:

1) Use it as a request handler and send text to the MoreLikeThis request
handler as needed.
2) Use it as a search component, and MLT is performed on every document
returned.
3) Use it as a request handler but with externally supplied text.


What are example queries in each case and what config changes are required
for each case?

There is also MLTQParser. When can I use this parser as opposed to using any
of the three approaches above?

Thanks,
Max.
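For orientation, hedged example invocations of each approach (handler paths
and field names are illustrative, not from this thread):

1) Request handler, seeded by a query:
   /mlt?q=id:12345&mlt.fl=title&mlt.mintf=1&mlt.mindf=1
2) Search component, MLT per returned document:
   /select?q=apache&mlt=true&mlt.fl=title&mlt.count=3
3) Request handler with externally supplied text (may require stream body
   support to be enabled on the requestDispatcher):
   /mlt?stream.body=some+text+to+match&mlt.fl=title
4) MLTQParser, a regular query that can be combined with filters and is often
   the recommended route in SolrCloud:
   /select?q={!mlt qf=title}12345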


Re: Phrase Exact Match with Margin of Error

2017-06-15 Thread Max Bridgewater
Thanks Susheel. The challenge is that if I search for the word "between"
alone, I still get plenty of results. In a way I want the query to  match
the document title exactly (up to a few characters) and the document title
match the query exactly (up to a few characters). KeywordTokenizer allows
that. But complexphrase does not seem to work with KeywordTokenizer.

On Thu, Jun 15, 2017 at 10:23 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> ComplexPhraseQuery parser is what you need to look at:
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
> See below for e.g.:
>
>
>
> http://localhost:8983/solr/techproducts/select?debugQuery=on&indent=on&q=manu:%22Bridge%20the%20gat~1%20between%20your%20skills%20and%20your%20goals%22&defType=complexphrase
>
> On Thu, Jun 15, 2017 at 5:59 AM, Max Bridgewater <
> max.bridgewa...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to do phrase exact match. For this, I use
> > KeywordTokenizerFactory. This basically does what I want to do. My field
> > type is defined as follows:
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > In addition to this, I want to tolerate typos of two or three letters. I
> > thought fuzzy search could allow me to accept this margin of error. But
> > this doesn't seem to work.
> >
> > A typical query I would have is:
> >
> > q=subjet:"Bridge the gap between your skills and your goals"
> >
> > Now, in this query, if I replace gap with gat, I was hoping I could do
> > something such as:
> >
> > q=subjet:"Bridge the gat between your skills and your goals"~0.8
> >
> > But this doesn't quite do what I am trying to achieve.
> >
> > Any suggestion?
> >
>
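One note worth adding: with KeywordTokenizerFactory the whole title is indexed
as a single token, so a fuzzy match has to be applied to that one long term,
and fuzzy matching in recent Lucene is an integer edit distance capped at 2
(the older ~0.8 similarity form is gone). A hedged sketch, with the spaces
escaped so the phrase stays one term:

q=subjet:Bridge\ the\ gat\ between\ your\ skills\ and\ your\ goals~2

That caps the tolerance at two edits over the entire phrase, which is why
"typos of two or three letters" sits right at (or beyond) the limit of this
approach.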


Phrase Exact Match with Margin of Error

2017-06-15 Thread Max Bridgewater
Hi,

I am trying to do phrase exact match. For this, I use
KeywordTokenizerFactory. This basically does what I want to do. My field
type is defined as follows:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


In addition to this, I want to tolerate typos of two or three letters. I
thought fuzzy search could allow me to accept this margin of error. But
this doesn't seem to work.

A typical query I would have is:

q=subjet:"Bridge the gap between your skills and your goals"

Now, in this query, if I replace gap with gat, I was hoping I could do
something such as:

q=subjet:"Bridge the gat between your skills and your goals"~0.8

But this doesn't quite do what I am trying to achieve.

Any suggestion?


Invoking a SearchHandler inside Solr Plugin

2017-04-11 Thread Max Bridgewater
I am looking for best practices when a search component in one handler,
needs to invoke another handler, say /basic. So far, I got this working
prototype:

public void process(ResponseBuilder rb) throws IOException {
  SolrQueryResponse response = new SolrQueryResponse();
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("defType", "lucene").add("fl", "product_id").add("wt", "json")
        .add("df", "competitor_product_titles").add("echoParams", "explicit")
        .add("q", rb.req.getParams().get("q"));
  SolrQueryRequest request = new LocalSolrQueryRequest(rb.req.getCore(), params);
  SolrRequestHandler hdlr = rb.req.getCore().getRequestHandler("/basic");
  rb.req.getCore().execute(hdlr, request, response);
  DocList docList = ((ResultContext) response.getValues().get("response")).docs;
  // Do some crazy stuff with the result
}


My concerns:

1) What is a clean way to read the /basic handler's default parameters
from solrconfig.xml and use them in LocalSolrQueryRequest().
2) Is there a better way to accomplish this task overall?


Thanks,
Max.
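On concern 1, a hedged sketch (classes from org.apache.solr.handler,
org.apache.solr.common.params and org.apache.solr.common.util; the cast
assumes /basic is a standard handler extending RequestHandlerBase):

  SolrRequestHandler hdlr = rb.req.getCore().getRequestHandler("/basic");
  // RequestHandlerBase keeps the <lst name="defaults"> section from solrconfig.xml
  NamedList initArgs = ((RequestHandlerBase) hdlr).getInitArgs();
  NamedList defaults = (NamedList) initArgs.get("defaults");
  ModifiableSolrParams params = new ModifiableSolrParams(SolrParams.toSolrParams(defaults));
  params.set("q", rb.req.getParams().get("q"));  // then layer the per-request values on top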


Re: Query.extractTerms disappeared from 5.1.0 to 5.2.0

2017-02-01 Thread Max Bridgewater
Perfect. Thanks a lot.

On Wed, Feb 1, 2017 at 2:01 PM, Alan Woodward <a...@flax.co.uk> wrote:

> Hi, extractTerms() is now on Weight rather than on Query.
>
> Alan
>
> > On 1 Feb 2017, at 17:43, Max Bridgewater <max.bridgewa...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > It seems Query.extractTerms() disappeared from 5.1.0 (
> > http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/
> search/Query.html)
> > to 5.2.0 (
> > http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/
> search/Query.html
> > ).
> >
> > However, I cannot find any comment on it in 5.2.0 release notes. Any
> > recommendation on what I should use in place of that method? I am
> migrating
> > some legacy code from Solr 4 to Solr 6.
> >
> > Thanks,
> > Max.
>
>


Query.extractTerms disappeared from 5.1.0 to 5.2.0

2017-02-01 Thread Max Bridgewater
Hi,

It seems Query.extractTerms() disappeared from 5.1.0 (
http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html)
to 5.2.0 (
http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html
).

However, I cannot find any comment on it in 5.2.0 release notes. Any
recommendation on what I should use in place of that method? I am migrating
some legacy code from Solr 4 to Solr 6.

Thanks,
Max.


Solr 6 Default Core URL

2016-12-13 Thread Max Bridgewater
I have one Solr core on my solr 6 instance and I can query it with:

http://localhost:8983/solr/mycore/search?q=*:*

Is there a way to configure Solr 6 so that I can query it with this
simpler URL?

http://localhost:8983/search?q=*:*


Thanks.
Max.
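One common approach is a reverse proxy in front, since Solr 6 serves under the
/solr context by default; a hedged nginx sketch, reusing the core name mycore
from above:

location /search {
    proxy_pass http://localhost:8983/solr/mycore/search;
}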


Re: Solr 6 Performance Suggestions

2016-11-28 Thread Max Bridgewater
Thanks again, folks. I tried each suggestion and none made any difference. I
am setting up a lab for performance monitoring using App Dynamics.
Hopefully I am able to figure out something.

On Mon, Nov 28, 2016 at 11:20 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: If you know the maximum size you ever will need, setting Xmx is good.
>
> Not quite sure what you're getting at here. I pretty much guarantee that a
> production system will eat up the default heap size, so not setting Xmx
> will
> cause OOM errors pretty soon. Or did you mean Xms?
>
> As far as setting Xms, there are differing opinions, mostly though since
> Solr
> likes memory so much there's a lot of tuning to try to determine Xmx and
> it's pretty much guaranteed that Java will need close to that amount of
> memory.
> So setting Xms=Xmx is a minor optimization if that assumption is true.
> It's arguable
> how much practical difference it makes though.
>
> Best,
> Erick
>
> On Mon, Nov 28, 2016 at 2:14 AM, Florian Gleixner <f...@redflo.de> wrote:
> > Am 28.11.2016 um 00:00 schrieb Shawn Heisey:
> >>
> >> On 11/27/2016 12:51 PM, Florian Gleixner wrote:
> >>>
> >>> On 22.11.2016 14:54, Max Bridgewater wrote:
> >>>>
> >>>> test cases were exactly the same, the machines where exactly the same
> >>>> and heap settings exactly the same (Xms24g, Xmx24g). Requests were
> >>>> sent with
> >>>
> >>> Setting heap too large is a common error. Recent Solr uses the
> >>> filesystem cache, so you don't have to set heap to the size of the
> >>> index. The available RAM has to be able to run the OS, run the jvm and
> >>> hold most of the index data in filesystem cache. If you have 32GB RAM
> >>> and a 20GB Index, then set -Xms never higher than 10GB. I personally
> >>> would set -Xms to 4GB and omit -Xmx
> >>
> >>
> >> In my mind, the Xmx setting is much more important than Xms.  Setting
> >> both to the same number avoids any need for Java to detect memory
> >> pressure before increasing the heap size, which can be helpful.
> >>
> >
> > From https://cwiki.apache.org/confluence/display/solr/JVM+Settings
> >
> > "The maximum heap size, set with -Xmx, is more critical. If the memory
> heap
> > grows to this size, object creation may begin to fail and throw
> > OutOfMemoryException. Setting this limit too low can cause spurious
> errors
> > in your application, but setting it too high can be detrimental as well."
> >
> > you are right, Xmx is more important. But setting Xms to Xmx will waste
> RAM
> > that the OS could use to cache your index data. Setting Xmx can avoid
> problems
> > in some situations where solr can eat up your filesystem cache until the
> > next GC has finished.
> >
> >> Without Xmx, Java is in control of the max heap size, and it may not
> >> make the correct choice.  It's important to know what your max heap is,
> >> because chances are excellent that the max heap *will* be reached.  Solr
> >> allocates a lot of memory to do its job.
> >>
> >
> > If you know the maximum size you ever will need, setting Xmx is good.
> >
> >
> >
> >
>


Re: Solr 6 Performance Suggestions

2016-11-25 Thread Max Bridgewater
Thanks, folks. It looks like the sweet spot where I get comparable results
is at 30 concurrent threads. It progressively degrades from there as I
increase the number of concurrent threads in the test script.

This made me think that something is configured in Tomcat (Solr 4) that is
not comparably set in Solr 6. The only thing I found that would make
sense is the connector's maximum number of threads, which we have set to 800 for
Tomcat. However, in jetty.xml, maxThreads is set to 5. Not sure if
these two maxThreads have the same effect.

I thought about Yonik's suggestion a little bit. Where I am scratching my
head is: if specific kinds of queries were more expensive than others,
shouldn't that be reflected even at 30 concurrent threads?

Anyway, still digging.
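For reference, the Jetty thread cap in a stock Solr 6 install lives in
server/etc/jetty.xml and is driven by a system property (names as shipped in
the 6.x jetty.xml; worth verifying against the actual install, since a literal
maxThreads of 5 would sit far below Tomcat's 800):

<Set name="maxThreads"><Property name="solr.jetty.threads.max" default="10000"/></Set>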

On Wed, Nov 23, 2016 at 9:56 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I recently ran benchmarks on 4.10.4 and 6.2.1 and found very little
> difference in query performance.
>
> This was with 8 million documents (homework problems) from production. I
> used query logs from
> production. The load is a constant number of requests per minute from 100
> threads. CPU usage
> is under 50% in order to avoid congestion. The benchmarks ran for 100
> minutes.
>
> Measuring median and 95th percentile, the times were within 10%. I think
> that is within the
> repeatability of the benchmark. A different number of GCs could make that
> difference.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 23, 2016, at 8:14 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
> >
> > On 22/11/16 15:34, Prateek Jain J wrote:
> >> I am not sure, but I heard in one of the discussions that you can't
> migrate directly from solr 4 to solr 6. It has to be incremental, like solr
> 4 to solr 5 and then to solr 6. I might be wrong, but it is worth trying.
> >
> > Ideally the index needs to be upgraded using the IndexUpgrader.
> >
> > Something like this should do the trick:
> >
> > java -cp lucene-core-6.0.0.jar:lucene-backward-codecs-6.0.0.jar
> > org.apache.lucene.index.IndexUpgrader /path/to/index
> >
> > - Bram
>
>


Solr 6 Performance Suggestions

2016-11-22 Thread Max Bridgewater
I migrated an application from Solr 4 to Solr 6.  solrconfig.xml and
schema.xml are essentially the same. The JVM params are also pretty much
similar.  The indices each have about 2 million documents. No particular
tuning was done to Solr 6 beyond the default settings. Solr 4 is running in
Tomcat 7.

Early results seem to show Solr 4 outperforming Solr 6. The first shows an
average response time of 280 ms while the second averages 430 ms. The
test cases were exactly the same, the machines were exactly the same, and
the heap settings were exactly the same (Xms24g, Xmx24g). Requests were sent with
JMeter with 50 concurrent threads for 2h.

I know that this is not enough information to claim that Solr 4 generally
outperforms Solr 6. I also know that this pretty much depends on what the
application does. So I am not claiming anything general. All I want to do
is get some input before I start digging.

What are some things I could tune to improve the numbers for Solr 6? Have
you guys experienced such discrepancies?

Thanks,
Max.


Re: Edismax query parsing in Solr 4 vs Solr 6

2016-11-12 Thread Max Bridgewater
Hi Greg,

Your analysis is SPOT ON. I did some debugging and found out that we had
q.op in the defaults set to AND. And when I changed that to OR, things
worked exactly as in Solr 4. So, it seemed Solr 6 was behaving as it
should. What I could not explain was whether Solr 4 was using the
configured q.op that was set in the defaults or not. But your explanation
makes sense now.

Thanks,
Max.



On Sat, Nov 12, 2016 at 4:54 PM, Greg Pendlebury <greg.pendleb...@gmail.com>
wrote:

> This has come up a lot on the lists lately. Keep in mind that edismax
> parses your query using additional parameters such as 'mm' and 'q.op'. It is
> the handling of these parameters (and the selection of default values)
> which has changed between versions to address a few functionality gaps.
>
> The most common issue I've seen is where users were not setting those
> values and relying on the defaults. You might now need to set them
> explicitly to return to desired behaviour.
>
> I can't see all of your configuration, but I'm guessing the important one
> here is 'q.op', which was previously hard coded to 'OR', irrespective of
> either parameters or solrconfig. Try setting that to 'OR' explicitly...
> maybe you have your default operator set to 'AND' in solrconfig and that is
> now being applied? The other option is 'mm', which I suspect should be set
> to '0' unless you have some reason to want it. If it was set to '100%' it
> might insert the additional '+' flags, but it can also show up as a '~'
> operator on the end.
>
> Ta,
> Greg
>
> On 8 November 2016 at 22:13, Max Bridgewater <max.bridgewa...@gmail.com>
> wrote:
>
> > I am migrating a solr based app from Solr 4 to Solr 6.  One of the
> > discrepancies I am noticing is around edismax query parsing. My code
> makes
> > the following call:
> >
> >
> >  userQuery="+(title:shirts isbn:shirts) +(id:20446 id:82876)"
> >   Query query=QParser.getParser(userQuery, "edismax", req).getQuery();
> >
> >
> > With Solr 4, query becomes:
> >
> > +(+(title:shirt isbn:shirts) +(id:20446 id:82876))
> >
> > With Solr 6 it however becomes:
> >
> > +(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))
> >
> > Digging deeper, it appears that parseOriginalQuery() in
> > ExtendedDismaxQParser is adding those additional + signs.
> >
> >
> > Is there a way to prevent this altering of queries?
> >
> > Thanks,
> > Max.
> >
>
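For illustration, the explicit settings Greg describes can be sent per request
(standard edismax parameters; values chosen here to restore the old behaviour):

/select?defType=edismax&q.op=OR&mm=0&q=%2B(title:shirts isbn:shirts) %2B(id:20446 id:82876)

or inline via local params: q={!edismax q.op=OR mm=0}+(title:shirts isbn:shirts) +(id:20446 id:82876)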


Edismax query parsing in Solr 4 vs Solr 6

2016-11-08 Thread Max Bridgewater
I am migrating a solr based app from Solr 4 to Solr 6.  One of the
discrepancies I am noticing is around edismax query parsing. My code makes
the following call:


 String userQuery = "+(title:shirts isbn:shirts) +(id:20446 id:82876)";
 Query query = QParser.getParser(userQuery, "edismax", req).getQuery();


With Solr 4, query becomes:

+(+(title:shirt isbn:shirts) +(id:20446 id:82876))

With Solr 6 it however becomes:

+(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))

Digging deeper, it appears that parseOriginalQuery() in
ExtendedDismaxQParser is adding those additional + signs.


Is there a way to prevent this altering of queries?

Thanks,
Max.


BooleanQuery Migration from Solr 4 to SOlr 6

2016-07-18 Thread Max Bridgewater
Hi Folks,

I am tasked with migrating a Solr app from Solr 4 to Solr 6. This solr app
is in essence a bunch of solr components/handlers. One part that challenges
me is BooleanQuery immutability in Solr 6.

Here is the challenge: In our old code base, we had classes that
implemented custom interfaces and extended BooleanQuery. These custom
interfaces were essentially markers that told our various components where
the user came from. Based on the user's origin, different pieces of logic
would apply.

Now, in Solr 6, our custom boolean query  can no longer extend BooleanQuery
since BooleanQuery only has a private constructor. I am looking for a clean
solution to this problem.

Here are some ideas I had:

1) Remove the logic that depends on the custom boolean query => Big risk to
our search logic
2) Simply remove BooleanQuery as super class of custom boolean query =>
Major risk. Wherever we do “if(query instanceof BooleanQuery) “, we would
not catch our custom queries.
3) Remove BooleanQuery as parent to the custom query (e.g. make it extend
Query) AND refactor to move all “if(query instanceof BooleanQuery)” into a
dedicated method: isCustomBooleanQuery. This would return “query instanceof
BooleanQuery || query instanceof CustomQuery”. We then need to change ALL
20 occurrences of this test and ensure we handle both cases appropriately.
==> Very invasive.
4) Add a method createCustomQuery() that would return a boolean query
wherein a special clause is added that allows us to identify our custom
queries.  This special clause should not impact search results. => Pretty
ugly.


Is there another potentially clean, low-risk, and less invasive solution?


Max.
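For context, this is the immutability in question: since BooleanQuery's
constructor became private (Lucene 5.3+/6), instances are assembled through a
builder and cannot be subclassed to carry extra marker state. A minimal sketch
(classes from org.apache.lucene.search and org.apache.lucene.index; the field
and term are purely illustrative):

  BooleanQuery.Builder builder = new BooleanQuery.Builder();
  builder.add(new TermQuery(new Term("title", "shirts")), BooleanClause.Occur.SHOULD);
  BooleanQuery bq = builder.build();  // immutable; no room for a custom subclass

which is why options 2-4 above all amount to carrying the origin marker
outside the query class itself.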


Determine Containing Handler

2016-05-19 Thread Max Bridgewater
Hi,

I am implementing a component that needs to redirect calls to the handler
that originally called it. Say the call comes to handler /search: the
component would then do some processing, alter the query, and then send
the query back to /search again.

It works great. The only issue is that the handler is not always called
/search, leading me to have to force people to pass the handler name as a
parameter to the component, which is not ideal.

The question thus is: is there a way to find out what handler a component
was invoked from?

I checked SolrCore and SolrQueryRequest, but I can't seem to find a method
that would do this.

Thanks,
Max.
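Until a cleaner lookup turns up, the pass-the-name workaround can at least be
pushed into configuration so clients never see it; a hedged sketch with a
made-up parameter name (origin.handler is not a real Solr parameter):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="origin.handler">/search</str>
  </lst>
</requestHandler>

Each handler that embeds the component declares its own name in its defaults,
and the component reads origin.handler from the request params.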


Re: Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0

2016-04-02 Thread Max Bridgewater
Thank you Mike, that was it.

Max.

On Sat, Apr 2, 2016 at 2:40 AM, Mikhail Khludnev <mkhlud...@griddynamics.com
> wrote:

> Hello Max,
>
> Since it reports the first space occurrence at pos=32, I advise nuking all
> spaces between the braces in sum().
>
> On Fri, Apr 1, 2016 at 7:40 PM, Max Bridgewater <max.bridgewa...@gmail.com
> >
> wrote:
>
> > Hi,
> >
> > I have the following configuration for firstSearcher handler in
> > solrconfig.xml:
> >
> > <listener event="firstSearcher" class="solr.QuerySenderListener">
> >   <arr name="queries">
> >     <lst>
> >       <str name="q">parts</str>
> >       <str name="sort">score desc, Review1 asc, Rank2 asc</str>
> >     </lst>
> >     <lst>
> >       <str name="q">make</str>
> >       <str name="sort">{!func}sum(product(0.01,param1),
> > product(0.20,param2),  min(param2,0.4)) desc</str>
> >     </lst>
> >   </arr>
> > </listener>
> >
> > This works great in Solr 4.10. However, in solr 5.4.1 and solr 5.5.0, I
> get
> > the below error. How do I write this kind of query with Solr 5?
> >
> >
> > Thanks,
> > Max.
> >
> >
> > ERROR org.apache.solr.handler.RequestHandlerBase  [   x:productsearch] –
> > org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or
> > desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2),
> > min(param2,0.4)) desc', pos=32
> > at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
> > at org.apache.solr.search.QParser.getSort(QParser.java:247)
> > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
> > at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
> > at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
>
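Applied to the configuration above, the fix is to leave no whitespace inside
the function, so the sort spec parser only sees the one space before the sort
order:

<str name="sort">{!func}sum(product(0.01,param1),product(0.20,param2),min(param2,0.4)) desc</str>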


Function Query Parsing problem in Solr 5.4.1 and Solr 5.5.0

2016-04-01 Thread Max Bridgewater
Hi,

I have the following configuration for firstSearcher handler in
solrconfig.xml:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">parts</str>
      <str name="sort">score desc, Review1 asc, Rank2 asc</str>
    </lst>
    <lst>
      <str name="q">make</str>
      <str name="sort">{!func}sum(product(0.01,param1),
product(0.20,param2),  min(param2,0.4)) desc</str>
    </lst>
  </arr>
</listener>

This works great in Solr 4.10. However, in solr 5.4.1 and solr 5.5.0, I get
the below error. How do I write this kind of query with Solr 5?


Thanks,
Max.


ERROR org.apache.solr.handler.RequestHandlerBase  [   x:productsearch] –
org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or
desc) in sort spec '{!func}sum(product(0.01,param1), product(0.20,param2),
min(param2,0.4)) desc', pos=32
at org.apache.solr.search.SortSpecParsing.parseSortSpec(SortSpecParsing.java:143)
at org.apache.solr.search.QParser.getSort(QParser.java:247)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:187)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:69)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1840)


Re: Load Resource from within Solr Plugin

2016-03-31 Thread Max Bridgewater
Hi Folks,

Thanks for all the great suggestions. I will try and see which one works
best.
@Hoss: The WEB-INF folder is just in my dev environment. I have a local
Solr instance and I point it to the target/WEB-INF. Simple, convenient
setup for development purposes.

Much appreciated.

Max.

On Wed, Mar 30, 2016 at 4:24 PM, Rajesh Hazari <rajeshhaz...@gmail.com>
wrote:

> Max,
> Have you looked into the external file field, which is reloaded on every hard
> commit?
> The only disadvantage of this is that the file (personal-words.txt) has to be placed
> in all data folders in each solr core,
> for which we have a bash script to do this job.
>
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
> Ignore this if this does not meet your requirement.
>
> *Rajesh**.*
>
> On Wed, Mar 30, 2016 at 1:21 PM, Chris Hostetter <hossman_luc...@fucit.org
> >
> wrote:
>
> > :
> > :  <lib dir=".../search-webapp/target/WEB-INF/lib"
> > : regex=".*\.jar" />
> >
> > 1) as a general rule, if you have a <lib> declaration which includes
> > "WEB-INF" you are probably doing something wrong.
> >
> > Maybe not in this case -- maybe "search-webapp/target" is a completely
> > distinct java application and you are just re-using its jars.  But 9
> > times out of 10, when people have a WEB-INF path they are trying to load
> > jars from, it's because they *first* added their jars to Solr's WEB-INF
> > directory, and then when that didn't work they added the path to the
> > WEB-INF dir as a <lib/> ... but now you've got those classes being loaded
> > twice, and you've multiplied all of your problems.
> >
> > 2) let's ignore the fact that your path has WEB-INF in it, and just
> > assume it's some path somewhere on disk that has nothing to
> > do with solr, and you want to load those jars.
> >
> > great -- solr will do that for you, and all of those classes will be
> > available to plugins.
> >
> > Now if you want to explicitly do something classloader related, you do
> > *not* want to be using Thread.currentThread().getContextClassLoader() ...
> > because the threads that execute everything in Solr are a pool of worker
> > threads that is created before solr ever has a chance to parse your <lib /> directive.
> >
> > You want to ensure anything you do related to a Classloader uses the
> > ClassLoader Solr sets up for plugins -- that's available from the
> > SolrResourceLoader.
> >
> > You can always get the SolrResourceLoader via
> > SolrCore.getSolrResourceLoader().  from there you can getClassLoader() if
> > you really need some hairy custom stuff -- or if you are just trying to
> > load a simple resource file as an InputStream, use openResource(String
> > name) ... that will start by checking for it in the conf dir, and will
> > fallback to your jar -- so you can have a default resource file shipped
> > with your plugin, but allow users to override it in their collection
> > configs.
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>
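A hedged sketch of the openResource() route from inside a search component
(method names from SolrCore/SolrResourceLoader; error handling omitted):

  SolrResourceLoader loader = rb.req.getCore().getResourceLoader();
  try (InputStream in = loader.openResource("personal-words.txt")) {
      // the conf-dir copy wins; otherwise the copy shipped inside the plugin jar is used
  }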


Load Resource from within Solr Plugin

2016-03-29 Thread Max Bridgewater
Hi,

I am facing the exact issue described here:
http://stackoverflow.com/questions/25623797/solr-plugin-classloader.

Basically I'm writing a solr plugin by extending the SearchComponent class. My
new class is part of an archive a.jar. My class also depends on a jar b.jar. I
placed both jars in my own folder and declared it in solrconfig.xml with:

<lib dir="..." regex=".*\.jar" />

I also declared my new component in solrconfig.xml. The component is
invoked correctly up to a point where a class ClassFromB from b.jar
attempts to load a resource personal-words.txt from the classpath.

The piece of code in class ClassFromB looks like this:

Thread.currentThread().getContextClassLoader().getResources("personal-words.txt")


Unfortunately, this returns an empty list. Any recommendation?


Thanks,

Max.


5.4 facet performance thumbs-up

2015-12-22 Thread Aigner, Max
I'm happy to report that we are seeing significant speed-ups in our queries 
with Json facets on 5.4 vs regular facets on 5.1. Our queries contain mostly 
terms facets, many of them with exclusion tags and prefix filtering.
Nice work!



RE: JSON facets and excluded queries

2015-12-11 Thread Aigner, Max
Good to know, thank you. 

From an implementation standpoint that makes a lot of sense. 
We are only using facets of type 'term' for now and for those it works nicely. 
Our usual searches carry around 8-12 facets so we are covered from that side 
:-) 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 11, 2015 3:12 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: JSON facets and excluded queries

Do note that the number of threads also won't help much last I knew unless you 
are faceting over that many fields too. I.e. setting this to 5 while faceting 
on only 1 field won't help.

And it's not implemented for all facet types IIRC.

Best,
Erick

On Fri, Dec 11, 2015 at 1:07 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
> Answering one question myself after doing some testing on 5.3.1:
>
> Yes, facet.threads is still relevant with Json facets.
>
> We are seeing significant gains as we are increasing the number of threads 
> from 1 up to 4. Beyond that we only observed marginal  improvements -- which 
> makes sense because the test VM has 4 cores.
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Thursday, December 10, 2015 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Another question popped up around this:
> Is the facet.threads parameter still relevant with Json facets? I saw that 
> the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got 
> fixed in  5.3 so I'm looking into re-enabling this parameter for our searches.
>
> On a side note, I've been testing Json facet performance and I've observed 
> that they're generally  faster unless facet prefix filtering comes into play, 
> then they seem to be slower than standard facets.
> Is that just a fluke or should I switch to Json Query Facets instead of using 
> facet prefix filtering?
>
> Thanks again,
> Max
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Wednesday, November 25, 2015 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Yes, just tried that and it works fine.
>
> That just removed a showstopper for me as my queries contain lots of tagged 
> FQs and multi-select facets (implemented the 'good way' :).
>
> Thank you for the quick help!
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Wednesday, November 25, 2015 11:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facets and excluded queries
>
> On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> 
>> wrote:
>>> Thanks, this is great :=))
>>>
>>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem 
>>> to be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. 
>>> Did I get that right?
>>
>> Hmmm, the "domain" keyword was added for 5.3 along with block join
>> faceting: http://yonik.com/solr-nested-objects/
>> That's when I switched "excludeTags" to also be under the "domain" keyword.
>>
>> Let me try it out...
>
> Ah, I messed up that migration...
> OK, for now, instead of
>   domain:{excludeTags:foo}
> just use
>   excludeTags:foo
> and it should work.
>
> -Yonik
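Putting the workaround together with the original query shape, a JSON Facet
equivalent might look like this sketch (field, key and tag names carried over
from the example query; excludeTags at the top level per the 5.3.1 workaround
above):

json.facet={
  price_all:     { type:terms, field:price, excludeTags:fqCol },
  price_nogreen: { type:terms, field:price }
}

with the filter still tagged as fq={!tag=fqCol}color:green.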


RE: JSON facets and excluded queries

2015-12-11 Thread Aigner, Max
Answering one question myself after doing some testing on 5.3.1: 

Yes, facet.threads is still relevant with Json facets. 

We are seeing significant gains as we are increasing the number of threads from 
1 up to 4. Beyond that we only observed marginal  improvements -- which makes 
sense because the test VM has 4 cores. 

-Original Message-
From: Aigner, Max [mailto:max.aig...@nordstrom.com] 
Sent: Thursday, December 10, 2015 12:33 PM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Another question popped up around this: 
Is the facet.threads parameter still relevant with Json facets? I saw that the 
facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got fixed in  
5.3 so I'm looking into re-enabling this parameter for our searches. 

On a side note, I've been testing Json facet performance and I've observed that 
they're generally  faster unless facet prefix filtering comes into play, then 
they seem to be slower than standard facets. 
Is that just a fluke or should I switch to Json Query Facets instead of using 
facet prefix filtering?

Thanks again,
Max

-Original Message-
From: Aigner, Max [mailto:max.aig...@nordstrom.com] 
Sent: Wednesday, November 25, 2015 11:54 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Yes, just tried that and it works fine. 

That just removed a showstopper for me as my queries contain lots of tagged FQs 
and multi-select facets (implemented the 'good way' :). 

Thank you for the quick help! 

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik


RE: JSON facets and excluded queries

2015-12-10 Thread Aigner, Max
Another question popped up around this: 
Is the facet.threads parameter still relevant with Json facets? I saw that the 
facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got fixed in  
5.3 so I'm looking into re-enabling this parameter for our searches. 

On a side note, I've been testing Json facet performance and I've observed that 
they're generally faster unless facet prefix filtering comes into play; then 
they seem to be slower than standard facets. 
Is that just a fluke or should I switch to Json Query Facets instead of using 
facet prefix filtering?

Thanks again,
Max

-Original Message-
From: Aigner, Max [mailto:max.aig...@nordstrom.com] 
Sent: Wednesday, November 25, 2015 11:54 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Yes, just tried that and it works fine. 

That just removed a showstopper for me as my queries contain lots of tagged FQs 
and multi-select facets (implemented the 'good way' :). 

Thank you for the quick help! 

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik


RE: JSON facets and excluded queries

2015-11-25 Thread Aigner, Max
Yes, just tried that and it works fine. 

That just removed a showstopper for me as my queries contain lots of tagged FQs 
and multi-select facets (implemented the 'good way' :). 

Thank you for the quick help! 

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik


RE: JSON facets and excluded queries

2015-11-25 Thread Aigner, Max
Thanks, this is great :=))

I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to be 
working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I get 
that right? 

Thanks,
Max

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Wednesday, November 25, 2015 9:21 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

Here's a little tutorial on multi-select faceting w/ the JSON Facet API:
http://yonik.com/multi-select-faceting/

-Yonik


On Tue, Nov 24, 2015 at 12:56 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
> I'm currently evaluating Solr 5.3.1 for performance improvements with 
> faceting.
> However, I'm unable to get the 'exclude-tagged-filters' feature to work. A 
> lot of the queries I'm doing are in the format
>
> ...?q=category:123={!tag=fqCol}color:green=true{!key=price_all
>  ex=fqCol}price{!key=price_nogreen}price...
>
> I couldn't find a way to make this work with JSON facets, the 'ex=' local 
> param doesn't seem to have a corresponding new parameter in JSON facets.
> Am I just missing something or is there a new recommended way for calculating 
> facets over a subset of filters?
>
> Thanks!
>


JSON facets and excluded queries

2015-11-24 Thread Aigner, Max
I'm currently evaluating Solr 5.3.1 for performance improvements with faceting.
However, I'm unable to get the 'exclude-tagged-filters' feature to work. A lot 
of the queries I'm doing are in the format

...?q=category:123&fq={!tag=fqCol}color:green&facet=true&facet.field={!key=price_all
 ex=fqCol}price&facet.field={!key=price_nogreen}price...

I couldn't find a way to make this work with JSON facets, the 'ex=' local param 
doesn't seem to have a corresponding new parameter in JSON facets.
Am I just missing something or is there a new recommended way for calculating 
facets over a subset of filters?

Thanks!



Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-24 Thread Max Chadwick
Is there a way to append a set of words to the out-of-the-box Solr index when
using the spellcheck / suggestions feature?
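One direction to look at (a sketch, not a confirmed answer to this thread): a
FileBasedSpellChecker can be configured next to the index-based one so a plain
word list supplements the out-of-the-box suggestions (file name and component
name are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
  </lst>
</searchComponent>

spellings.txt (one word per line) lives in the conf dir, and the dictionary is
selected at query time with spellcheck.dictionary=file.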


Solr could replace shards

2013-12-18 Thread Max Hansmire
I am considering using SolrCloud, but I have a use case that I am not sure
if it covers.

I would like to keep an index up to date in realtime, but also I would like
to sometimes restate the past. The way that I would restate the past is to
do batch processing over historical data.

My idea is that I would have the Solr collection sharded by date range. As
I move forward in time I would add more shards.

For restating historical data I would have a separate process that actually
indexes a shard's worth of data. (This keeps the servers that are meant for
production search from having to handle the load of indexing historically.)
I would then move the index files to the solr servers and register the
newly created index with the server replacing the existing shards.

I used to be able to do something similar pre-SolrCloud by using the core
admin. But this did not have the benefit of having one search for the
entire collection. I had to manually query each of the cores to get the
full search index.

Essentially the question is:
1- is it possible to shard by date range in this way?
2- is it possible to swap out the index used by a shard?
3- is there a different way I should be thinking of this?

Max
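For reference, the pre-SolrCloud approach mentioned above is CoreAdmin's SWAP
action, e.g. (core names are illustrative):

http://localhost:8983/solr/admin/cores?action=SWAP&core=reviews_2013w50&other=reviews_2013w50_rebuilt

Whether an equivalent per-shard swap is safe under SolrCloud's cluster state
is exactly question 2; this is only the standalone-core version.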


Re: Indexed data not searchable

2013-04-11 Thread Max Bo
Thanks alot, so I will make a XSLT. 

Great community here!





Re: Indexed data not searchable

2013-04-10 Thread Max Bo
Thanks for this!

Now I have another problem. I tried to give the XML file the right format, so
I made this:

<?xml version="1.0" encoding="UTF-8"?>

<add><doc>
<field name="id">455HHS-2232</field>
<field name="title">T0072-00031-DOWNLOAD - Blatt 12v</field>
<field name="format">application/pdf</field>
<field name="created">2012-11-07T11:15:19.887+01:00</field>
<field name="lastModified">2012-11-07T11:15:19.887+01:00</field>
<field name="issued">2012-11-07T11:15:19.887+01:00</field>
<field name="revision">0</field>
<field name="pid">hdl:11858/00-1734--0008-12C5-2</field>
<field name="extent">1131033</field>
<field name="project">St. Matthias Test 07</field>
<field name="availability">public</field>
<field name="rightsHolder">Stadtbibliothek und Stadtarchiv Trier</field>
</doc></add>



I also made the changes in schema.xml.

I added these fields:

   <field name="identifier" type="text_general" indexed="true" stored="true"/>
   <field name="format" type="text_general" indexed="true" stored="true"/>
   <field name="created" type="date" indexed="true" stored="true"/>
   <field name="issued" type="date" indexed="true" stored="true"/>
   <field name="revision" type="int" indexed="true" stored="true"/>
   <field name="pid" type="text_general" indexed="true" stored="true"/>
   <field name="extent" type="int" indexed="true" stored="true"/>
   <field name="dataContributor" type="text_general" indexed="true" stored="true"/>
   <field name="project" type="text_general" indexed="true" stored="true"/>
   <field name="availability" type="text_general" indexed="true" stored="true"/>
   <field name="rightsholder" type="text_general" indexed="true" stored="true"/>

Did I do anything wrong?





Re: Indexed data not searchable

2013-04-10 Thread Max Bo
Just for information: I noticed that the problem occurs when I try to add
the fields created, last_modified, and issued (all three have the type date)
and the field rightsholder.

Maybe it is helpful!





Re: Indexed data not searchable

2013-04-10 Thread Max Bo
Thank you. 

I changed it and now it works.

But is there any way to make the given timestamp acceptable to
Solr?






Re: Indexed data not searchable

2013-04-09 Thread Max Bo
The XML files are formatted like this. I think that is where the problem is.

<metadataContainerType>
  <ns3:object>
    <ns3:generic>
      <ns3:provided>
        <ns3:title>T0084-00371-DOWNLOAD - Blatt 184r</ns3:title>
        <ns3:identifier type="METSXMLID">T0084-00371-DOWNLOAD</ns3:identifier>
        <ns3:format>application/pdf</ns3:format>
      </ns3:provided>
      <ns3:generated>
        <ns3:created>2012-11-08T00:09:57.531+01:00</ns3:created>
        <ns3:lastModified>2012-11-08T00:09:57.531+01:00</ns3:lastModified>
        <ns3:issued>2012-11-08T00:09:57.531+01:00</ns3:issued>
        ..






Indexed data not searchable

2013-04-08 Thread Max Bo
Hello,

I'm very new to Solr, and I have come to a point I cannot explain by myself, so I
need your help.

I have indexed a huge number of XML files with a shell script.

function solringest_rec {

    for SRCFILE in $(find $1 -type f); do
        #DESTFILE=$URL${SRCFILE/$1/}
        echo "ingest $SRCFILE"
        curl $URL -H "Content-type: text/xml" --data-binary @$SRCFILE
    done

}


The response I get every time is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">116</int></lst>
</response>


Because of this I think that everything should be fine, but the queries
don't work.

For all other operations, such as the post operation, I use the stuff from the example
folder.
Maybe I have to configure something in schema.xml or solrconfig.xml?


Hope you can help me!


Kind regards,

Max










Re: Indexed data not searchable

2013-04-08 Thread Max Bo
Thanks for your help.

The URL I'm posting to is: http://localhost:8983/solr/update?commit=true


The XML files I've added contain fields like author, so I thought they
would have to be searchable since author is declared as indexed in the example schema.






Re: How can I check if a more complex query condition matched?

2011-12-28 Thread Max
Thanks for your reply, I thought about using the debug mode, too, but
the information is not easy to parse and doesn't contain everything I
want. Furthermore, I don't want to enable debug mode in production.

Is there anything else I could try?

On Tue, Dec 27, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote:
 I have a more complex query condition
 like this:

 (city:15 AND country:60)^4 OR city:15^2 OR country:60^2

 What I want to achieve with this query is basically: if a
 document has
 city = 15 AND country = 60 it is more important than
 another document
 which only has city = 15 OR country = 60

 Furthermore I want to show in my results view why a certain
 document
 matched, something like "matched city and country" or "matched city
 only" or "matched country only".

 This is a bit of a simplified example, but the question
 remains: how
 can solr tell me which of the conditions in the query
 matched? If I
 match against a simple field only, I can get away with
 highlight
 fields, but conditions spanning multiple fields seem much
 more tricky.

 Looks like you can extract these info from output of debugQuery=on.
 http://wiki.apache.org/solr/CommonQueryParameters#debugQuery



How can I check if a more complex query condition matched?

2011-12-27 Thread Max
I have a more complex query condition like this:

(city:15 AND country:60)^4 OR city:15^2 OR country:60^2

What I want to achieve with this query is basically: if a document has
city = 15 AND country = 60 it is more important than another document
which only has city = 15 OR country = 60

Furthermore I want to show in my results view why a certain document
matched, something like "matched city and country" or "matched city
only" or "matched country only".

This is a bit of a simplified example, but the question remains: how
can solr tell me which of the conditions in the query matched? If I
match against a simple field only, I can get away with highlight
fields, but conditions spanning multiple fields seem much more tricky.

Thanks for any ideas on this!


InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Hi there,

when highlighting a field with this definition:

<fieldType name="name" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Any-Latin"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory"
        minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Any-Latin"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory"
        minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

containing this string:

Mosfellsbær

I get the following exception, if that field is in the highlight fields:

SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token
mosfellsbaer exceeds length of provided text sized 11
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token mosfellsbaer exceeds length of provided text sized 11
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)

I tried with solr 3.4 and 3.5, same error for both. Removing the char
filter didn't fix the problem either.

It seems like there is some weird stuff going on when folding the
string; it can be seen in the analysis view, too:

http://i.imgur.com/6B2Uh.png

The end offset remains 11 even after folding and transforming "æ" to
"ae", which seems wrong to me.

I also stumbled upon https://issues.apache.org/jira/browse/LUCENE-1500
which seems like a similar issue.

Is there a workaround for that problem or is the field configuration wrong?


Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Robert, thank you for creating the issue in JIRA.

However, I need ngrams on that field – is there an alternative to the
EdgeNGramFilterFactory?

Thanks!

On Mon, Dec 12, 2011 at 1:25 PM, Robert Muir rcm...@gmail.com wrote:
 On Mon, Dec 12, 2011 at 5:18 AM, Max nas...@gmail.com wrote:

 It seems like there is some weird stuff going on when folding the
 string, it can be seen in the analysis view, too:

 http://i.imgur.com/6B2Uh.png


 I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642

 Thanks for the screenshot, makes it easy to do a test case here.

 --
 lucidimagination.com


Re: Using Solr Analyzers in Lucene

2010-10-05 Thread Max Lynch
I guess I missed the init() method.  I was looking at the factory and
thought I saw config loading stuff (like getInt) which I assumed meant it
need to have schema.xml available.

Thanks!

-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter mathias.wal...@gmx.netwrote:

 Hi Max,

 why don't you use WordDelimiterFilterFactory directly? I'm doing the same
 stuff inside my own analyzer:

 final Map<String, String> args = new HashMap<String, String>();

 args.put("generateWordParts", "1");
 args.put("generateNumberParts", "1");
 args.put("catenateWords", "0");
 args.put("catenateNumbers", "0");
 args.put("catenateAll", "0");
 args.put("splitOnCaseChange", "1");
 args.put("splitOnNumerics", "1");
 args.put("preserveOriginal", "1");
 args.put("stemEnglishPossessive", "0");
 args.put("language", "English");

 wordDelimiter = new WordDelimiterFilterFactory();
 wordDelimiter.init(args);
 stream = wordDelimiter.create(stream);

 --
 Kind regards,
 Mathias

  -Original Message-
  From: Max Lynch [mailto:ihas...@gmail.com]
  Sent: Tuesday, October 05, 2010 1:03 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Using Solr Analyzers in Lucene
 
  I have made progress on this by writing my own Analyzer.  I basically
 added
  the TokenFilters that are under each of the solr factory classes.  I had
 to
  copy and paste the WordDelimiterFilter because, of course, it was package
  protected.
 
 
 
  On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:
 
   Hi,
   I asked this question a month ago on lucene-user and was referred here.
  
   I have content being analyzed in Solr using these tokenizers and
 filters:
  
   <fieldType name="text_standard" class="solr.TextField"
   positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="0" generateNumberParts="1" catenateWords="1"
   catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"
   protected="protwords.txt"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="0" generateNumberParts="1" catenateWords="1"
   catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"
   protected="protwords.txt"/>
     </analyzer>
   </fieldType>
  
   Basically I want to be able to search against this index in Lucene with
 one
   of my background searching applications.
  
   My main reason for using Lucene over Solr for this is that I use the
   highlighter to keep track of exactly which terms were found which I use
 for
   my own scoring system and I always collect the whole set of found
   documents.  I've messed around with using Boosts but it wasn't fine
 grained
   enough and I wasn't able to effectively create a score threshold (would
   creating my own scorer be a better idea?)
  
   Is it possible to use this analyzer from Lucene, or at least re-create
 it
   in code?
  
   Thanks.
  
  




Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
Hi,
I asked this question a month ago on lucene-user and was referred here.

I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="text_standard" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically I want to be able to search against this index in Lucene with one
of my background searching applications.

My main reason for using Lucene over Solr for this is that I use the
highlighter to keep track of exactly which terms were found which I use for
my own scoring system and I always collect the whole set of found
documents.  I've messed around with using Boosts but it wasn't fine grained
enough and I wasn't able to effectively create a score threshold (would
creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in
code?

Thanks.


Re: Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
I have made progress on this by writing my own Analyzer.  I basically added
the TokenFilters that are under each of the solr factory classes.  I had to
copy and paste the WordDelimiterFilter because, of course, it was package
protected.
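
For the archive, that chain can also be assembled without copying the filter,
using the factory as shown elsewhere in this thread. A sketch against the
Lucene 2.9/Solr 1.4-era APIs, with the protwords.txt handling omitted:

import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

// Mirrors the text_standard index-time chain from the schema above.
public final class TextStandardAnalyzer extends Analyzer {
    private final WordDelimiterFilterFactory wordDelimiter;

    public TextStandardAnalyzer() {
        Map<String, String> args = new HashMap<String, String>();
        args.put("generateWordParts", "0");
        args.put("generateNumberParts", "1");
        args.put("catenateWords", "1");
        args.put("catenateNumbers", "1");
        args.put("catenateAll", "0");
        args.put("splitOnCaseChange", "1");
        wordDelimiter = new WordDelimiterFilterFactory();
        wordDelimiter.init(args);
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader); // whitespace tokenizer
        stream = wordDelimiter.create(stream);                // word delimiter
        stream = new LowerCaseFilter(stream);                 // lowercase
        return new SnowballFilter(stream, "English");         // Porter stemmer
    }
}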



On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:

 Hi,
 I asked this question a month ago on lucene-user and was referred here.

 I have content being analyzed in Solr using these tokenizers and filters:

 <fieldType name="text_standard" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="0" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="0" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 Basically I want to be able to search against this index in Lucene with one
 of my background searching applications.

 My main reason for using Lucene over Solr for this is that I use the
 highlighter to keep track of exactly which terms were found which I use for
 my own scoring system and I always collect the whole set of found
 documents.  I've messed around with using Boosts but it wasn't fine grained
 enough and I wasn't able to effectively create a score threshold (would
 creating my own scorer be a better idea?)

 Is it possible to use this analyzer from Lucene, or at least re-create it
 in code?

 Thanks.




Search a URL

2010-09-23 Thread Max Lynch
Is there a tokenizer that will allow me to search for parts of a URL?  For
example, the search "google" would match on the data
"http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

<fieldType name="text_standard" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks.
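
A sketch of one way to get there: tokenize on every non-letter character
instead, so each URL component becomes its own term. The field type name is
illustrative:

<fieldType name="text_url" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, "http://mail.google.com/dlkjadf" is indexed as the terms http,
mail, google, com, and dlkjadf, so a search for google matches.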


Re: Updating document without removing fields

2010-08-30 Thread Max Lynch
Thanks Lance.

I have decided to just put all of my processing on a bigger server along
with solr.  It's too bad, but I can manage.

-Max

On Sun, Aug 29, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:

 No. Document creation is all-or-nothing, fields are not updateable.

 I think you have to filter all of your field changes through a join
 server. That is,
 all field updates could go to a database and the master would read
 document updates
 from that database. Or, you could have one updater feed updates to the
 other, which then
 sends all updates to the master.

 Lance

 On Sun, Aug 29, 2010 at 6:19 PM, Max Lynch ihas...@gmail.com wrote:
  Hi,
  I have a master solr server and two slaves.  On each of the slaves I have
  programs running that read the slave index, do some processing on each
  document, add a few new fields, and commit the changes back to the
 master.
 
  The problem I'm running into right now is one slave will update one
 document
  and the other slave will eventually update the same document, but the
  changes will overwrite each other.  For example, one slave will add a
 field
  and commit the document, but the other slave won't have that field yet so
 it
  won't duplicate the document when it updates the doc with its own new
  field.  This causes the document to miss one set of fields from one of
 the
  slaves.
 
  Can I update a document without having to recreate it?  Is there a way to
  update the slave and then have the slave commit the changes to the master
  (adding new fields in the process?)
 
  Thanks.
 



 --
 Lance Norskog
 goks...@gmail.com



Updating document without removing fields

2010-08-29 Thread Max Lynch
Hi,
I have a master solr server and two slaves.  On each of the slaves I have
programs running that read the slave index, do some processing on each
document, add a few new fields, and commit the changes back to the master.

The problem I'm running into right now is one slave will update one document
and the other slave will eventually update the same document, but the
changes will overwrite each other.  For example, one slave will add a field
and commit the document, but the other slave won't have that field yet so it
won't duplicate the document when it updates the doc with its own new
field.  This causes the document to miss one set of fields from one of the
slaves.

Can I update a document without having to recreate it?  Is there a way to
update the slave and then have the slave commit the changes to the master
(adding new fields in the process?)

Thanks.
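
For the archive: since fields are not updatable in place in this version, the
usual workaround is read-copy-rewrite, i.e. fetch the stored document, copy
every field into a fresh SolrInputDocument, add the new fields, and re-add the
whole thing. A sketch against the SolrJ 1.4-era API; the id, field name, and
server variable are illustrative, and every field must be stored for the copy
to be lossless:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

SolrQuery query = new SolrQuery("id:123");
SolrDocument old = server.query(query).getResults().get(0);

SolrInputDocument doc = new SolrInputDocument();
for (String name : old.getFieldNames()) {
    doc.addField(name, old.getFieldValue(name)); // copy existing stored fields
}
doc.addField("my_new_field", "value");           // the addition
server.add(doc);                                 // replaces the old document
server.commit();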


Delete by query issue

2010-08-25 Thread Max Lynch
Hi,
I am trying to delete all documents that have null values for a certain
field.  To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]

This returns about 32,000 documents.

However, when I try to put that into a curl call, no documents get deleted:
curl http://localhost:8985/solr/newsblog/update?commit=true -H
'Content-Type: text/xml' --data-binary
'<delete><query>-date_added_solr:[* TO *]</query></delete>'

Solr responds with:
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
</response>

But nothing happens, even if I explicitly issue a commit afterward.

Any ideas?

Thanks.


Re: Delete by query issue

2010-08-25 Thread Max Lynch
I was trying to filter out all documents that HAVE that field.  I was trying
to delete any documents where that field had empty values.

I just found a way to do it, but I did a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied.  However, I believe
it worked because all of my documents have values for that field.

Oh well.

-max

On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.comwrote:

 Excuse me, what's the hyphen before the field name 'date_added_solr'? Is
 this some kind of new query format that I didn't know?

 '<delete><query>-date_added_solr:[* TO *]</query></delete>'

 - Original Message -
 From: Max Lynch ihas...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, August 26, 2010 6:12 AM
 Subject: Delete by query issue


  Hi,
  I am trying to delete all documents that have null values for a certain
  field.  To that effect I can see all of the documents I want to delete by
  doing this query:
  -date_added_solr:[* TO *]
 
  This returns about 32,000 documents.
 
  However, when I try to put that into a curl call, no documents get
 deleted:
  curl http://localhost:8985/solr/newsblog/update?commit=true -H
  'Content-Type: text/xml' --data-binary
  '<delete><query>-date_added_solr:[* TO *]</query></delete>'
 
  Solr responds with:
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
  </response>
 
  But nothing happens, even if I explicitly issue a commit afterward.
 
  Any ideas?
 
  Thanks.
 



 






Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
Right now I am doing some processing on my Solr index using Lucene Java.
Basically, I loop through the index in Java and do some extra processing of
each document (processing that is too intensive to do during indexing).

However, when I try to update the document in solr with new fields (using
SolrJ), the document either loses fields I don't explicitly set, or if I
have Solr-specific fields such as a solr date field type, I am not able to
copy the value as I can't read the value from Java.

Is there a way to add a field to a solr document without having to re-create
the document?  If not, how can I read the value of a Solr date in java?
Document.get("date_field") returns null even though the value shows up when
I access it through solr.  If I could read this value I could just copy the
fields from the Lucene Document to a SolrInputDocument.

Thanks.


Re: Delete by query issue

2010-08-25 Thread Max Lynch
Thanks Lance.  I'll give that a try going forward.
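
Spelled out against the original curl call, Lance's form becomes:

curl http://localhost:8985/solr/newsblog/update?commit=true \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:* AND -date_added_solr:[* TO *]</query></delete>'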

On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:

 Here's the problem: the standard Solr parser is a little weird about
 negative queries. The way to make this work is to say
*:* AND -field:[* TO *]

 This means select everything AND only these documents without a value
 in the field.

 On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com wrote:
  I was trying to filter out all documents that HAVE that field.  I was
 trying
  to delete any documents where that field had empty values.
 
  I just found a way to do it, but I did a range query on a string date in
 the
  Lucene DateTools format and it worked, so I'm satisfied.  However, I
 believe
  it worked because all of my documents have values for that field.
 
  Oh well.
 
  -max
 
  On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.com
 wrote:
 
  Excuse me, what's the hyphen before  the field name 'date_added_solr'?
 Is
  this some kind of new query format that I didn't know?
 
  '<delete><query>-date_added_solr:[* TO *]</query></delete>'
 
  - Original Message -
  From: Max Lynch ihas...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, August 26, 2010 6:12 AM
  Subject: Delete by query issue
 
 
   Hi,
   I am trying to delete all documents that have null values for a
 certain
   field.  To that effect I can see all of the documents I want to delete
 by
   doing this query:
   -date_added_solr:[* TO *]
  
   This returns about 32,000 documents.
  
   However, when I try to put that into a curl call, no documents get
  deleted:
   curl http://localhost:8985/solr/newsblog/update?commit=true -H
   'Content-Type: text/xml' --data-binary
   '<delete><query>-date_added_solr:[* TO *]</query></delete>'
  
   Solr responds with:
   <response>
   <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
   </response>
  
   But nothing happens, even if I explicitly issue a commit afterward.
  
   Any ideas?
  
   Thanks.
  
 
 
 
 
 
 
 
 
 
 



 --
 Lance Norskog
 goks...@gmail.com



Re: Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
It seems like this is a way to accomplish what I was looking for:
CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");

coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr", f);

SolrCore core = coreContainer.getCore("newsblog");
IndexSchema schema = core.getSchema();
DocumentBuilder builder = new DocumentBuilder(schema);

// get a Lucene Doc
// Document d = ...

SolrDocument solrDocument = new SolrDocument();

builder.loadStoredFields(solrDocument, d);
logger.debug("Loaded stored date: " +
    solrDocument.getFieldValue("date_added_solr"));

However, one thing that scares me is the warning message I get from the
CoreContainer:
 [java] Aug 25, 2010 10:25:23 PM org.apache.solr.update.SolrIndexWriter
finalize
 [java] SEVERE: SolrIndexWriter was not closed prior to finalize(),
indicates a bug -- POSSIBLE RESOURCE LEAK!!!

I'm not sure what exactly triggers that but it's a result of the code I
posted above.
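
A guess at what triggers it: getCore() takes a reference on the SolrCore that
the snippet never releases, so the index writer is still open when the
container gets finalized. The cleanup would be roughly (names match the
snippet above):

core.close();              // release the reference taken by getCore()
coreContainer.shutdown();  // close all cores and their index writers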

On Wed, Aug 25, 2010 at 10:49 PM, Max Lynch ihas...@gmail.com wrote:

 Right now I am doing some processing on my Solr index using Lucene Java.
 Basically, I loop through the index in Java and do some extra processing of
 each document (processing that is too intensive to do during indexing).

 However, when I try to update the document in solr with new fields (using
 SolrJ), the document either loses fields I don't explicitly set, or if I
 have Solr-specific fields such as a solr date field type, I am not able to
 copy the value as I can't read the value from Java.

 Is there a way to add a field to a solr document without having to
 re-create the document?  If not, how can I read the value of a Solr date in
 java?  Document.get("date_field") returns null even though the value shows
 up when I access it through solr.  If I could read this value I could just
 copy the fields from the Lucene Document to a SolrInputDocument.

 Thanks.



Duplicate a core

2010-08-03 Thread Max Lynch
Is it possible to duplicate a core?  I want to have one core contain only
documents within a certain date range (ex: 3 days old), and one core with
all documents that have ever been in the first core.  The small core is then
replicated to other servers which do real-time processing on it, but the
archive core exists for longer term searching.

I understand I could just connect to both cores from my indexer, but I would
like to not have to send duplicate documents across the network to save
bandwidth.

Is this possible?

Thanks.


Re: Duplicate a core

2010-08-03 Thread Max Lynch
What I'm doing now is just adding the documents to the other core each night
and deleting old documents from the other core when I'm finished.  Is there
a better way?
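
The nightly cleanup can at least be a single date-math delete-by-query. A
sketch, assuming a date field named date_added, a core named small, and the
three-day window from the example:

curl http://localhost:8985/solr/small/update?commit=true \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>date_added:[* TO NOW-3DAYS]</query></delete>'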

On Tue, Aug 3, 2010 at 4:38 PM, Max Lynch ihas...@gmail.com wrote:

 Is it possible to duplicate a core?  I want to have one core contain only
 documents within a certain date range (ex: 3 days old), and one core with
 all documents that have ever been in the first core.  The small core is then
 replicated to other servers which do real-time processing on it, but the
 archive core exists for longer term searching.

 I understand I could just connect to both cores from my indexer, but I
 would like to not have to send duplicate documents across the network to
 save bandwidth.

 Is this possible?

 Thanks.



Re: Know which terms are in a document

2010-07-29 Thread Max Lynch
Yea, I've had mild success with the highlighting approach with lucene, but
wasn't sure if there was another method available from solr.

Thanks Mike.
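
For the archive, the highlighter piggyback amounts to requesting highlights
for the searched terms and checking which ones come back per document; the
field name here is illustrative:

http://localhost:8983/solr/select?q=content:(pizza OR cake)&hl=true&hl.fl=content

Each document's highlighting entry only contains fragments for the terms that
actually matched in it, so pizza hits and cake hits can be told apart without
issuing a second query.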

On Thu, Jul 29, 2010 at 5:17 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 This is a fairly frequently requested and missing feature in Lucene/Solr...

 Lucene actually knows this information while it's scoring each
 document; it's just that it in no way tries to record that.

 If you will only do this on a few documents (eg the one page of
 results) then piggybacking on the highlighter is an OK approach.

 If you need it on more docs than that, then probably you should
 customize how your queries are scored to also tally up which docs had
 which terms.

 Mike

 On Wed, Jul 28, 2010 at 6:53 PM, Max Lynch ihas...@gmail.com wrote:
  I would like to search against my index, and then *know* which of a
 set
  of given terms were found in each document.
 
  For example, let's say I want to show articles with the word pizza or
  cake in them, but would like to be able to say which of those two was
  found.  I might use this to handle the article differently if it is about
  pizza, or if it is about cake.  I understand I can do multiple queries
 but I
  would like to avoid that.
 
  One thought I had was to use a highlighter and only return a fragment
 with
  the highlighted word, but I'm not sure how to do this with the various
  highlighting options.
 
  Is there a way?
 
  Thanks.
 



Know which terms are in a document

2010-07-28 Thread Max Lynch
I would like to search against my index, and then *know* which of a set
of given terms were found in each document.

For example, let's say I want to show articles with the word pizza or
cake in them, but would like to be able to say which of those two was
found.  I might use this to handle the article differently if it is about
pizza, or if it is about cake.  I understand I can do multiple queries but I
would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with
the highlighted word, but I'm not sure how to do this with the various
highlighting options.

Is there a way?

Thanks.


Re: CommonsHttpSolrServer add document hangs

2010-07-20 Thread Max Lynch
I'm still having trouble with this.  My program will run for a while, then
hang up at the same place.  Here is my add/commit process:

I am using StreamingUpdateSolrServer with queue size = 100 and num threads =
3.  My indexing process spawns 8 threads to process a subset of RSS feeds
which each thread then loops through.  Once a thread has processed a new
article, it constructs a new SolrInputDocument, creates a temporary
Collection<SolrInputDocument> containing just the one new document, then
calls server.add(docs).  I never call commit() or optimize() from my Java
code (I did before though, but I took that out).
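
For concreteness, the construction matching those numbers looks like this
(URL illustrative; the constructor throws MalformedURLException):

SolrServer server = new StreamingUpdateSolrServer(
    "http://localhost:8985/solr/mycore", 100, 3);  // queue of 100, 3 drain threads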

On the server side, I have these related settings:
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>300</maxDocs>
      <maxTime>1</maxTime>
    </autoCommit>
  </updateHandler>

I also have replication set up, as this is the master, here are the
settings:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

Those are the only extra settings I've set.  I also have a cron job running
every minute executing this command:
curl http://localhost:8985/solr/mycore/update -F stream.body='<commit/>'

Otherwise I don't see the numDocs number increase on the admin statistics
page.

This process will soon be ONLY for indexing.  Is there a better way to
optimize it?  The slaves replicate from the master every 60 seconds, and I want
documents to be available to the slaves as soon as possible.  Currently I
have a search process that holds some IndexSearchers on the Solr index (it's
a pure Lucene program); could that be causing issues?  That process never
opens an IndexWriter.

Thanks!


On Tue, Jul 13, 2010 at 10:52 AM, Max Lynch ihas...@gmail.com wrote:

 Great, thanks!


 On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom tom.fornovi...@truvo.com
  wrote:

 If you're only adding documents you can also have a go with
 StreamingUpdateSolrServer instead of the CommonsHttpSolrServer.
 Couple that with the suggestion of master/slave so the searches don't
 interfere with the indexing and you should have a pretty responsive
 system.

 -Original Message-
 From: Robert Petersen [mailto:rober...@buy.com]
 Sent: maandag 12 juli 2010 22:30
 To: solr-user@lucene.apache.org
 Subject: RE: CommonsHttpSolrServer add document hangs

 You could try a master slave setup using replication perhaps, then the
 slave serves searches and indexing commits on the master won't hang up
 searches at least...

 Here is the description:  http://wiki.apache.org/solr/SolrReplication


 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Monday, July 12, 2010 11:57 AM
 To: solr-user@lucene.apache.org
 Subject: Re: CommonsHttpSolrServer add document hangs

 Thanks Robert,

 My script did start going again, but it was waiting for about half an
 hour
 which seems a bit excessive to me.  Is there some tuning I can do on the
 solr end to optimize for my use case, which is very heavy on commits and
 very light on searches (I do most of my searches on the raw Lucene index
 in
 the background)?

 Thanks.

 On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com
 wrote:

  Maybe solr is busy doing a commit or optimize?
 
  -Original Message-
  From: Max Lynch [mailto:ihas...@gmail.com]
  Sent: Monday, July 12, 2010 9:59 AM
  To: solr-user@lucene.apache.org
  Subject: CommonsHttpSolrServer add document hangs
 
  Hey guys,
  I'm using Solr 1.4.1 and I've been having some problems lately with
 code
  that adds documents through a CommonsHttpSolrServer.  It seems that
  randomly
  the call to server.add() will hang.  I am currently running my code
  in a
  single thread, but I noticed this would happen in multi threaded code
 as
  well.  The jar version of commons-httpclient is 3.1.
 
  I got a thread dump of the process, and one thread seems to be waiting
  on
  the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager
 as
  shown below.  All other threads are in a RUNNABLE state (besides the
  Finalizer daemon).
 
  [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM
 (16.3-b01
  mixed mode):
  [java]
  [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
  tid=0x7f441051c800 nid=0x527c in Object.wait()
 [0x7f4417e2f000]
  [java]java.lang.Thread.State: WAITING (on object monitor)
  [java] at java.lang.Object.wait(Native Method)
  [java] - waiting on 0x7f443ae5b290 (a
  java.lang.ref.ReferenceQueue$Lock)
  [java] at
  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
  [java] - locked 0x7f443ae5b290 (a
  java.lang.ref.ReferenceQueue$Lock)
  [java] at
  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
  [java

Re: CommonsHttpSolrServer add document hangs

2010-07-13 Thread Max Lynch
Great, thanks!

On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom
tom.fornovi...@truvo.comwrote:

 If you're only adding documents you can also have a go with
 StreamingUpdateSolrServer instead of the CommonsHttpSolrServer.
 Couple that with the suggestion of master/slave so the searches don't
 interfere with the indexing and you should have a pretty responsive
 system.

 -Original Message-
 From: Robert Petersen [mailto:rober...@buy.com]
 Sent: maandag 12 juli 2010 22:30
 To: solr-user@lucene.apache.org
 Subject: RE: CommonsHttpSolrServer add document hangs

 You could try a master slave setup using replication perhaps, then the
 slave serves searches and indexing commits on the master won't hang up
 searches at least...

 Here is the description:  http://wiki.apache.org/solr/SolrReplication


 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Monday, July 12, 2010 11:57 AM
 To: solr-user@lucene.apache.org
 Subject: Re: CommonsHttpSolrServer add document hangs

 Thanks Robert,

 My script did start going again, but it was waiting for about half an
 hour
 which seems a bit excessive to me.  Is there some tuning I can do on the
 solr end to optimize for my use case, which is very heavy on commits and
 very light on searches (I do most of my searches on the raw Lucene index
 in
 the background)?

 Thanks.

 On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com
 wrote:

  Maybe solr is busy doing a commit or optimize?
 
  -Original Message-
  From: Max Lynch [mailto:ihas...@gmail.com]
  Sent: Monday, July 12, 2010 9:59 AM
  To: solr-user@lucene.apache.org
  Subject: CommonsHttpSolrServer add document hangs
 
  Hey guys,
  I'm using Solr 1.4.1 and I've been having some problems lately with
 code
  that adds documents through a CommonsHttpSolrServer.  It seems that
  randomly
  the call to server.add() will hang.  I am currently running my code
  in a
  single thread, but I noticed this would happen in multi threaded code
 as
  well.  The jar version of commons-httpclient is 3.1.
 
  I got a thread dump of the process, and one thread seems to be waiting
  on
  the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager
 as
  shown below.  All other threads are in a RUNNABLE state (besides the
  Finalizer daemon).
 
  [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM
 (16.3-b01
  mixed mode):
  [java]
  [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
  tid=0x7f441051c800 nid=0x527c in Object.wait()
 [0x7f4417e2f000]
  [java]java.lang.Thread.State: WAITING (on object monitor)
  [java] at java.lang.Object.wait(Native Method)
  [java] - waiting on 0x7f443ae5b290 (a
  java.lang.ref.ReferenceQueue$Lock)
  [java] at
  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
  [java] - locked 0x7f443ae5b290 (a
  java.lang.ref.ReferenceQueue$Lock)
  [java] at
  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
  [java] at
 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
  ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
 
  Any ideas?
 
  Thanks.
 



CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Hey guys,
I'm using Solr 1.4.1 and I've been having some problems lately with code
that adds documents through a CommonsHttpSolrServer.  It seems that randomly
the call to server.add() will hang.  I am currently running my code in a
single thread, but I noticed this would happen in multi threaded code as
well.  The jar version of commons-httpclient is 3.1.

I got a thread dump of the process, and one thread seems to be waiting on
the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
shown below.  All other threads are in a RUNNABLE state (besides the
Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas?

Thanks.


Re: CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Thanks Robert,

My script did start going again, but it was waiting for about half an hour
which seems a bit excessive to me.  Is there some tuning I can do on the
solr end to optimize for my use case, which is very heavy on commits and
very light on searches (I do most of my searches on the raw Lucene index in
the background)?

Thanks.

On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote:

 Maybe solr is busy doing a commit or optimize?

 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Monday, July 12, 2010 9:59 AM
 To: solr-user@lucene.apache.org
 Subject: CommonsHttpSolrServer add document hangs

 Hey guys,
 I'm using Solr 1.4.1 and I've been having some problems lately with code
 that adds documents through a CommonsHttpSolrServer.  It seems that
 randomly
 the call to server.add() will hang.  I am currently running my code
 in a
 single thread, but I noticed this would happen in multi threaded code as
 well.  The jar version of commons-httpclient is 3.1.

 I got a thread dump of the process, and one thread seems to be waiting
 on
 the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
 shown below.  All other threads are in a RUNNABLE state (besides the
 Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
 mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
 ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

 Any ideas?

 Thanks.



MailEntityProcessor class cast exception

2010-06-16 Thread Max Lynch
With last night's build of solr, I am trying to use the MailEntityProcessor
to index an email account.  However, when I call my dataimport url, I
receive a class cast exception:

INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=44
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
WARNING: Unable to read: dataimport.properties
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

 
commit{dir=/home/m/g/spider/misc/solrindex_nl/index,segFN=segments_1,version=1276738117525,generation=1,filenames=[segments_1]
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1276738117525
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
load EntityProcessor implementation for entity:99544078513223 Processing
Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:804)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:260)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:392)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:373)
Caused by: java.lang.ClassCastException:
org.apache.solr.handler.dataimport.MailEntityProcessor cannot be cast to
org.apache.solr.handler.dataimport.EntityProcessor
at
org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:801)
... 6 more
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

Here is my dataimport part of my solrconfig.xml:
  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/max/packages/apache-solr-4.0-2010-06-16_08-05-33/e/solr/conf/data-config.xml</str>
    </lst>
  </requestHandler>

and my data-config.xml:
<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
        user="***"
        password="***"
        host="***"
        protocol="imaps"
        folders="INBOX"/>
  </document>
</dataConfig>

I did try to rebuild the Solr nightly, but I still receive the same error.
 I have all of the required jars (AFAIK) in my application's lib folder.

Any ideas?

Thanks.


Unwanted clustering of search results after sorting by score

2008-12-12 Thread Max Scheffler
Hello,

We have a website on which you can search through a large number of
products from different shops.

The information describing the products is provided to us by the shops
which sell these products.

If we sort a search result by score, many products of the same shop are
clustered together. The reason for this behavior is that shops tend to
use the same 'style' to describe their products. For example:

Shop 'foo' describes its products with 250 words and uses the searched
word once. Shop 'bar' describes its products with only 25 words and also
uses the searched word once. The score for shop 'foo' will be much worse
than for shop 'bar'. In a search that matches many products of both shops,
the products of shop 'bar' are shown before the products of shop 'foo'.

We tried to avoid this behavior by not using the term frequency. But
after this we got very strange products among the first results.
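
For reference, 'not using the term frequency' meant a custom Similarity along
these lines; a sketch against the Lucene 2.x API of the time, wired in via a
<similarity> element in schema.xml. The lengthNorm override is what actually
stops short descriptions from outscoring long ones:

import org.apache.lucene.search.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // count a term once, however often it occurs
    }

    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        return 1.0f; // no penalty for long descriptions
    }
}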

Does anybody have an idea how to avoid the clustering of products
(documents) which come from the same shop?

Greetings
Max


prefix-search ingnores the lowerCaseFilter

2007-10-25 Thread Max Scheffler

Hi,

I want to perform a prefix search that ignores case. To do this I
created a fieldType called suggest:


<fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Entries (terms) could be 'foo', 'bar'...

A request like

http://localhost:8983/solr/select/?rows=0&facet=true&q=*:*&facet.field=suggest&facet.prefix=f

returns things like

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="suggest">
      <int name="foo">12</int>
    </lst>
  </lst>
</lst>

But a request like
http://localhost:8983/solr/select/?rows=0&facet=true&q=*:*&facet.field=suggest&facet.prefix=F

returns just:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="suggest"/>
  </lst>
</lst>

That's not what I expected, since the field definition contains a
LowerCaseFilter.


Is it possible that the prefix-processing ignores the filters?

Max