Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field

2016-11-23 Thread Jaco de Vroed
Hello,

We've figured out a workaround for this, using another field that's
multivalued and populated with the same values, and using that field in the
rord() function query.

Nevertheless, this feels like a bug to me.

Bye,

Jaco.


On 23 November 2016 at 09:04, Jaco de Vroed <jdevr...@gmail.com> wrote:

> Hi,
>
> No, I reproduced the original issue, with the rord() function, on a brand
> new index with docValues=true, with just one doc indexed in it.
>
> Any clues?
>
> Thanks,
>
> Jaco.
>
> On 21 November 2016 at 15:06, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
>> Did you turn on/off docValues on an already existing field?
>>
>> On Nov 16, 2016 11:51 AM, "Jaco de Vroed" <jdevr...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I made a typo. The Solr version number in which this error occurs is
>> 5.5.3.
>> > I also checked 6.3.0, same problem.
>> >
>> > Thanks, bye,
>> >
>> > Jaco.
>> >
>> > On 16 November 2016 at 17:39, Jaco de Vroed <jdevr...@gmail.com> wrote:
>> >
>> > > Hello Solr users,
>> > >
>> > > I’m running into an error situation using Solr 5.3.3. The case is as
>> > > follows. In my schema, I have a field with a definition like this:
>> > >
>> > > <fieldType ... positionIncrementGap="0"/>
>> > > ….
>> > > <field name="PublicationDate" ... docValues="true" />
>> > >
>> > > That field is used in function queries for boosting purposes, using
>> the
>> > > rord() function. We’re coming from Solr 4, not using docValues for
>> that
>> > > field, and now moving to Solr 5, using docValues. Now, this is
>> causing a
>> > > problem. When doing this:
>> > >
>> > > http://localhost:8983/solr/core1/select?q=*:*&fl=ID,
>> > > recip(rord(PublicationDate),0.15,300,10)
>> > >
>> > > The following error is given: "unexpected docvalues type NUMERIC for
>> > > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
>> > > UninvertingReader or index with docvalues" (full stack trace below).
>> > >
>> > > This does not happen when the field is changed to be multiValued, but
>> I
>> > > don’t want to change that at this point (and I noticed that changing
>> from
>> > > single valued to multivalued, then attempting to post the document
>> again
>> > > also results in an error related to docvalues type, but that could be
>> the
>> > > topic of another mail I guess). This is now blocking our long desired
>> > > upgrade to Solr 5. We initially tried upgrading without docValues, but
>> > > performance was completely killed because of our function query based
>> > > ranking stuff, so we decided to use docValues.
>> > >
>> > > To me, this seems a bug. I’ve tried finding something in Solr’s JIRA,
>> the
>> > > exact same error is in https://issues.apache.org/jira
>> /browse/SOLR-7495,
>> > > but that is a different case.
>> > >
>> > > I can create a JIRA issue for this of course, but first wanted to
>> throw
>> > > this at the mailing list to see if there are any insights that can be
>> > shared.
>> > >
>> > > Thanks a lot in advance, bye,
>> > >
>> > > Jaco..
>> > >
>> > > unexpected docvalues type NUMERIC for field 'PublicationDate'
>> (expected
>> > > one of [SORTED, SORTED_SET]). Use UninvertingReader or index with
>> > docvalues.
>> > > java.lang.IllegalStateException: unexpected docvalues type NUMERIC
>> for
>> > > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
>> > > UninvertingReader or index with docvalues.
>> > > at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
>> > > at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
>> > > at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(
>> > > ReverseOrdFieldSource.java:98)
>> > > at org.apache.lucene.queries.function.valuesource.
>> > ReciprocalFloatFunction.
>> > > getValues(ReciprocalFloatFunction.java:64)
>> > > at org.apache.solr.response.transform.ValueSourceAugmenter.transform(
>> > > ValueSourceAugmenter.java:95)
>> > > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
>> > > at org.apache.solr.response.TextRespo

Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field

2016-11-23 Thread Jaco de Vroed
Hi,

No, I reproduced the original issue, with the rord() function, on a brand
new index with docValues=true, with just one doc indexed in it.

Any clues?

Thanks,

Jaco.

On 21 November 2016 at 15:06, Pushkar Raste <pushkar.ra...@gmail.com> wrote:

> Did you turn on/off docValues on an already existing field?
>
> On Nov 16, 2016 11:51 AM, "Jaco de Vroed" <jdevr...@gmail.com> wrote:
>
> > Hi,
> >
> > I made a typo. The Solr version number in which this error occurs is
> 5.5.3.
> > I also checked 6.3.0, same problem.
> >
> > Thanks, bye,
> >
> > Jaco.
> >
> > On 16 November 2016 at 17:39, Jaco de Vroed <jdevr...@gmail.com> wrote:
> >
> > > Hello Solr users,
> > >
> > > I’m running into an error situation using Solr 5.3.3. The case is as
> > > follows. In my schema, I have a field with a definition like this:
> > >
> > > <fieldType ... positionIncrementGap="0"/>
> > > ….
> > > <field name="PublicationDate" ... docValues="true" />
> > >
> > > That field is used in function queries for boosting purposes, using the
> > > rord() function. We’re coming from Solr 4, not using docValues for that
> > > field, and now moving to Solr 5, using docValues. Now, this is causing
> a
> > > problem. When doing this:
> > >
> > > http://localhost:8983/solr/core1/select?q=*:*&fl=ID,
> > > recip(rord(PublicationDate),0.15,300,10)
> > >
> > > The following error is given: "unexpected docvalues type NUMERIC for
> > > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> > > UninvertingReader or index with docvalues" (full stack trace below).
> > >
> > > This does not happen when the field is changed to be multiValued, but I
> > > don’t want to change that at this point (and I noticed that changing
> from
> > > single valued to multivalued, then attempting to post the document
> again
> > > also results in an error related to docvalues type, but that could be
> the
> > > topic of another mail I guess). This is now blocking our long desired
> > > upgrade to Solr 5. We initially tried upgrading without docValues, but
> > > performance was completely killed because of our function query based
> > > ranking stuff, so we decided to use docValues.
> > >
> > > To me, this seems a bug. I’ve tried finding something in Solr’s JIRA,
> the
> > > exact same error is in https://issues.apache.org/jira/browse/SOLR-7495
> ,
> > > but that is a different case.
> > >
> > > I can create a JIRA issue for this of course, but first wanted to throw
> > > this at the mailing list to see if there are any insights that can be
> > shared.
> > >
> > > Thanks a lot in advance, bye,
> > >
> > > Jaco..
> > >
> > > unexpected docvalues type NUMERIC for field 'PublicationDate' (expected
> > > one of [SORTED, SORTED_SET]). Use UninvertingReader or index with
> > docvalues.
> > > java.lang.IllegalStateException: unexpected docvalues type NUMERIC for
> > > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> > > UninvertingReader or index with docvalues.
> > > at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
> > > at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
> > > at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(
> > > ReverseOrdFieldSource.java:98)
> > > at org.apache.lucene.queries.function.valuesource.
> > ReciprocalFloatFunction.
> > > getValues(ReciprocalFloatFunction.java:64)
> > > at org.apache.solr.response.transform.ValueSourceAugmenter.transform(
> > > ValueSourceAugmenter.java:95)
> > > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
> > > at org.apache.solr.response.TextResponseWriter.writeDocuments(
> > > TextResponseWriter.java:246)
> > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:151)
> > > at org.apache.solr.response.XMLWriter.writeResponse(
> XMLWriter.java:113)
> > > at org.apache.solr.response.XMLResponseWriter.write(
> > > XMLResponseWriter.java:39)
> > > at org.apache.solr.response.QueryResponseWriterUtil.
> writeQueryResponse(
> > > QueryResponseWriterUtil.java:52)
> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > > HttpSolrCall.java:728)
> > > at org.apache.solr.servlet.HttpSolrCall.call(H

Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field

2016-11-16 Thread Jaco de Vroed
Hi,

I made a typo. The Solr version number in which this error occurs is 5.5.3.
I also checked 6.3.0, same problem.

Thanks, bye,

Jaco.

On 16 November 2016 at 17:39, Jaco de Vroed <jdevr...@gmail.com> wrote:

> Hello Solr users,
>
> I’m running into an error situation using Solr 5.3.3. The case is as
> follows. In my schema, I have a field with a definition like this:
>
> <fieldType ... positionIncrementGap="0"/>
> ….
> <field name="PublicationDate" ... docValues="true" />
>
> That field is used in function queries for boosting purposes, using the
> rord() function. We’re coming from Solr 4, not using docValues for that
> field, and now moving to Solr 5, using docValues. Now, this is causing a
> problem. When doing this:
>
> http://localhost:8983/solr/core1/select?q=*:*&fl=ID,
> recip(rord(PublicationDate),0.15,300,10)
>
> The following error is given: "unexpected docvalues type NUMERIC for
> field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> UninvertingReader or index with docvalues" (full stack trace below).
>
> This does not happen when the field is changed to be multiValued, but I
> don’t want to change that at this point (and I noticed that changing from
> single valued to multivalued, then attempting to post the document again
> also results in an error related to docvalues type, but that could be the
> topic of another mail I guess). This is now blocking our long desired
> upgrade to Solr 5. We initially tried upgrading without docValues, but
> performance was completely killed because of our function query based
> ranking stuff, so we decided to use docValues.
>
> To me, this seems a bug. I’ve tried finding something in Solr’s JIRA, the
> exact same error is in https://issues.apache.org/jira/browse/SOLR-7495,
> but that is a different case.
>
> I can create a JIRA issue for this of course, but first wanted to throw
> this at the mailing list to see if there are any insights that can be shared.
>
> Thanks a lot in advance, bye,
>
> Jaco..
>
> unexpected docvalues type NUMERIC for field 'PublicationDate' (expected
> one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues.
> java.lang.IllegalStateException: unexpected docvalues type NUMERIC for
> field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> UninvertingReader or index with docvalues.
> at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
> at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
> at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(
> ReverseOrdFieldSource.java:98)
> at org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction.
> getValues(ReciprocalFloatFunction.java:64)
> at org.apache.solr.response.transform.ValueSourceAugmenter.transform(
> ValueSourceAugmenter.java:95)
> at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
> at org.apache.solr.response.TextResponseWriter.writeDocuments(
> TextResponseWriter.java:246)
> at org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:151)
> at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:113)
> at org.apache.solr.response.XMLResponseWriter.write(
> XMLResponseWriter.java:39)
> at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:52)
> at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:728)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:257)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:208)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1652)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:585)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:577)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:223)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1127)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:515)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1061)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:215)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerC

Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field

2016-11-16 Thread Jaco de Vroed
Hello Solr users,

I’m running into an error situation using Solr 5.3.3. The case is as follows. 
In my schema, I have a field with a definition like this:

<fieldType ... positionIncrementGap="0"/>
….
<field name="PublicationDate" ... docValues="true" />

That field is used in function queries for boosting purposes, using the rord() 
function. We’re coming from Solr 4, not using docValues for that field, and now 
moving to Solr 5, using docValues. Now, this is causing a problem. When doing 
this:


http://localhost:8983/solr/core1/select?q=*:*&fl=ID,recip(rord(PublicationDate),0.15,300,10)
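
For reference, Solr's recip(x, m, a, b) function computes a / (m*x + b), so the
boost above works out to:

    300 / (0.15 * rord(PublicationDate) + 10)

which is largest (about 29.6) for the newest document, whose reverse ordinal is
1, and decays smoothly for older documents.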

The following error is given: "unexpected docvalues type NUMERIC for field 
'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader 
or index with docvalues" (full stack trace below).
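
Read against the stack trace below, the failure mode is visible in the Lucene
API. A hedged sketch of what happens (Lucene 5.x names; the LeafReader argument
and class name here are illustrative, not from the thread):

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedSetDocValues;

class RordFailureSketch {
    static void show(LeafReader leafReader) throws java.io.IOException {
        // rord() is backed by ReverseOrdFieldSource, which (per the stack
        // trace) requests ordinal doc values for the field:
        SortedSetDocValues ords =
            DocValues.getSortedSet(leafReader, "PublicationDate");
        // A single-valued Trie field with docValues="true" is written as
        // NUMERIC doc values, so DocValues.checkField() throws the
        // IllegalStateException quoted above; a multiValued field is written
        // as SORTED_SET, which is why the multiValued workaround avoids it.
    }
}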

This does not happen when the field is changed to be multiValued, but I don’t 
want to change that at this point (and I noticed that changing from single 
valued to multivalued, then attempting to post the document again also results 
in an error related to docvalues type, but that could be the topic of another 
mail I guess). This is now blocking our long desired upgrade to Solr 5. We 
initially tried upgrading without docValues, but performance was completely 
killed because of our function query based ranking stuff, so we decided to use 
docValues.

To me, this seems a bug. I’ve tried finding something in Solr’s JIRA, the exact 
same error is in https://issues.apache.org/jira/browse/SOLR-7495, but that is a different case.

I can create a JIRA issue for this of course, but first wanted to throw this at 
the mailing list to see if there are any insights that can be shared.

Thanks a lot in advance, bye, 

Jaco..

unexpected docvalues type NUMERIC for field 'PublicationDate' (expected one of 
[SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues.
java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 
'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader 
or index with docvalues.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
at 
org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:98)
at 
org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:64)
at 
org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
at 
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:246)
at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:151)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:113)
at 
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:39)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:52)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:728)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpCo

Solr 3.5 MoreLikeThis on Date fields

2012-01-16 Thread Jaco Olivier
Hi Everyone,

Please help out if you know what is going on.
We are upgrading to Solr 3.5 (from 1.4.1) and are busy with a re-index and test of
our data.

Everything seems OK, but Date Fields seem to be broken when used with the
MoreLikeThis handler
(I also saw the same error on Date Fields using the Highlighter in another
forum post, "Invalid Date String for highlighting any date field match" @ Mon
2011/08/15 13:10).
* I deleted the index/core and only loaded a few records, and still get the
error when using MoreLikeThis with docdate as part of the mlt.fl
params.
* I double-checked all the data that was loaded; the dates parse 100% and I
can see no problems with any of the data loaded.

Type: <fieldType name="date" class="solr.TrieDateField"
omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
Definition:   <field name="docdate" type="date" indexed="true" stored="true"
multiValued="false"/>
A sample result: <date name="docdate">1999-06-28T00:00:00Z</date>

THE MLT QUERY:

Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute
INFO: [legal_spring] webapp=/solr path=/select 
params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996}
 status=400 QTime=1

THE ERROR:

Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at 
org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106)
at 
org.apache.solr.analysis.TrieTokenizer.init(TrieTokenizerFactory.java:76)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41)
at 
org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68)
at 
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75)
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
at 
org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876)
at 
org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820)
at 
org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629)
at 
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311)
at 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Sincerely,
Jaco Olivier


Solr and Lucene in South Africa

2010-07-30 Thread Jaco Olivier
Hi to all Solr/Lucene Users...

Our team had a discussion today regarding the Solr/Lucene community closer to 
home.
I am hereby putting out an SOS to all Solr/Lucene users in the South African 
market and wish to organize a meet-up (or user support group) if at all 
possible.
It would be great to share some triumphs and pitfalls that were experienced.

* Sorry for hogging the User Mailing list with a non-technical question, but I think 
this is the easiest way to get it done :)

Jaco Olivier
Web Specialist



Re: Good literature on search basics

2010-02-12 Thread Jaco
See http://markmail.org/thread/z5sq2jr2a6eayth4


On 12 February 2010 12:14, javaxmlsoapdev vika...@yahoo.com wrote:


 Does anyone know good literature (web resources, books, etc.) on the basics of
 search? I do have Solr 1.4 and Lucene books but wanted to go into more
 detail
 on basics.

 Thanks,




RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi Regan,

I am using STRING fields only for values that in most cases will be used
to FACET on...
I suggest using TEXT fields as per the default examples...

ALSO, remember that if you do not specify the
solr.LowerCaseFilterFactory, your search has just become case
sensitive. I struggled with that one before, so make sure what you are
indexing is what you are searching for.
* Stick to the default examples that are provided with the SOLR distro
and you should be fine.

<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
    synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      enablePositionIncrements="true" ensures that a 'gap' is left to
      allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
    ignoreCase="true"
    words="stopwords.txt"
    enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="1"
    catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
    protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
    synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
    words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="0"
    catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
    protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 06:15
To: solr-user@lucene.apache.org
Subject: Re: why no results?



Tom Hill-7 wrote:
 
 Try solr.TextField instead.
 


Thanks Tom,

I've replaced the types section above with...

<types>
<fieldtype name="string" class="solr.TextField"
sortMissingLast="true" omitNorms="true" />
</types>


deleted my index, restarted Solr and re-indexed my documents - but the
search still returns nothing.

Do I need to change the type in the fields sections as well?

regan



RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi,

Try changing your TEXT field to type "text":
<field name="text" type="text" indexed="true"
stored="false" multiValued="true" /> (without the "" of course :))

That is your problem... also use the "text" type as per the default examples
that ship with the SOLR distro :)

Jaco Olivier


-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 05:44
To: solr-user@lucene.apache.org
Subject: why no results?


hi all - newbie solr question - I've indexed some documents and can
search /
receive results using the following schema - BUT ONLY when searching on
the
"id" field. If I try searching on the "title", "subtitle", "body" or "text"
field I
receive NO results. Very confused. :confused: Can anyone see anything
obvious I'm doing wrong? Regan.



<?xml version="1.0" ?>

<schema name="core0" version="1.1">

<types>
<fieldtype name="string" class="solr.StrField"
sortMissingLast="true" omitNorms="true" />
</types>

 <fields>
<!-- general -->
<field  name="id" type="string" indexed="true" stored="true"
multiValued="false" required="true" />
<field name="title" type="string" indexed="true" stored="true"
multiValued="false" />
<field name="subtitle" type="string" indexed="true"
stored="true"
multiValued="false" />
<field name="body" type="string" indexed="true" stored="true"
multiValued="false" />
<field name="text" type="string" indexed="true" stored="false"
multiValued="true" />
 </fields>

 <!-- field to use to determine and enforce document uniqueness. -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is
absent
-->
 <defaultSearchField>text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

 <!-- copyFields group fields into one single searchable indexed field
for
speed.  -->
<copyField source="title" dest="text" />
<copyField source="subtitle" dest="text" />
<copyField source="body" dest="text" />

</schema>




RE: do copyField's need to exist as Fields?

2009-12-08 Thread Jaco Olivier
Hi Regan,

Something I noticed on your setup...
The ID field in your setup I assume to be your unique ID for the book or
journal (the ISSN or something).
Try making this a string, as TEXT is not the ideal field type to use for
unique IDs:

<field  name="id" type="string" indexed="true" stored="true"
multiValued="false" required="true" />

Congrats on figuring out SOLR fields - I suggest getting the SOLR 1.4
book. It really saved me a 1000 questions on this mailing list :)

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 09 December 2009 00:48
To: solr-user@lucene.apache.org
Subject: Re: do copyField's need to exist as Fields?



regany wrote:
 
 Is there a different way I should be setting it up to achieve the
above??
 


Think I figured it out.

I set up the fields so they are present, but get ignored except for
the
"text" field which gets indexed...

<field  name="id" type="text" indexed="true" stored="true"
multiValued="false" required="true" />
<field name="title" stored="false" indexed="false" multiValued="true"
type="text" />
<field name="subtitle" stored="false" indexed="false" multiValued="true"
type="text" />
<field name="body" stored="false" indexed="false" multiValued="true"
type="text" />
<field name="text" type="text" indexed="true" stored="false"
multiValued="true" />

and then copyField the first 4 fields to the text field:

<copyField source="id" dest="text" />
<copyField source="title" dest="text" />
<copyField source="subtitle" dest="text" />
<copyField source="body" dest="text" />


Seems to be working!? :drunk:



Nested phrases with proximity in Solr

2009-10-22 Thread Jaco
Hello,

As far as I've been able to dig up, there is no way to use nested phrases in
Solr, let alone with proximity. For instance "a b" "c d"~10. I've seen a
special Surround Query Parser in Lucene that appears to support this.
Am I missing something? Any clues anybody?
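
For reference, the Lucene span-query API can express exactly this kind of
nesting; a minimal sketch (the field name "text" is an illustrative assumption,
and this is the raw Lucene API, not something Solr's standard query parser
exposes):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class NestedPhraseSketch {
    // Builds the equivalent of "a b" "c d"~10 as nested span queries.
    static SpanQuery nestedPhrases() {
        // "a b": adjacent terms (slop 0), in order
        SpanQuery ab = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("text", "a")),
                new SpanTermQuery(new Term("text", "b")) }, 0, true);
        // "c d": adjacent terms (slop 0), in order
        SpanQuery cd = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("text", "c")),
                new SpanTermQuery(new Term("text", "d")) }, 0, true);
        // the two phrases within 10 positions of each other, any order
        return new SpanNearQuery(new SpanQuery[] { ab, cd }, 10, false);
    }
}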

Thanks in advance, bye,

Jaco.


Re: Solr Replication on Windows

2009-06-17 Thread Jaco
Hi,

In my experience, you can just migrate to 1.4. We are using this in
production without any problems, and the Java Replication (
http://wiki.apache.org/solr/SolrReplication) works excellently.

Bye,

Jaco.


2009/6/17 vaibhav joshi callvaib...@hotmail.com


 Hi,

 I am using the Solr 1.3 release and have different sets of machines for Query
 and Master for Indexer. These machines are Windows boxes. The Solr
 replication scripts in the wiki are Unix shell scripts. Are there any script or
 Java version of replication available with Solr 1.3? I saw Java replication
 mentioned in the wiki but it seems to be available only with Solr 1.4.



 Thanks

 Vaibhav




Re: Who is running 1.4 nightly in production?

2009-05-13 Thread Jaco
Running 1.4 nightly in production as well, also for the Java replication and
for the improved facet count algorithms. No problems, all running smoothly.

Bye,

Jaco.

2009/5/13 Erik Hatcher e...@ehatchersolutions.com

 We run a not too distant trunk (1.4, probably a month or so ago) version of
 Solr on LucidFind at http://www.lucidimagination.com/search

Erik

 On May 12, 2009, at 5:02 PM, Walter Underwood wrote:

  We're planning our move to 1.4, and want to run one of our production
 servers with the new code. Just to feel better about it, is anyone else
 running 1.4 in production?

 I'm building 2009-05-11 right now.

 wuner





Re: Dictionary lookup possibilities

2009-04-18 Thread Jaco
Hi,

Thanks for the suggestions! It looks like the MemoryIndex is worth having a
detailed look at, so that's what I'll start on.

Thanks again, bye,

Jaco.


2009/4/17 Steven A Rowe sar...@syr.edu

 Hi Jaco,

 On 4/9/2009 at 2:58 PM, Jaco wrote:
  I'm struggling with some ideas, maybe somebody can help me with past
  experiences or tips. I have loaded a dictionary into a Solr index,
  using stemming and some stopwords in analysis part of the schema.
  Each record holds a term from the dictionary, which can consist of
  multiple words. For some data analysis work, I want to send pieces
  of text (sentences actually) to Solr to retrieve all possible
  dictionary terms that could occur. Ideally, I want to construct a
  query that only returns those Solr records for which all individual
  words in that record are matched.
 
  For instance, my dictionary holds the following terms:
  1 - a b c d
  2 - c d e
  3 - a b
  4 - a e f g h
 
  If I put the sentence [a b c d f g h] in as a query, I want to receive
  dictionary items 1 (matching all words a b c d) and 3 (matching words a
  b) as matches
 
  I have been puzzling about how to do this. The only way I found so far
  was to construct an OR query with all words of the sentence in it. In
  this case, that would result in all dictionary items being returned.
  This would then require some code to go over the search results and
  analyse each of them (i.e. by using the highlight function) to kick
  out 'false' matches, but I am looking for a more efficient way.
 
  Is there a way to do this with Solr functionality, or do I need to
  start looking into the Lucene API ..?

 Your problem could be modeled as a set of standing queries, where your
 dictionary entries are the *queries* (with all words required, maybe using a
 PhraseQuery or a SpanNearQuery), and the sentence is the document.

 Solr may not be usable in this context (extremely high volume queries),
 depending on your throughput requirements, but Lucene's MemoryIndex was
 designed for this kind of thing:

 
 http://lucene.apache.org/java/2_4_1/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html
 

 Steve
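
A minimal sketch of the standing-queries idea above, against the Lucene
2.4-era MemoryIndex API linked in this message (the field name "text" and
the WhitespaceAnalyzer are illustrative assumptions): index the sentence as a
one-document in-memory index, then run each dictionary entry against it as a
query requiring all of its words.

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class DictionaryMatcherSketch {
    public static void main(String[] args) {
        // The sentence becomes the single in-memory "document".
        MemoryIndex sentence = new MemoryIndex();
        sentence.addField("text", "a b c d f g h", new WhitespaceAnalyzer());

        // Each dictionary entry becomes a query with all words required.
        String[][] dictionary = {
                { "a", "b", "c", "d" }, { "c", "d", "e" },
                { "a", "b" }, { "a", "e", "f", "g", "h" } };
        for (String[] entry : dictionary) {
            BooleanQuery q = new BooleanQuery();
            for (String word : entry) {
                q.add(new TermQuery(new Term("text", word)),
                        BooleanClause.Occur.MUST);
            }
            // score > 0 means every word of the entry occurs in the
            // sentence: entries 1 and 3 match, as in the example above.
            System.out.println(sentence.search(q) > 0.0f);
        }
    }
}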




Dictionary lookup possibilities

2009-04-09 Thread Jaco
Hello,

I'm struggling with some ideas, maybe somebody can help me with past
experiences or tips. I have loaded a dictionary into a Solr index, using
stemming and some stopwords in analysis part of the schema. Each record
holds a term from the dictionary, which can consist of multiple words. For
some data analysis work, I want to send pieces of text (sentences actually)
to Solr to retrieve all possible dictionary terms that could occur. Ideally,
I want to construct a query that only returns those Solr records for which
all individual words in that record are matched.

For instance, my dictionary holds the following terms:
1 - a b c d
2 - c d e
3 - a b
4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive
dictionary items 1 (matching all words a b c d) and 3 (matching words a b)
as matches

I have been puzzling about how to do this. The only way I found so far was
to construct an OR query with all words of the sentence in it. In this case,
that would result in all dictionary items being returned. This would then
require some code to go over the search results and analyse each of them
(i.e. by using the highlight function) to kick out 'false' matches, but I am
looking for a more efficient way.

Is there a way to do this with Solr functionality, or do I need to start
looking into the Lucene API ..?

Any help would be much appreciated as usual!

Thanks, bye,

Jaco.


Re: Size of my index directory increase considerably

2009-03-26 Thread Jaco
Hi,

After installing that patch, all is running fine for me as well - problem no
longer occurring and replication running great! The fix for issue
https://issues.apache.org/jira/browse/SOLR-978 has already been committed,
so it's also there in the 1.4 nightly builds.

Bye,

Jaco.



2009/3/26 sunnyfr johanna...@gmail.com


 Just applied this patch :

 http://www.nabble.com/Solr-Replication%3A-disk-space-consumed-on-slave-much-higher-than-on--master-td21579171.html#a21622876

 It seems to work well now. Do I have to do something else ?
 Do you reckon something for my configuration ?

 Thanks a lot




Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-25 Thread Jaco
Hi Noble,

Great stuff, no problem, I really think the Solr development team is
excellent and takes pride in delivering high quality software!

And we're going into production with a brand new Solr based system in a few
weeks as well, so I'm really happy that this is fixed now.

Bye,

Jaco.

2009/1/24 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 hi Jaco,
 We owe you a big THANK YOU.

 We were planning to roll out this feature into production in the next
 week or so. Our internal testing could not find this out.



 --Noble


 On Fri, Jan 23, 2009 at 6:36 PM, Jaco jdevr...@gmail.com wrote:
  Hi,
 
  I have tested this as well, looking fine! Both issues are indeed fixed,
 and
  the index directory of the slaves gets cleaned up nicely. I will apply
 the
  changes to all systems I've got running and report back in this thread in
  case any issues are found.
 
  Thanks for the very fast help! I usually need much, much more patience
 with
  commercial software vendors..
 
  Cheers,
 
  Jaco.

 
 
  2009/1/23 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
 
  I have opened an issue to track this
  https://issues.apache.org/jira/browse/SOLR-978
 
  On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള്‍  नोब्ळ्
  noble.p...@gmail.com wrote:
   I tested with the patch
   it has solved both the issues
  
   On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar
   shalinman...@gmail.com wrote:
  
  
   On Fri, Jan 23, 2009 at 2:12 PM, Jaco jdevr...@gmail.com wrote:
  
   Hi,
  
   I applied the patch and did some more tests - also adding some
  LOG.info()
   calls in delTree to see if it actually gets invoked
  (LOG.info("START:
   delTree: " + dir.getName()); at the start of that method). I don't see
  any
   entries of this showing up in the log file at all, so it looks like
   delTree
   doesn't get invoked at all.
  
   To be sure, explaining the issue to prevent misunderstanding:
   - The number of files in the index directory on the slave keeps
  increasing
   (in my very small test core, there are now 128 files in the slave's
  index
   directory, and only 73 files in the master's index directory)
   - The directories index.x are still there after replication, but
  they
   are empty
  
   Are there any other things I can do check, or more info that I can
  provide
   to help fix this?
  
   The problem is that when we do a commit on the slave after
 replication
  is
    done, the commit does not re-open the IndexWriter. Therefore, the
   deletion
    policy does not take effect and older files are left as is. This can
  keep on
   building up. The only solution is to re-open the index writer.
  
   I think the attached patch can solve this problem. Can you try this
 and
  let
   us know? Thank you for your patience.
  
   --
   Regards,
   Shalin Shekhar Mangar.
  
  
  
  
   --
   --Noble Paul
  
 
 
 
  --
  --Noble Paul
 
 



 --
 --Noble Paul



Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Jaco
Hi,

I applied the patch and did some more tests - also adding some LOG.info()
calls in delTree to see if it actually gets invoked (LOG.info("START:
delTree: " + dir.getName()); at the start of that method). I don't see any
entries of this showing up in the log file at all, so it looks like delTree
doesn't get invoked at all.

To be sure, explaining the issue to prevent misunderstanding:
- The number of files in the index directory on the slave keeps increasing
(in my very small test core, there are now 128 files in the slave's index
directory, and only 73 files in the master's index directory)
- The directories index.x are still there after replication, but they
are empty

Are there any other things I can do check, or more info that I can provide
to help fix this?

Thanks, bye,

Jaco.


2009/1/22 Shalin Shekhar Mangar shalinman...@gmail.com

 On Fri, Jan 23, 2009 at 12:15 AM, Noble Paul നോബിള്‍ नोब्ळ् 
 noble.p...@gmail.com wrote:

  I have attached a patch which logs the names of the files which could
  not get deleted (which may help us diagnose the problem). If you are
  comfortable applying a patch you may try it out.
 

 I've committed this patch to trunk.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-23 Thread Jaco
Hi,

I have tested this as well, looking fine! Both issues are indeed fixed, and
the index directory of the slaves gets cleaned up nicely. I will apply the
changes to all systems I've got running and report back in this thread in
case any issues are found.

Thanks for the very fast help! I usually need much, much more patience with
commercial software vendors..

Cheers,

Jaco.


2009/1/23 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 I have opened an issue to track this
 https://issues.apache.org/jira/browse/SOLR-978

 On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള്‍  नोब्ळ्
 noble.p...@gmail.com wrote:
  I tested with the patch
  it has solved both the issues
 
  On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar
  shalinman...@gmail.com wrote:
 
 
  On Fri, Jan 23, 2009 at 2:12 PM, Jaco jdevr...@gmail.com wrote:
 
  Hi,
 
  I applied the patch and did some more tests - also adding some
 LOG.info()
  calls in delTree to see if it actually gets invoked (LOG.info("START:
  delTree: " + dir.getName()); at the start of that method). I don't see
 any
  entries of this showing up in the log file at all, so it looks like
  delTree
  doesn't get invoked at all.
 
  To be sure, explaining the issue to prevent misunderstanding:
  - The number of files in the index directory on the slave keeps
 increasing
  (in my very small test core, there are now 128 files in the slave's
 index
  directory, and only 73 files in the master's index directory)
  - The directories index.x are still there after replication, but
 they
  are empty
 
  Are there any other things I can do check, or more info that I can
 provide
  to help fix this?
 
  The problem is that when we do a commit on the slave after replication
 is
  done, the commit does not re-open the IndexWriter. Therefore, the
 deletion
  policy does not take effect and older files are left as is. This can
 keep on
  building up. The only solution is to re-open the index writer.
 
  I think the attached patch can solve this problem. Can you try this and
 let
  us know? Thank you for your patience.
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 
  --
  --Noble Paul
 



 --
 --Noble Paul



Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jaco
Hm, I don't know what to do anymore. I tried this:
- Run Tomcat service as local administrator to overcome any permissioning
issues
- Installed latest nightly build (I noticed that the item I mentioned before (
http://markmail.org/message/yq2ram4f3jblermd) had been committed, which is
good)
- Built a small master and slave core to try it all out
- With each replication, the number of files on slave grows, and the
directories index.xxx.. are not removed
- I tried sending explicit commit commands to the slave, assuming it
wouldn't help, which was true.
- I don't see any reference to SolrDeletion in the log of the slave (it's
there in the log of the master)

Can anybody recommend some action to be taken? I'm building up some quite
large production cores right now, and don't want the slaves to eat up all
hard disk space of course..

Thanks a lot in advance,

Jaco.

2009/1/21 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 On Wed, Jan 21, 2009 at 3:42 PM, Jaco jdevr...@gmail.com wrote:
  Thanks for the fast replies!
 
  It appears that I made a (probably classical) error... I didn't make the
  change to solrconfig.xml to include the deletionPolicy when applying
 the
  upgrade. I include this now, but the slave is not cleaning up. Will this
 be
  done at some point automatically? Can I trigger this?
 Unfortunately, no.
 Lucene is supposed to clean up these old commit points automatically
 after each commit. Even if the deletionPolicy is not specified, the
 default is supposed to take effect.
 
  User access rights for the user are OK, this user is allowed to do
 anything
  in the Solr data directory (Tomcat service is running from SYSTEM account
  (Windows)).
 
  Thanks, regards,
 
  Jaco.
 
 
  2009/1/21 Shalin Shekhar Mangar shalinman...@gmail.com
 
  Hi,
 
  There shouldn't be so many files on the slave. Since the empty
 index.x
  folders are not getting deleted, is it possible that Solr process user
 does
  not enough privileges to delete files/folders?
 
  Also, have you made any changes to the IndexDeletionPolicy
 configuration?
 
  On Wed, Jan 21, 2009 at 2:15 PM, Jaco jdevr...@gmail.com wrote:
 
   Hi,
  
   I'm running Solr nightly build of 20.12.2008, with patch as discussed
 on
   http://markmail.org/message/yq2ram4f3jblermd, using Solr replication.
  
   On various systems running, I see that the disk space consumed on the
  slave
   is much higher than on the master. One example:
   - Master: 30 GB in 138 files
   - Slave: 152 GB in 3,941 files
  
   Can anybody tell me what to do to prevent this from happening, and how
 to
   clean up the slave? Also, there are quite some empty index.xxx
   directories sitting in the slave's data dir. Can these be safely
 removed?
  
   Thanks a lot in advance, bye,
  
   Jaco.
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 



 --
 --Noble Paul



Solr Replication: disk space consumed on slave much higher than on master

2009-01-21 Thread Jaco
Hi,

I'm running Solr nightly build of 20.12.2008, with patch as discussed on
http://markmail.org/message/yq2ram4f3jblermd, using Solr replication.

On various systems running, I see that the disk space consumed on the slave
is much higher than on the master. One example:
- Master: 30 GB in 138 files
- Slave: 152 GB in 3,941 files

Can anybody tell me what to do to prevent this from happening, and how to
clean up the slave? Also, there are quite some empty index.xxx
directories sitting in the slave's data dir. Can these be safely removed?

Thanks a lot in advance, bye,

Jaco.


Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-21 Thread Jaco
Thanks for the fast replies!

It appears that I made a (probably classical) error... I didn't make the
change to solrconfig.xml to include the deletionPolicy when applying the
upgrade. I include this now, but the slave is not cleaning up. Will this be
done at some point automatically? Can I trigger this?

User access rights for the user are OK, this user is allowed to do anything
in the Solr data directory (Tomcat service is running from SYSTEM account
(Windows)).

Thanks, regards,

Jaco.


2009/1/21 Shalin Shekhar Mangar shalinman...@gmail.com

 Hi,

 There shouldn't be so many files on the slave. Since the empty index.x
 folders are not getting deleted, is it possible that the Solr process user does
 not have enough privileges to delete files/folders?

 Also, have you made any changes to the IndexDeletionPolicy configuration?

 On Wed, Jan 21, 2009 at 2:15 PM, Jaco jdevr...@gmail.com wrote:

  Hi,
 
  I'm running Solr nightly build of 20.12.2008, with patch as discussed on
  http://markmail.org/message/yq2ram4f3jblermd, using Solr replication.
 
  On various systems running, I see that the disk space consumed on the
 slave
  is much higher than on the master. One example:
  - Master: 30 GB in 138 files
  - Slave: 152 GB in 3,941 files
 
  Can anybody tell me what to do to prevent this from happening, and how to
  clean up the slave? Also, there are quite some empty index.xxx
  directories sitting in the slave's data dir. Can these be safely removed?
 
  Thanks a lot in advance, bye,
 
  Jaco.
 



 --
 Regards,
 Shalin Shekhar Mangar.



Unable to move index file error during replication

2008-12-24 Thread Jaco
Hello,

While testing out the new replication features, I'm running into some
strange problem. On the slave, I keep getting an error like this after all
files have been copied from the master to the temporary index.x
directory:

SEVERE: Unable to move index file from:
D:\Data\solr\Slave\data\index.20081224110855\_21e.tvx to:
D:\Data\Solr\Slave\data\index\_21e.tvx

The replication then stops, index remains in original state, so the updates
are not available at the slave.

This is my replication config at the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
<lst name="master">
<!--Replicate on 'optimize' it can also be 'commit' -->
<str name="replicateAfter">commit</str>
<str name="confFiles">schema.xml</str>
</lst>
</requestHandler>

This is the replication config at the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
<lst name="slave">
<str name="masterUrl">
http://hostnamemaster:8080/solr/Master/replication</str>
<str name="pollInterval">00:10:00</str>
<str name="zip">true</str>
</lst>
</requestHandler>

I'm running a Solr nightly build of 21.12.2008 in Tomcat 6 on Windows 2003.
Initially I thought there was some problem with disk space, but this is not
the case. Replication did run fine for the initial version of the index, but after
that at some point it didn't work anymore. Any ideas what could be wrong
here?

Thanks very much in advance, bye,

Jaco.


Re: Unable to move index file error during replication

2008-12-24 Thread Jaco
Very good! I applied the patch in the attached file, working fine now. I'll
keep monitoring and post any issues found.

Will this be included in some next nightly build?

Thanks very much for the very quick response!

Jaco.

2008/12/24 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 James, thanks.

 If this is true the place to fix this is in
 ReplicationHandler#getFileList(). patch is attached.


 On Wed, Dec 24, 2008 at 4:04 PM, James Grant james.gr...@semantico.com
 wrote:
  I had the same problem. It turned out that the list of files from the
 master
  included duplicates. When the slave completes the download and tries to
 move
  the files into the index it comes across a file that does not exist
 because
  it has already been moved so it backs out the whole operation.
 
  My solution for now was to patch the copyIndexFiles method of
  org.apache.solr.handler.SnapPuller so that it normalises the list before
  moving the files. This isn't the best solution since it will still
 download
  the file twice but it was the easiest and smallest change to make. The
 patch
  is below
 
  Regards
 
  James
 
  --- src/java/org/apache/solr/handler/SnapPuller.java (revision 727347)
  +++ src/java/org/apache/solr/handler/SnapPuller.java (working copy)
  @@ -470,7 +470,7 @@
     */
    private boolean copyIndexFiles(File snapDir, File indexDir) {
      String segmentsFile = null;
  -    List<String> copiedfiles = new ArrayList<String>();
  +    Set<String> filesToCopy = new HashSet<String>();
      for (Map<String, Object> f : filesDownloaded) {
        String fname = (String) f.get(NAME);
        // the segments file must be copied last
  @@ -482,6 +482,10 @@
        segmentsFile = fname;
        continue;
      }
  +    filesToCopy.add(fname);
  +  }
  +  List<String> copiedfiles = new ArrayList<String>();
  +  for (String fname : filesToCopy) {
      if (!copyAFile(snapDir, indexDir, fname, copiedfiles)) return false;
      copiedfiles.add(fname);
    }
 
 
  Jaco wrote:
 
  Hello,
 
  While testing out the new replication features, I'm running into some
  strange problem. On the slave, I keep getting an error like this after
 all
  files have been copied from the master to the temporary index.x
  directory:
 
  SEVERE: Unable to move index file from:
  D:\Data\solr\Slave\data\index.20081224110855\_21e.tvx to:
  D:\Data\Solr\Slave\data\index\_21e.tvx
 
  The replication then stops, index remains in original state, so the
  updates
  are not available at the slave.
 
  This is my replication config at the master:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
 <!--Replicate on 'optimize' it can also be 'commit' -->
 <str name="replicateAfter">commit</str>
 <str name="confFiles">schema.xml</str>
 </lst>
 </requestHandler>
 
  This is the replication config at the slave:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="slave">
 <str name="masterUrl">
  http://hostnamemaster:8080/solr/Master/replication</str>
 <str name="pollInterval">00:10:00</str>
 <str name="zip">true</str>
 </lst>
 </requestHandler>
 
  I'm running a Solr nightly build of 21.12.2008 in Tomcat 6 on Windows
  2003.
  Initially I thought there was some problem with disk space, but this is
  not
  the case. Replication did run fine for the initial version of the index, but
 after
  that at some point it didn't work anymore. Any ideas what could be wrong
  here?
 
  Thanks very much in advance, bye,
 
  Jaco.
 
 
 
 



 --
 --Noble Paul



Preferred Tomcat version on Windows 2003 (64 bits)

2008-11-06 Thread Jaco
Hello,

I am planning a brand new environment for Solr running on a Windows 2003
Server 64 bits platform. I want to use Tomcat, and was wondering whether
there is any preference in general for using Tomcat 5.5 or Tomcat 6.0 with
Solr.

Any suggestions would be appreciated!

Thanks, bye,

Jaco.


Re: Preferred Tomcat version on Windows 2003 (64 bits)

2008-11-06 Thread Jaco
Thanks for the fast reply!

I've tested SOLR-561, and it is working beautifully! Excellent
functionality.

Cheers,

Jaco.

2008/11/6 Otis Gospodnetic [EMAIL PROTECTED]

 I don't think there are preferences.  If going with the brand new setup, why
 not go with Tomcat 6.0.
 Also be aware that if you want a master-slave setup on Windows you will need to
 use a post-1.3 version of Solr (nightly) that includes functionality from
 SOLR-561.


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Jaco [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Thursday, November 6, 2008 11:32:04 AM
  Subject: Preferred Tomcat version on Windows 2003 (64 bits)
 
  Hello,
 
  I am planning a brand new environment for Solr running on a Windows 2003
  Server 64 bits platform. I want to use Tomcat, and was wondering whether
  there is any preference in general for using Tomcat 5.5 or Tomcat 6.0
 with
  Solr.
 
  Any suggestions would be appreciated!
 
  Thanks, bye,
 
  Jaco.




Distributed search, standard request handler and more like this

2008-10-29 Thread Jaco
Hello,

I'm doing some experiments with the morelikethis functionality using the
standard request handler to see if it also works with distributed search (I
saw that it will not yet work with the MoreLikeThis handler,
https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this
also does not work when using the standard request handler, i.e.:

http://localhost:8080/solr/select?q=ID:*documentID*&
mlt=true&mlt.fl=Text&mlt.mindf=1&mlt.mintf=1&shards=shard1,shard2

I'm not getting any moreLikeThis results back, just the document resulting
from the q= query. The same query without shards= does return moreLikeThis
results. Am I doing something wrong or is this not yet supported..?

Thanks, bye,

Jaco.


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-30 Thread Jaco
Hi,

The suggested approach with a TokenFilter extending the BufferedTokenStream
class works fine, and performance is OK - the external stemmer is now invoked
only once for the complete search text. Also, from a functional point of
view, the approach is useful, because it allows other filtering (i.e. the
WordDelimiterFilter with its various useful options) to be done before
stemming takes place.

Code is roughly like this for the process() function of the custom Filter
class:

protected Token process(Token token) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    Token nextToken;
    Integer tokenPos = 0;
    Map<Integer, Token> tokenMap = new LinkedHashMap<Integer, Token>();

    // Buffer the whole stream, remembering each token in order.
    stringBuilder.append(token.term()).append(' ');
    tokenMap.put(tokenPos++, token);
    nextToken = read();

    while (nextToken != null)
    {
        stringBuilder.append(nextToken.term()).append(' ');
        tokenMap.put(tokenPos++, nextToken);

        nextToken = read();
    }

    // One external call for the complete text; the stemmer returns the
    // same number of whitespace-separated words as it was given.
    String inputText = stringBuilder.toString();
    String stemmedText = stemText(inputText);
    String[] stemmedWords = stemmedText.split("\\s");

    // Copy each stemmed word back into its original token, so positions
    // and offsets stay those of the source text, then emit the token.
    for (Map.Entry<Integer, Token> entry : tokenMap.entrySet())
    {
        Integer pos = entry.getKey();
        Token tok = entry.getValue();

        tok.setTermBuffer(stemmedWords[pos]);
        write(tok);
    }

    return null;
}

This will need some work and additional error checking, and I'll probably
put a maximum on the number of tokens that is processed in one go, to
make sure things don't get too big in memory.
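
For anyone wiring this up: a hedged sketch of how such a custom filter
factory could sit in the analyzer chain in schema.xml - the factory class
name com.example.ExternalStemFilterFactory is hypothetical, and the
WordDelimiterFilter options are only placeholders:

<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
    <filter class="com.example.ExternalStemFilterFactory"/>
  </analyzer>
</fieldType>

Keeping the stem filter last means the WordDelimiterFilter has already done
its splitting before the buffered stemming pass runs, which is the ordering
described above.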

Thanks for helping out!

Bye,

Jaco.



2008/9/26 Jaco [EMAIL PROTECTED]

 Thanks for these suggestions, will try it in the coming days and post my
 findings in this thread.

 Bye,


 Jaco.

 2008/9/26 Grant Ingersoll [EMAIL PROTECTED]


 On Sep 26, 2008, at 12:05 PM, Jaco wrote:

  Hi Grant,

 In reply to your questions:

 1. Are you having to restart/initialize the stemmer every time for your
 slow approach?  Does that really need to happen?

 It is invoking a COM object in Windows. The object is instantiated once
 for a token stream, and then invoked once for each token. The invoke
 always has an overhead, not much to do about that (sigh...)

 2. Can the stemmer return something other than a String?  Say a String
 array of all the stemmed words?  Or maybe even some type of object that
 tells you the original word and the stemmed word?

 The stemmer can only return a String. But, I do know that the returned
 string always has exactly the same number of words as the input string.
 So logically, it would be possible to:
 a) first calculate the position/start/end of each token in the input
 string (usual tokenization by whitespace), resulting in token list 1
 b) then invoke the stemmer, and tokenize that result by whitespace,
 resulting in token list 2
 c) 'merge' the token values of token list 2 into token list 1, which is
 possible because each token's position is the same in both lists...
 d) return that 'merged' token list 2 for further processing

 Would this work in Solr?


 I think so, assuming your stemmer tokenizes on whitespace as well.



 I can do some Java coding to achieve that from a logical point of view,
 but I wouldn't know how to structure this flow into the MyTokenizerFactory,
 so some hints to achieve that would be great!



 One thought:
 Don't create an all-in-one Tokenizer.  Instead, keep the Whitespace
 Tokenizer as is.  Then, create a TokenFilter that buffers the whole document
 into memory (via the next() implementation) and also creates, using
 StringBuilder, a string containing the whole text.  Once you've read it all
 in, send the string to your stemmer, parse it back out and associate it
 back to your token buffer.  If you are guaranteed position, you could even
 keep a (linked) hash, such that it is really quick to look up tokens after
 stemming.

 Pseudocode looks something like:

 while (token.next != null)
   tokenMap.put(token.position, token)
   stringBuilder.append(' ').append(token.text)

 stemmedText = comObj.stem(stringBuilder.toString())
 correlateStemmedText(stemmedText, tokenMap)

 spit out the tokens one by one...


 I think this approach should be fast (but maybe not as fast as your all in
 one tokenizer) and will provide the correct position and offsets.  You do
 have to be careful w/ really big documents, as that map can be big.  You
 also want to be careful about map reuse, token reuse, etc.

 I believe there are a couple of buffering TokenFilters in Solr that you
 could examine for inspiration.  I think the RemoveDuplicatesTokenFilter (or
 whatever it's called) does buffering.

 -Grant






 Thanks for helping out!

 Jaco.


 2008/9/26 Grant Ingersoll [EMAIL PROTECTED]


 On Sep 26, 2008, at 9:40 AM, Jaco wrote:

 Hi,


 Here's some

Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hello,

I need to work with an external stemmer in Solr. This stemmer is accessible
as a COM object (running Solr in Tomcat on a Windows platform). I managed to
integrate this using the com4j library. I tested two scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token.
2. Create a custom TokenizerFactory (extending BaseTokenizerFactory) that
invokes the external stemmer for the entire search text, then puts the
result of this into a StringReader, and finally returns new
WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the
whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional
point of view. The first scenario, however, is too slow because of the
overhead of calling the external COM object for each token.

The second scenario is much faster, and also gives correct search results.
However, it then causes problems with highlighting - sometimes errors are
reported (String out of Range), in other cases I get incorrect highlight
fragments. Without knowing all the details about this stuff, this makes
sense because of the change made to the text before it's tokenized. Maybe
my second scenario does not make sense at all..?

Any ideas on how to overcome this or any other suggestions on how to realise
this?

Thanks, bye,

Jaco.

PS I posted this message twice before but it didn't come through (spam
filtering..??), so this is the 2nd try with text changed a bit


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hi,

Here's some of the code of my Tokenizer:

public class MyTokenizerFactory extends BaseTokenizerFactory
{
    public WhitespaceTokenizer create(Reader input)
    {
        String text, normalizedText;

        try {
            text = IOUtils.toString(input);
            normalizedText = *invoke my stemmer(text)*;
        }
        catch (IOException ex) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
        }

        StringReader stringReader = new StringReader(normalizedText);

        return new WhitespaceTokenizer(stringReader);
    }
}

I see what's going on in the analysis tool now, and I think I understand the
problem. For instance, take the text: abcdxxx defgxxx. Let's assume the
stemmer gets rid of xxx.

I would then see this in the analysis tool after the tokenizer stage:
- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be
why the highlighting goes wrong. I guess my little trick was a bit too
simple: it messes up the positions, basically because something different
from the original source text is tokenized.
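
For contrast: if the original (unstemmed) text is tokenized first and only
the term text is replaced afterwards in a filter, the offsets keep pointing
into the source text (assuming the usual zero-based offsets):

- abcd (from abcdxxx) - term position 1; start: 0; end: 7
- defg (from defgxxx) - term position 2; start: 8; end: 15

which is exactly what the highlighter needs to map fragments back onto the
original text.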

Any suggestions would be very welcome...

Cheers,

Jaco.


2008/9/26 Grant Ingersoll [EMAIL PROTECTED]

 How are you creating the tokens?  What are you setting for the offsets and
 the positions?

 One thing that is helpful is Solr's built in Analysis tool via the Admin
 interface (http://localhost:8983/solr/admin/)  From there, you can plug in
 verbose mode, and see what the position and offsets are for every piece of
 your Analyzer.

 -Grant


 On Sep 26, 2008, at 3:10 AM, Jaco wrote:

  Hello,

 I need to work with an external stemmer in Solr. This stemmer is
 accessible as a COM object (running Solr in Tomcat on a Windows platform).
 I managed to integrate this using the com4j library. I tested two
 scenarios:
 1. Create a custom FilterFactory and Filter class for this. The external
 stemmer is then invoked for every token.
 2. Create a custom TokenizerFactory (extending BaseTokenizerFactory) that
 invokes the external stemmer for the entire search text, then puts the
 result of this into a StringReader, and finally returns new
 WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by
 the whitespace tokenizer.

 Looking at search results, both scenarios appear to work from a
 functional point of view. The first scenario, however, is too slow
 because of the overhead of calling the external COM object for each
 token.

 The second scenario is much faster, and also gives correct search
 results. However, it then causes problems with highlighting - sometimes
 errors are reported (String out of Range), in other cases I get incorrect
 highlight fragments. Without knowing all the details about this stuff,
 this makes sense because of the change made to the text before it's
 tokenized.  Maybe my second scenario does not make sense at all..?

 Any ideas on how to overcome this or any other suggestions on how to
 realise this?

 Thanks, bye,

 Jaco.

 PS I posted this message twice before but it didn't come through (spam
 filtering..??), so this is the 2nd try with text changed a bit


 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ










Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hi Grant,

In reply to your questions:

1. Are you having to restart/initialize the stemmer every time for your
slow approach?  Does that really need to happen?

It is invoking a COM object in Windows. The object is instantiated once for
a token stream, and then invoked once for each token. The invoke always has
an overhead, not much to do about that (sigh...)

2. Can the stemmer return something other than a String?  Say a String array
of all the stemmed words?  Or maybe even some type of object that tells you
the original word and the stemmed word?

The stemmer can only return a String. But, I do know that the returned
string always has exactly the same number of words as the input string. So
logically, it would be possible to:
a) first calculate the position/start/end of each token in the input string
(usual tokenization by whitespace), resulting in token list 1
b) then invoke the stemmer, and tokenize that result by whitespace,
resulting in token list 2
c) 'merge' the token values of token list 2 into token list 1, which is
possible because each token's position is the same in both lists...
d) return that 'merged' token list 2 for further processing

Would this work in Solr?

I can do some Java coding to achieve that from a logical point of view, but I
wouldn't know how to structure this flow into the MyTokenizerFactory, so
some hints to achieve that would be great!

Thanks for helping out!

Jaco.


2008/9/26 Grant Ingersoll [EMAIL PROTECTED]


 On Sep 26, 2008, at 9:40 AM, Jaco wrote:

  Hi,

 Here's some of the code of my Tokenizer:

 public class MyTokenizerFactory extends BaseTokenizerFactory
 {
     public WhitespaceTokenizer create(Reader input)
     {
         String text, normalizedText;

         try {
             text = IOUtils.toString(input);
             normalizedText = *invoke my stemmer(text)*;
         }
         catch (IOException ex) {
             throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
         }

         StringReader stringReader = new StringReader(normalizedText);

         return new WhitespaceTokenizer(stringReader);
     }
 }

 I see what's going on in the analysis tool now, and I think I understand
 the problem. For instance, take the text: abcdxxx defgxxx. Let's assume
 the stemmer gets rid of xxx.

 I would then see this in the analysis tool after the tokenizer stage:
 - abcd - term position 1; start: 1; end: 3
 - defg - term position 2; start: 4; end: 7

 These positions are not in line with the initial search text - this must
 be why the highlighting goes wrong. I guess my little trick was a bit too
 simple: it messes up the positions, basically because something different
 from the original source text is tokenized.


 Yes, this is exactly the problem.  I don't know enough about com4J or your
 stemmer, but some things come to mind:

 1. Are you having to restart/initialize the stemmer every time for your
 slow approach?  Does that really need to happen?
 2. Can the stemmer return something other than a String?  Say a String
 array of all the stemmed words?  Or maybe even some type of object that
 tells you the original word and the stemmed word?

 -Grant



Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Thanks for these suggestions, will try it in the coming days and post my
findings in this thread.

Bye,

Jaco.

2008/9/26 Grant Ingersoll [EMAIL PROTECTED]


 On Sep 26, 2008, at 12:05 PM, Jaco wrote:

  Hi Grant,

 In reply to your questions:

 1. Are you having to restart/initialize the stemmer every time for your
 slow approach?  Does that really need to happen?

 It is invoking a COM object in Windows. The object is instantiated once
 for a token stream, and then invoked once for each token. The invoke
 always has an overhead, not much to do about that (sigh...)

 2. Can the stemmer return something other than a String?  Say a String
 array of all the stemmed words?  Or maybe even some type of object that
 tells you the original word and the stemmed word?

 The stemmer can only return a String. But, I do know that the returned
 string always has exactly the same number of words as the input string.
 So logically, it would be possible to:
 a) first calculate the position/start/end of each token in the input
 string (usual tokenization by whitespace), resulting in token list 1
 b) then invoke the stemmer, and tokenize that result by whitespace,
 resulting in token list 2
 c) 'merge' the token values of token list 2 into token list 1, which is
 possible because each token's position is the same in both lists...
 d) return that 'merged' token list 2 for further processing

 Would this work in Solr?


 I think so, assuming your stemmer tokenizes on whitespace as well.



 I can do some Java coding to achieve that from a logical point of view,
 but I wouldn't know how to structure this flow into the MyTokenizerFactory,
 so some hints to achieve that would be great!



 One thought:
 Don't create an all-in-one Tokenizer.  Instead, keep the Whitespace
 Tokenizer as is.  Then, create a TokenFilter that buffers the whole document
 into memory (via the next() implementation) and also creates, using
 StringBuilder, a string containing the whole text.  Once you've read it all
 in, send the string to your stemmer, parse it back out and associate it
 back to your token buffer.  If you are guaranteed position, you could even
 keep a (linked) hash, such that it is really quick to look up tokens after
 stemming.

 Pseudocode looks something like:

 while (token.next != null)
   tokenMap.put(token.position, token)
   stringBuilder.append(' ').append(token.text)

 stemmedText = comObj.stem(stringBuilder.toString())
 correlateStemmedText(stemmedText, tokenMap)

 spit out the tokens one by one...


 I think this approach should be fast (but maybe not as fast as your all in
 one tokenizer) and will provide the correct position and offsets.  You do
 have to be careful w/ really big documents, as that map can be big.  You
 also want to be careful about map reuse, token reuse, etc.

 I believe there are a couple of buffering TokenFilters in Solr that you
 could examine for inspiration.  I think the RemoveDuplicatesTokenFilter (or
 whatever it's called) does buffering.

 -Grant






 Thanks for helping out!

 Jaco.


 2008/9/26 Grant Ingersoll [EMAIL PROTECTED]


 On Sep 26, 2008, at 9:40 AM, Jaco wrote:

 Hi,


 Here's some of the code of my Tokenizer:

 public class MyTokenizerFactory extends BaseTokenizerFactory
 {
     public WhitespaceTokenizer create(Reader input)
     {
         String text, normalizedText;

         try {
             text = IOUtils.toString(input);
             normalizedText = *invoke my stemmer(text)*;
         }
         catch (IOException ex) {
             throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
         }

         StringReader stringReader = new StringReader(normalizedText);

         return new WhitespaceTokenizer(stringReader);
     }
 }

 I see what's going on in the analysis tool now, and I think I understand
 the problem. For instance, take the text: abcdxxx defgxxx. Let's assume
 the stemmer gets rid of xxx.

 I would then see this in the analysis tool after the tokenizer stage:
 - abcd - term position 1; start: 1; end: 3
 - defg - term position 2; start: 4; end: 7

 These positions are not in line with the initial search text - this must
 be why the highlighting goes wrong. I guess my little trick was a bit too
 simple: it messes up the positions, basically because something different
 from the original source text is tokenized.


 Yes, this is exactly the problem.  I don't know enough about com4J or
 your
 stemmer, but some things come to mind:

 1. Are you having to restart/initialize the stemmer every time for your
 slow approach?  Does that really need to happen?
 2. Can the stemmer return something other than a String?  Say a String
 array of all the stemmed words?  Or maybe even some type of object that
 tells you the original word and the stemmed word?

 -Grant


 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ










Pre-processing text in custom FilterFactory / TokenizerFactory

2008-09-25 Thread Jaco
Hello,

I need to work with an external stemmer in Solr. This stemmer is accessible
as a COM object (running Solr in Tomcat on a Windows platform). I managed to
integrate this using the com4j library. I tried two scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token.
2. Create a custom TokenizerFactory that invokes the external stemmer for
the entire search text, then puts the result of this into a StringReader,
and finally returns new WhitespaceTokenizer(stringReader), so the stemmed
text gets tokenized by the whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional
point of view. The first scenario, however, is too slow because of the
overhead of calling the external COM object for each token.

The second scenario is much faster, and also gives correct search results.
However, it then causes problems with highlighting - sometimes errors are
reported (String out of Range), in other cases I get incorrect highlight
fragments. Without knowing all details about this stuff, this makes sense
because of the change done to the text to be processed (I guess positions
get messed up then).  Maybe my second scenario is totally insane?
Any ideas on how to overcome this or any other suggestions on how to realise
this?

Cheers,

Jaco.

PS I posted this message yesterday, but it didn't come through, so this is
the 2nd try..


Pre-processing text in custom FilterFactory / TokenizerFactory

2008-09-24 Thread Jaco
Hello,

I need to work with an external stemmer, which is accessible as a COM
object. I managed to integrate this using the com4j library. I tried two
scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token.
2. Create a custom TokenizerFactory that invokes the external stemmer for
the entire search text, then puts the result of this into a StringReader,
and finally returns new WhitespaceTokenizer(stringReader), so the stemmed
text gets tokenized by the whitespace tokenizer.

Both scenarios appear to work from a functional point of view. The first
scenario, however, is too slow because of the overhead of calling the
external COM object. The second scenario is much faster, and also gives
correct search results. However, it then causes problems with highlighting -
sometimes errors are reported (String out of Range), in other cases I get
incorrect highlight fragments. Without knowing all details about this stuff,
this makes sense because of the change done to the text to be processed (I
guess positions get messed up then).  Maybe my second scenario is totally
insane?

Any ideas on how to overcome this?

Cheers,

Jaco.


Re: scoring individual values in a multivalued field

2008-09-05 Thread Jaco
Hi,

I ran into the same problem some time ago; I couldn't find any relation
between the boost values on the multivalued field and the search results.
Does anybody have an idea how to handle this?

Thanks,

Jaco.

2008/8/29 Sébastien Rainville [EMAIL PROTECTED]

 Hi,

 I have a multivalued field that I would want to score individually for each
 value. Is there an easy way to do that?

 Here's a concrete example of what I'm trying to achieve:

 Let's say that I have 3 documents with a field name_t and a multivalued
 field caracteristic_t_mv:

 <doc>
   <field name="name_t" boost="1.0">Dog</field>
   <field name="caracteristic_t_mv" boost="0.45">Cool</field>
   <field name="caracteristic_t_mv" boost="0.2">Big</field>
   <field name="caracteristic_t_mv" boost="0.89">Dirty</field>
 </doc>

 <doc>
   <field name="name_t" boost="1.0">Cat</field>
   <field name="caracteristic_t_mv" boost="0.76">Small</field>
   <field name="caracteristic_t_mv" boost="0.32">Dirty</field>
 </doc>

 <doc>
   <field name="name_t" boost="1.0">Fish</field>
   <field name="caracteristic_t_mv" boost="0.92">Smells</field>
   <field name="caracteristic_t_mv" boost="0.55">Dirty</field>
 </doc>

 If I query only the field caracteristic_t_mv for the value Dirty, I would
 like the documents to be sorted accordingly => get 1-3-2.

 It's possible to set the boost of a field when indexing, but there are 2
 problems with that:
 1) the field boost is actually the product of the different boost values
 of the fields with the same name;
 2) the norm value is persisted as a byte in the index and the precision
 loss hurts.

 Thanks in advance,
 Sebastien
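
 For background on point 2): in the Lucene versions of that era, index-time
 boosts are folded into the field norm, which is encoded into a single byte.
 A minimal sketch of the precision loss, assuming Lucene 2.x's static
 Similarity.encodeNorm()/decodeNorm() helpers:

 import org.apache.lucene.search.Similarity;

 public class NormPrecision {
     public static void main(String[] args) {
         // Nearby boosts often collapse onto the same 8-bit norm value, so
         // per-value boosts like 0.89 vs 0.92 can become indistinguishable
         // after indexing.
         for (float boost : new float[] { 0.89f, 0.92f }) {
             byte encoded = Similarity.encodeNorm(boost);
             System.out.println(boost + " -> byte " + encoded
                     + " -> decoded " + Similarity.decodeNorm(encoded));
         }
     }
 }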



Distributed search and facet counts using facet.limit=-1

2008-09-05 Thread Jaco
Hello,

I'm testing the distributed search using the shards= parameter, looking into
the facet counts (release: Solr 1.3.0 RC-2). I noticed that when using
facet.limit=-1 (to get an unlimited number of facet values with counts), there
are no facet counts returned at all. There is mention of this in
https://issues.apache.org/jira/browse/SOLR-303, but I don't see this working
in the release mentioned.

My query looks as follows:

http://localhost:8080/solr/select?q=my query&fl=some fields&facet=true&facet.sort=true&facet.limit=-1&facet.field=my facet field&shards=localhost:8080/solr

Am I possibly doing something wrong or is this a bug?

Bye,

Jaco.


Re: Beginners question: adding a plugin

2008-08-28 Thread Jaco
That does the trick! Thanks for the quick reply (and for a great Solr
product!)

Bye,

Jaco.

2008/8/27 Grant Ingersoll [EMAIL PROTECTED]

 Instead of solr.TestStemFilterFactory, put the fully qualified classname
 for the TestStemFilterFactory, i.e.
 com.my.great.stemmer.TestStemFilterFactory.  The solr.FactoryName notation
 is just shorthand for org.apache.solr.BlahBlahBlah
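
 In the analyzer definition from the original message, that line would then
 read something like (package name as in the hypothetical example above):

 <filter class="com.my.great.stemmer.TestStemFilterFactory" />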

 -Grant


 On Aug 27, 2008, at 3:27 PM, Jaco wrote:

  Hello,

 I'm pretty new to Solr, and not a Java expert, and I'm trying to create my
 own plugin according to the instructions given in
 http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external
 stemmer for the Dutch language by creating a new FilterFactory that will
 invoke the external stemmer for a TokenStream.

 First thing I want to do is just make sure I can get the plugin running.
 Here's what I did:
 - Took a copy of DutchStemFilterFactory.java, renamed it to
 TestStemFilterFactory.java, and renamed the class to TestStemFilterFactory
 - Successfully compiled the java using javac, and added the .class file to
 a jar file
 - Put the jar file in SOLR_HOME/lib
 - Put a line <filter class="solr.TestStemFilterFactory" /> in my analyzer
 definition in schema.xml
 - Restarted Tomcat

 In the Tomcat log, there is an indication that the file is found:

 27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader
 createClassLoader
 INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

 But then I get errors being reported by Tomcat further down the log file:

 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.TestStemFilterFactory'
   at

 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
   at

 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
   at

 org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
   at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
 
 Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 .

 Probably a configuration issue somewhere, but I am in the dark here (as
 said: not a Java expert...). I've tried to find information in the
 mailing list archives on this, but no luck so far. I'm running a Solr
 nightly build of 20.08.2008, Tomcat 5.5.26 on Windows.

 Any help would be much appreciated!

 Cheers,

 Jaco.


 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ










Beginners question: adding a plugin

2008-08-27 Thread Jaco
Hello,

I'm pretty new to Solr, and not a Java expert, and I'm trying to create my
own plugin according to the instructions given in
http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external
stemmer for the Dutch language by creating a new FilterFactory that will
invoke the external stemmer for a TokenStream.

First thing I want to do is just make sure I can get the plugin running.
Here's what I did:
- Took a copy of DutchStemFilterFactory.java, renamed it to
TestStemFilterFactory.java, and renamed the class to TestStemFilterFactory
- Successfully compiled the java using javac, and added the .class file to a
jar file
- Put the jar file in SOLR_HOME/lib
- Put a line <filter class="solr.TestStemFilterFactory" /> in my analyzer
definition in schema.xml
- Restarted Tomcat

In the Tomcat log, there is an indication that the file is found:

27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

But then I get errors being reported by Tomcat further down the log file:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.TestStemFilterFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)

Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
.

Probably a configuration issue somewhere, but I am in the dark here (as
said: not a Java expert...). I've tried to find information in the mailing
list archives on this, but no luck so far. I'm running a Solr nightly build
of 20.08.2008, Tomcat 5.5.26 on Windows.

Any help would be much appreciated!

Cheers,

Jaco.