Re: custom org.apache.lucene.store.Directory

2012-04-15 Thread Chris Male
I think what Radim is asking is whether he can use the Infinispan Lucene
Directory
(https://docs.jboss.org/author/display/ISPN/Infinispan+as+a+Directory+for+Lucene)
in Solr.

To do this, Radim, I think you'll need to create an implementation of Solr's
DirectoryFactory that can load your Directory implementation.
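
To make the idea concrete, here is a rough sketch of the shape such a factory
takes. It deliberately does not compile against Solr, Lucene or Infinispan:
Directory, DirectoryFactory, GridDirectory and GridDirectoryFactory below are
hypothetical stand-ins so the example runs on its own. The real abstract
methods on Solr's DirectoryFactory differ between versions, so check the class
in the release you are using; the factory is then registered in
solrconfig.xml.

```java
class DirectoryFactorySketch {
    // Hypothetical stand-in for org.apache.lucene.store.Directory.
    interface Directory {
        String describe();
    }

    // Hypothetical stand-in for Solr's DirectoryFactory: Solr asks the
    // factory for a Directory instead of opening the index location itself.
    static abstract class DirectoryFactory {
        abstract Directory open(String path);
    }

    // Stand-in for a custom, non-filesystem Directory (for example one
    // backed by a data grid).
    static class GridDirectory implements Directory {
        private final String indexName;

        GridDirectory(String indexName) {
            this.indexName = indexName;
        }

        @Override
        public String describe() {
            return "grid:" + indexName;
        }
    }

    // The piece you would write yourself: it turns the path Solr hands
    // over into an instance of your own Directory implementation.
    static class GridDirectoryFactory extends DirectoryFactory {
        @Override
        Directory open(String path) {
            return new GridDirectory(path);
        }
    }

    public static void main(String[] args) {
        DirectoryFactory factory = new GridDirectoryFactory();
        System.out.println(factory.open("/var/solr/data/index").describe());
    }
}
```

In a real implementation the body of open() is where the Infinispan cache
would be looked up and wrapped in its Directory.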

On Mon, Apr 16, 2012 at 6:20 AM, Erick Erickson wrote:

> Please review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> I have no clue what you mean by "supported" here.
>
> Best
> Erick
>
> 2012/4/14 Radim Kolar:
> > Is a custom org.apache.lucene.store.Directory supported in Solr? I want to
> > try Infinispan.
>



-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


Re: Reducing heap space consumption for large dictionaries?

2011-12-12 Thread Chris Male
Hi,

It's good to hear some feedback on using the Hunspell dictionaries.
Lucene's support is pretty new, so we're obviously looking to improve it.
Could you open a JIRA issue so we can explore whether there are ways
to reduce memory consumption?

On Tue, Dec 13, 2011 at 5:37 PM, Maciej Lisiewski wrote:

>> Hi,
>>
>> in my index schema I have defined a
>> DictionaryCompoundWordTokenFilterFactory and a
>> HunspellStemFilterFactory. Each FilterFactory has a dictionary with
>> about 100k entries.
>>
>> To avoid an out of memory error I have to set the heap space to 128m
>> for 1 index.
>>
>> Is there a way to reduce the memory consumption when parsing the
>> dictionary?
>> I need to create several indexes and 128m for each index is too much.
>>
>
> Same problem here - even with an empty index (no data yet) and two fields
> using Hunspell (pl_PL) I had to increase heap size to over 2GB for solr to
> start at all..
>
> Stempel using the very same dictionary works fine with 128M..
>
> --
> Maciej Lisiewski
>



-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


Re: Problem with hunspell french dictionary

2011-12-01 Thread Chris Male
It seems there's a problem with the code that parses the dictionary.
Can you open a JIRA issue with the same information so we can look into
fixing it?

On Thu, Dec 1, 2011 at 10:14 PM, Nathan Castelein <
nathan.castel...@gmail.com> wrote:

> Hi,
>
> I'm trying to add the HunspellStemFilterFactory to my Solr project.
>
> I'm trying this on a fresh new download of Solr 3.5.
>
> I downloaded french dictionary here (found it from here
> <
> http://wiki.services.openoffice.org/wiki/Dictionaries#French_.28France.2C_29
> >):
> http://www.dicollecte.org/download/fr/hunspell-fr-moderne-v4.3.zip
>
> But when I start Solr and go to the Solr Analysis, an error occurs in Solr.
>
> Here is the trace:
>
> java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=fr-moderne.aff]
>     at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:82)
>     at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:546)
>     at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:126)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:461)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
>     at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
>     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>     at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
>     at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>     at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
>     at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
>     at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
>     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>     at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>     at org.mortbay.jetty.Server.doStart(Server.java:224)
>     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>     at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.mortbay.start.Main.invokeMain(Main.java:194)
>     at org.mortbay.start.Main.start(Main.java:534)
>     at org.mortbay.start.Main.start(Main.java:441)
>     at org.mortbay.start.Main.main(Main.java:119)
>
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 3
>     at java.lang.String.charAt(Unknown Source)
>     at org.apache.lucene.analysis.hunspell.HunspellDictionary$DoubleASCIIFlagParsingStrategy.parseFlags(HunspellDictionary.java:382)
>     at org.apache.lucene.analysis.hunspell.HunspellDictionary.parseAffix(HunspellDictionary.java:165)
>     at org.apache.lucene.analysis.hunspell.HunspellDictionary.readAffixFile(HunspellDictionary.java:121)
>     at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:64)
>     at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:46)
>
> I can't find where the problem is. It seems like my dictionary isn't well
> written for hunspell, but I tried with two different dictionaries, and I
> had the same problem.
>
> I also tried with an english dictionary, and ... it works !
>
> So I think that my french dictionary is wrong for hunspell, but I
> don't know why ...
>
> Can you help me ?
>



-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Chris Male
>
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
>


Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Chris Male
Hi,

Yeah, that's what I was suggesting.  Did that work?

On Thu, Dec 10, 2009 at 12:24 PM, Marc Sturlese wrote:

>
> Do you mean something like this?
>
>     @Override
>     public void processAdd(AddUpdateCommand cmd) throws IOException {
>         boolean addDocToIndex = dealWithSolrDocFields(cmd.getSolrInputDocument());
>         if (next != null && addDocToIndex) {
>             next.processAdd(cmd);
>         } else {
>             LOG.debug("Doc skipped!");
>         }
>     }
>
> Thanks in advance
>
>
>
> Chris Male wrote:
> >
> > Hi,
> >
> > If your UpdateRequestProcessor does not forward the AddUpdateCommand onto
> > the RunUpdateProcessor, I believe the document will not be indexed.
> >
> > Cheers
> >
> > On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese
> > wrote:
> >
> >>
> >> Hey there,
> >> I need to be able to decide, once a document has been created, whether I
> >> want it to be indexed. I have thought of implementing an
> >> UpdateRequestProcessor to do that, but I don't know how to tell Solr in
> >> the processAdd method to skip the document.
> >> If I delete all the fields, would it be skipped, or is there a better way
> >> to reach this goal?
> >> Thanks in advance.
> >> --
> >> View this message in context:
> >> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 |
> > www.jteam.nl
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725698.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Chris Male
Hi,

If your UpdateRequestProcessor does not forward the AddUpdateCommand onto
the RunUpdateProcessor, I believe the document will not be indexed.
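
As a toy illustration of why that works, the sketch below models the chain
with plain Java classes. Processor, IndexingProcessor and SkippingProcessor
are made-up stand-ins, not Solr classes, and the "skip:" prefix is just an
arbitrary rule for the example; the only point is that a document the middle
link does not forward never reaches the last link.

```java
import java.util.ArrayList;
import java.util.List;

class ProcessorChainSketch {
    // Hypothetical stand-in for an update processor: by default each link
    // simply forwards the document to the next link in the chain.
    static abstract class Processor {
        final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        void processAdd(String doc) {
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    // Stand-in for the last link, the one that actually writes to the index.
    static class IndexingProcessor extends Processor {
        final List<String> indexed = new ArrayList<>();

        IndexingProcessor() {
            super(null);
        }

        @Override
        void processAdd(String doc) {
            indexed.add(doc);
        }
    }

    // Custom link: forwards only the documents it wants indexed.
    static class SkippingProcessor extends Processor {
        SkippingProcessor(Processor next) {
            super(next);
        }

        @Override
        void processAdd(String doc) {
            if (!doc.startsWith("skip:")) {
                super.processAdd(doc);
            }
            // Otherwise do nothing: the document goes no further.
        }
    }

    // Sends the documents through the chain and returns what was indexed.
    static List<String> run(String... docs) {
        IndexingProcessor index = new IndexingProcessor();
        Processor chain = new SkippingProcessor(index);
        for (String doc : docs) {
            chain.processAdd(doc);
        }
        return index.indexed;
    }

    public static void main(String[] args) {
        System.out.println(run("doc1", "skip:doc2", "doc3")); // prints [doc1, doc3]
    }
}
```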

Cheers

On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese wrote:

>
> Hey there,
> I need to be able to decide, once a document has been created, whether I
> want it to be indexed. I have thought of implementing an
> UpdateRequestProcessor to do that, but I don't know how to tell Solr in the
> processAdd method to skip the document.
> If I delete all the fields, would it be skipped, or is there a better way to
> reach this goal?
> Thanks in advance.
> --
> View this message in context:
> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 |
www.jteam.nl


Re: Question: How do I run the solr analysis tool programtically ?

2009-09-03 Thread Chris Male
Hi Yatir,

The FieldAnalysisRequestHandler has the same behavior as the analysis tool.
It will show you the list of tokens that are created after each of the
filters has been applied.  It can be used through normal HTTP requests, or
you can use SolrJ's support.
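
For the plain HTTP route, a minimal sketch of building the request URL from
Java could look like the following. It assumes the handler is registered at
/analysis/field (as in the example solrconfig.xml; your path may differ) and
that the field type is called "text"; the base URL is a placeholder. The text
is passed both as analysis.fieldvalue (index-time analysis) and as
analysis.query (query-time analysis).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FieldAnalysisUrl {
    // Builds a GET URL for the field analysis handler. The handler path
    // "/analysis/field" is an assumption about how solrconfig.xml is set up.
    static String build(String solrBase, String fieldType, String text) {
        try {
            String encoded = URLEncoder.encode(text, "UTF-8");
            return solrBase + "/analysis/field"
                + "?analysis.fieldtype=" + URLEncoder.encode(fieldType, "UTF-8")
                + "&analysis.fieldvalue=" + encoded
                + "&analysis.query=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "text", "I-like it very much"));
    }
}
```

The resulting URL can be fetched with any HTTP client; the response lists the
tokens produced after each stage of the analysis chain.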

Thanks,
Chris

On Thu, Sep 3, 2009 at 12:42 PM, Yatir wrote:

>
> From Java code I want to contact Solr through HTTP and supply a text buffer
> (or a URL that returns text, whichever is easier), and I want to get in
> return the final list of tokens (or the final text buffer) after it has gone
> through all the query-time filters defined for this Solr instance (stemming,
> stop words, etc.).
> Thanks in advance
>
> --
> View this message in context:
> http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr Quoted search confusions

2009-08-21 Thread Chris Male
Hi,

I think the cause of the problem is the WordDelimiterFilterFactory.  With
your current configuration, indexing i-like results in 3 terms being indexed:
i, like and ilike.  Then when you query for ilike, you match the 3rd
term.  The term ilike is created by the WordDelimiterFilter due to the
catenateWords="1" configuration.  When I change this to 0, only i and like
are created, hence ilike no longer matches i-like.
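
To show the effect in isolation, here is a deliberately simplified model of
what happens to a single token. It is not the Lucene WordDelimiterFilter
(which handles case changes, numbers, positions and more); the class and
method names are made up, and lowercasing is folded in only to keep the
example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordDelimiterSketch {
    // Splits a token on non-alphanumeric characters and, when catenateWords
    // is on, also emits the parts joined back together.
    static List<String> terms(String token, boolean catenateWords) {
        List<String> out = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        for (String part : token.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (part.isEmpty()) {
                continue; // a leading delimiter yields an empty first part
            }
            out.add(part);
            joined.append(part);
        }
        if (catenateWords && out.size() > 1) {
            out.add(joined.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("i-like", true));  // prints [i, like, ilike]
        System.out.println(terms("i-like", false)); // prints [i, like]
    }
}
```

With catenation on, "i-like" produces the extra term ilike, which is what the
query for ilike ends up matching.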

Hope that fixes your problem.

Thanks,
Chris

On Fri, Aug 21, 2009 at 7:16 AM, Vannia Rajan wrote:

> Hi,
>
> On Thu, Aug 20, 2009 at 9:13 PM, Chris Male wrote:
>
> > Hi,
> >
> > What analyzers/filters have you configured for the field that you are
> > searching? One could be causing the various versions of "ilike" to be
> > indexed the same way.
> >
>
>   I'm using the "text" field type with the following analyzers/filters for the
> field "description" (which has various forms of the word "ilike"):
>
> positionIncrementGap="100">
>
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>
>
>
> Is there anything that i could tune here to get the intended results?
>
>
> >
> > Thanks
> > Chris
> >
> > On Thu, Aug 20, 2009 at 5:29 PM, Vannia Rajan wrote:
> >
> > > Hi,
> > >
> > > I need some help to clarify how Solr indexes documents. I have 6
> > > documents with various forms of the word "ilike" (the complete word, not
> > > "i like"): one has "ilike" as such and the others have a special
> > > character between "i" and "like".
> > >
> > > What I expected from Solr is that, when I do a quoted search for "ilike",
> > > it should return only the document that has "ilike" exactly. But what I
> > > get from Solr is that various forms of the word "ilike" are also included
> > > in the results. Is there an option/configuration that I can set in Solr
> > > so that I get only the result with the exact word "ilike"?
> > >
> > >  The result i obtained from solr is shown below,
> > >
> > > http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
> > > Response header: status=0, QTime=20
> > > Params: fl=description,score, q="ilike"
> > >
> > > score   description
> > > 0.5     Ilike company is doing great!
> > > 0.375   I:like company is doing great!
> > > 0.3125  I-like it very much. Really, this can come up!.
> > > 0.3125  I;like it very much. Really, i say.
> > > 0.25    i.like it very much. full stop can come? i don't know.
> > >
> > > --
> > > Thanks,
> > > Vanniarajan
> > >
> >
>
>
>
> --
> Thanks,
> Vanniarajan
>


Re: Solr Quoted search confusions

2009-08-20 Thread Chris Male
Hi,

What analyzers/filters have you configured for the field that you are
searching? One could be causing the various versions of "ilike" to be
indexed the same way.

Thanks
Chris

On Thu, Aug 20, 2009 at 5:29 PM, Vannia Rajan wrote:

> Hi,
>
> I need some help to clarify how Solr indexes documents. I have 6
> documents with various forms of the word "ilike" (the complete word, not
> "i like"): one has "ilike" as such and the others have a special
> character between "i" and "like".
>
> What I expected from Solr is that, when I do a quoted search for "ilike",
> it should return only the document that has "ilike" exactly. But what I
> get from Solr is that various forms of the word "ilike" are also included
> in the results. Is there an option/configuration that I can set in Solr
> so that I get only the result with the exact word "ilike"?
>
>  The result i obtained from solr is shown below,
>
> http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
> Response header: status=0, QTime=20
> Params: fl=description,score, q="ilike"
>
> score   description
> 0.5     Ilike company is doing great!
> 0.375   I:like company is doing great!
> 0.3125  I-like it very much. Really, this can come up!.
> 0.3125  I;like it very much. Really, i say.
> 0.25    i.like it very much. full stop can come? i don't know.
> 
> --
> Thanks,
> Vanniarajan
>


Re: I think this is a "bug"

2009-08-13 Thread Chris Male
Hi Paul,

Yes, the comment does look very wrong.  I'll open a JIRA issue and include a
fix.

On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin wrote:

> I don't want to join yet another mailing list or register for JIRA,
> but I just noticed that the Javadocs for
> SolrInputDocument.addField(String name, Object value, float boost) is
> incredibly wrong - it looks like it was copied from a "deleteAll"
> method.
>
>
> --
> http://www.linkedin.com/in/paultomblin
>


Using Filters with SolrIndexSearcher

2008-10-01 Thread Chris Male
Hello,

I have a Lucene CustomScoreQuery which I want to execute with a Lucene
Filter.  However, the getDocListAndSet API provided by SolrIndexSearcher
doesn't seem to allow Filters to be used along with Queries.  Instead, it
seems that the Filters must first be converted to a DocSet.  Internally,
SolrIndexSearcher then seems to call Searcher.search(Query, HitCollector),
which results in my CustomScoreQuery being called for every document
matching the Query (which in my case is every document in the index),
instead of only those matching both the Query and the Filter.  Ideally
SolrIndexSearcher would call something like Searcher.search(Query, Filter,
Sort) (although I understand this particular method returns Hits, which is
deprecated).

Is there any way to achieve this functionality without extending the
SolrIndexSearcher API?  In other words, is it possible to run the query on
top of a filtered view of the index, so that the CustomScoreQuery is not
called for every document in it?

Thanks
Chris