RE: Search differences between solr 1.4.0 and 3.6.1
Also, I'm having issues with searching for "RoC". It returns thousands of matches on 3.6.1 against just a few on Solr 1.4.0. Looking at the analysis I see no differences... Should I add "RoC" to the protected keywords, or can I tweak something in the schema to achieve exact "RoC" matches?

-----Original Message-----
From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com]
Sent: Wednesday, 28 November 2012 17:19
To: solr-user@lucene.apache.org
Subject: RE: Search differences between solr 1.4.0 and 3.6.1

Ok, I'll test that and let you know. Is there some test I can easily do to confirm that it was really a side effect of the bug?

Frederico Azeiteiro
Developer

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, 28 November 2012 13:39
To: solr-user@lucene.apache.org
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

You need to add the generateNumberParts=1 attribute - assuming you actually want the number generated. The fact that your schema worked in 1.4 was probably simply a side effect of this bug:

https://issues.apache.org/jira/browse/SOLR-1706
"wrong tokens output from WordDelimiterFilter depending upon options"

-- Jack Krupansky

-----Original Message-----
From: Frederico Azeiteiro
Sent: Monday, November 26, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Search differences between solr 1.4.0 and 3.6.1

Hi,

While updating our Solr to 3.6.1 I noticed some differences in results when using search strings with letters+numbers.

For a text field defined as:

<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>

Searching for the string GAMES12 returns a lot of results on 3.6.1 that are not returned on 1.4.0. It looks like WordDelimiterFilterFactory is acting differently in 3.6.1: the numeric part of the keyword is being ignored and the search is performed using only GAMES.
Analysis output for 1.4.0:

org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}

term position:    1      2
term text:        GAMES  12
term type:        word   word
source start,end: 0,5    5,7
payload:

And for 3.6.1:

org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0, catenateNumbers=0}

position:       1
term text:      GAMES
startOffset:    0
endOffset:      5
type:           word
positionLength: 1

Is this something that can be modified/fixed to return the same results?

Thank you.

Regards,
Frederico
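The generateNumberParts=1 suggestion from the thread would look roughly like this in the query analyzer. The WordDelimiterFilterFactory attributes are the real options discussed above; the surrounding tokenizer/filter chain is illustrative, not the poster's actual schema:

```xml
<!-- Illustrative query analyzer: generateNumberParts="1" keeps the "12"
     token from GAMES12 at query time, matching the 1.4 behaviour. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          protected="protwords.txt" splitOnCaseChange="1"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```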
RE: Search differences between solr 1.4.0 and 3.6.1
Ok, I'll test that and let you know. Is there some test I can easily do to confirm that it was really a side effect of the bug?

Frederico Azeiteiro
Developer

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, 28 November 2012 13:39
To: solr-user@lucene.apache.org
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

You need to add the generateNumberParts=1 attribute - assuming you actually want the number generated. The fact that your schema worked in 1.4 was probably simply a side effect of this bug:

https://issues.apache.org/jira/browse/SOLR-1706
"wrong tokens output from WordDelimiterFilter depending upon options"

-- Jack Krupansky

-----Original Message-----
From: Frederico Azeiteiro
Sent: Monday, November 26, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Search differences between solr 1.4.0 and 3.6.1

Hi,

While updating our Solr to 3.6.1 I noticed some differences in results when using search strings with letters+numbers.

For a text field defined as:

<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>

Searching for the string GAMES12 returns a lot of results on 3.6.1 that are not returned on 1.4.0. It looks like WordDelimiterFilterFactory is acting differently in 3.6.1: the numeric part of the keyword is being ignored and the search is performed using only GAMES.
Analysis output for 1.4.0:

org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}

term position:    1      2
term text:        GAMES  12
term type:        word   word
source start,end: 0,5    5,7
payload:

And for 3.6.1:

org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0, catenateNumbers=0}

position:       1
term text:      GAMES
startOffset:    0
endOffset:      5
type:           word
positionLength: 1

Is this something that can be modified/fixed to return the same results?

Thank you.

Regards,
Frederico
RE: Search differences between solr 1.4.0 and 3.6.1
Sorry, ignore the "<http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>". Somehow that text appeared when I copied/pasted the XML from IE and I did not notice, but it is not part of the schema... :)

Still can't figure this thing out...

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 28 November 2012 12:52
To: solr-user@lucene.apache.org
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

Well, I get the same results in 1.4 and 3.6. The only difference is I didn't put <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml> in. In both cases the 12 is missing from the query analysis but is in the index analysis, due to catenateNumbers being 1 in one case and 0 in the other. So I'm guessing there's something else going on that you're overlooking, but I don't have any good clue.

Best,
Erick

On Wed, Nov 28, 2012 at 4:34 AM, Frederico Azeiteiro <frederico.azeite...@cision.com> wrote:

> I just reloaded both indexes just to make sure that all definitions are
> loaded. In the Analysis tool I can see differences, even though the fields
> are defined in the same way:
>
> Query Analyser for 3.6.1:
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> catenateAll=0, catenateNumbers=0}
> term text: GAMES
>
> Query Analyser for 1.4.0:
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
> term text: GAMES | 12
>
> The "12" is lost at query time on 3.6.1.
> The only difference I can see in the field definition is
> "luceneMatchVersion=LUCENE_36"... Could it cause this issue?
>
> Thank you.
> Frederico
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, 27 November 2012 12:26
> To: solr-user@lucene.apache.org
> Subject: Re: Search differences between solr 1.4.0 and 3.6.1
>
> Using the definition you provided, I don't get the same output. Are you
> sure you are doing what you think? The generateNumberParts=0 keeps the '12'
> from making it through the filter in 1.4 and 3.6, so I suspect you're
> not quite doing something the same way in both.
>
> Perhaps looking at index tokenization in one and query in the other?
>
> Best,
> Erick
>
> On Mon, Nov 26, 2012 at 9:06 AM, Frederico Azeiteiro <
> frederico.azeite...@cision.com> wrote:
>
> > Hi,
> >
> > While updating our Solr to 3.6.1 I noticed some differences in results
> > when using search strings with letters+numbers.
> >
> > For a text field defined as:
> >
> > <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
> >
> > <charFilter class="solr.MappingCharFilterFactory"
> > mapping="mapping-ISOLatin1Accent.txt"/>
> > <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> > catenateNumbers="1" catenateWords="1" generateNumberParts="0"
> > generateWordParts="1" stemEnglishPossessive="0"/>
> >
> > <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
> >
> > <filter class="solr.SynonymFilterFactory"
> > expand="true" synonyms="synonyms.txt"/>
> > <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> > catenateNumbers="0" catenateWords="0" generateNumberParts="0"
> > generateWordParts="1"/>
> >
> > Searching for the string GAMES12 returns a lot of results on 3.6.1 that
> > are not returned on 1.4.0.
> >
> > It looks like WordDelimiterFilterFactory is acting differently in
> > 3.6.1: the numeric part of the keyword is being ignored and the
> > search is performed using only GAMES.
> >
> > Analysis output for 1.4.0:
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> > {protected=protwords.txt, splitOnCaseChange=1,
> > generateNumberParts=0, catenateWords=0, generateWordParts=1,
> > catenateAll=0, catenateNumbers=0}
> >
> > term position:    1      2
> > term text:        GAMES  12
> > term type:        word   word
> > source start,end: 0,5    5,7
> > payload:
> >
> > And for 3.6.1:
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> > {protected=protwords.txt, splitOnCaseChange=1,
> > generateNumberParts=0, catenateWords=0,
> > luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0,
> > catenateNumbers=0}
> >
> > position:       1
> > term text:      GAMES
> > startOffset:    0
> > endOffset:      5
> > type:           word
> > positionLength: 1
> >
> > Is this something that can be modified/fixed to return the same results?
> >
> > Thank you.
> >
> > Regards,
> >
> > Frederico
RE: Search differences between solr 1.4.0 and 3.6.1
I just reloaded both indexes just to make sure that all definitions are loaded. In the Analysis tool I can see differences, even though the fields are defined in the same way:

Query Analyser for 3.6.1:
org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0, catenateNumbers=0}
term text: GAMES

Query Analyser for 1.4.0:
org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
term text: GAMES | 12

The "12" is lost at query time on 3.6.1.
The only difference I can see in the field definition is "luceneMatchVersion=LUCENE_36"... Could it cause this issue?

Thank you.

Frederico

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 27 November 2012 12:26
To: solr-user@lucene.apache.org
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

Using the definition you provided, I don't get the same output. Are you sure you are doing what you think? The generateNumberParts=0 keeps the '12' from making it through the filter in 1.4 and 3.6, so I suspect you're not quite doing something the same way in both.

Perhaps looking at index tokenization in one and query in the other?

Best,
Erick

On Mon, Nov 26, 2012 at 9:06 AM, Frederico Azeiteiro <frederico.azeite...@cision.com> wrote:

> Hi,
>
> While updating our Solr to 3.6.1 I noticed some differences in results
> when using search strings with letters+numbers.
>
> For a text field defined as:
>
> <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
>
> <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
> <filter class="solr.WordDelimiterFilterFactory"
> protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> catenateNumbers="1" catenateWords="1" generateNumberParts="0"
> generateWordParts="1" stemEnglishPossessive="0"/>
>
> <http://cbrsrvmtr04:8983/solr/WISE/admin/file/?file=schema.xml>
>
> <filter class="solr.SynonymFilterFactory"
> expand="true" synonyms="synonyms.txt"/>
> <filter class="solr.WordDelimiterFilterFactory"
> protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> catenateNumbers="0" catenateWords="0" generateNumberParts="0"
> generateWordParts="1"/>
>
> Searching for the string GAMES12 returns a lot of results on 3.6.1 that
> are not returned on 1.4.0.
>
> It looks like WordDelimiterFilterFactory is acting differently in
> 3.6.1: the numeric part of the keyword is being ignored and the search
> is performed using only GAMES.
>
> Analysis output for 1.4.0:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>
> term position:    1      2
> term text:        GAMES  12
> term type:        word   word
> source start,end: 0,5    5,7
> payload:
>
> And for 3.6.1:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> catenateAll=0, catenateNumbers=0}
>
> position:       1
> term text:      GAMES
> startOffset:    0
> endOffset:      5
> type:           word
> positionLength: 1
>
> Is this something that can be modified/fixed to return the same results?
>
> Thank you.
>
> Regards,
>
> Frederico
RE: Error loading class solr.CJKBigramFilterFactory
:) Just installed 3.6.1 and it's working just fine. Something must be wrong with my Tomcat/Solr install.

Thank you Robert.

//Frederico

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, 14 November 2012 19:18
To: solr-user@lucene.apache.org
Subject: Re: Error loading class solr.CJKBigramFilterFactory

I'm sure. I added it to 3.6 ;)

You must have something funky with your tomcat configuration, like an exploded war with different versions of jars or some other form of jar hell.

On Wed, Nov 14, 2012 at 9:32 AM, Frederico Azeiteiro wrote:

> Are you sure about that?
>
> We have it working on:
>
> Solr Specification Version: 3.5.0.2011.11.22.14.54.38
> Solr Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38
> Lucene Specification Version: 3.5.0
> Lucene Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:46:51
> Current Time: Wed Nov 14 17:30:07 WET 2012
> Server Start Time: Wed Nov 14 11:40:36 WET 2012
>
> ??
>
> Thanks,
> Frederico
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, 14 November 2012 16:28
> To: solr-user@lucene.apache.org
> Subject: Re: Error loading class solr.CJKBigramFilterFactory
>
> On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro wrote:
>> To do some further testing I installed Solr 3.5.0 using the default
>> Jetty server.
>>
>> When I tried to start Solr using the same schema I got:
>>
>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>> 'solr.CJKBigramFilterFactory'
>
> This filter was added in 3.6, so it's expected that it wouldn't be found.
RE: Error loading class solr.CJKBigramFilterFactory
Are you sure about that?

We have it working on:

Solr Specification Version: 3.5.0.2011.11.22.14.54.38
Solr Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38
Lucene Specification Version: 3.5.0
Lucene Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:46:51
Current Time: Wed Nov 14 17:30:07 WET 2012
Server Start Time: Wed Nov 14 11:40:36 WET 2012

??

Thanks,
Frederico

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, 14 November 2012 16:28
To: solr-user@lucene.apache.org
Subject: Re: Error loading class solr.CJKBigramFilterFactory

On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro wrote:
> To do some further testing I installed Solr 3.5.0 using the default
> Jetty server.
>
> When I tried to start Solr using the same schema I got:
>
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.CJKBigramFilterFactory'

This filter was added in 3.6, so it's expected that it wouldn't be found.
Error loading class solr.CJKBigramFilterFactory
Hi,

I've been testing some CJK tokenizers and I managed to get acceptable results. The tests were done using Solr 3.5.0 on Tomcat 7.

To do some further testing I installed Solr 3.5.0 using the default Jetty server.

When I tried to start Solr using the same schema I got:

SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.CJKBigramFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.CJKWidthFilterFactory'

Should these classes come with v3.5.0 by default? Do I need to install anything or copy any lib?

Thank you all.
Frederico
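For reference, a minimal sketch of a field type using these two factories on Solr 3.6+ (the class names are the ones from the error messages; the exact analyzer chain here is illustrative, not the poster's schema):

```xml
<!-- Illustrative CJK field type (Solr 3.6+): CJKWidthFilterFactory folds
     full-width/half-width variants, CJKBigramFilterFactory turns adjacent
     CJK characters into bigram tokens. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```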
Solr 3.5.0 - different behaviour on rows?
Hi all,

Just testing Solr 3.5.0 and I noticed a different behavior in this new version:

select?rows=10&q=sig%3a("54ba3e8fd3d5d8371f0e01c403085a0c")&?

This query returns no results on my indexes. It works on Solr 1.4.0, but returns "Java heap space java.lang.OutOfMemoryError: Java heap space" on Solr 3.5.0.

Is this normal? As there are no results, why the OutOfMemoryError? Is some memory allocated based on the rows number?

Regards,
Frederico
Recover index
Hello all,

When moving a Solr index to another instance I lost these files:

segments.gen
segments_xk

I have the .cfs file complete. What are my options to recover the data? Any idea that I can test?

Thank you.
Frederico Azeiteiro
RE: Using MLT feature
Yes, I guess that could be an option, but I'm not very experienced with Java development and Solr modifications.

As my main goal was to create a similar sig in C#, I just use the C# method to create the sig myself before indexing, instead of Solr's Deduplication function. That way, when searching, I can use the same method and be certain the sig is the same. As the algorithm used is the same as TextProfileSignature, the result is the same as using Solr deduplication.

Frederico

-----Original Message-----
From: lboutros [mailto:boutr...@gmail.com]
Sent: Friday, 8 April 2011 10:11
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

Couldn't you extend the TextProfileSignature and modify the TokenComparator class to use lexical order when tokens have the same frequency?

Ludovic.

2011/4/8 Frederico Azeiteiro:

> Hi.
>
> Yes, I managed to create a stable comparator in C# for the profile.
> The problem is before that, in:
>
> ...
> tokens.put(s, tok);
> ...
>
> Imagine you have 2 tokens with the same frequency; with the stable sort
> comparator for the profile it will maintain the original order.
> The problem is that the original order comes from the way they are
> inserted in the HashMap 'tokens' and not from the order the tokens appear
> in the original text.
>
> Frederico
>
> -----Original Message-----
> From: lboutros [mailto:boutr...@gmail.com]
> Sent: Friday, 8 April 2011 09:49
> To: solr-user@lucene.apache.org
> Subject: Re: Using MLT feature
>
> It seems that tokens are sorted by frequencies:
>
> ...
> Collections.sort(profile, new TokenComparator());
> ...
>
> and
>
> private static class TokenComparator implements Comparator {
>   public int compare(Token t1, Token t2) {
>     return t2.cnt - t1.cnt;
>   }
> }
>
> and cnt is the token count.
>
> Ludovic.
>
> 2011/4/7 Frederico Azeiteiro:
>
> > Well, at this point I'm more dedicated to the Deduplication issue.
> >
> > Using a MIN_TOKEN_LEN of 4 I'm getting nice comparison results. MLT
> > returns a lot of similar docs that I don't consider similar - even
> > tuning the parameters.
> >
> > Finishing this issue, I found out that the signature also contains the
> > field name, meaning that if you wish to signature both title and text
> > fields, your signature will be a hash of ("text"+"text value"+"title"+"title
> > value").
> >
> > In any case, I found that the HashMap used in the hash algorithm inserts
> > the tokens by some HashMap-internal sort method that I can't understand :),
> > and so, impossible to copy to a C# implementation.
> >
> > Thank you for all your help,
> > Frederico
>
> -----
> Jouve, France.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
> Sent from the Solr - User mailing list archive at Nabble.com.

-----
Jouve, France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794622.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Using MLT feature
Hi.

Yes, I managed to create a stable comparator in C# for the profile.
The problem is before that, in:

...
tokens.put(s, tok);
...

Imagine you have 2 tokens with the same frequency; with the stable sort comparator for the profile it will maintain the original order. The problem is that the original order comes from the way they are inserted in the HashMap 'tokens', and not from the order the tokens appear in the original text.

Frederico

-----Original Message-----
From: lboutros [mailto:boutr...@gmail.com]
Sent: Friday, 8 April 2011 09:49
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

It seems that tokens are sorted by frequencies:

...
Collections.sort(profile, new TokenComparator());
...

and

private static class TokenComparator implements Comparator {
  public int compare(Token t1, Token t2) {
    return t2.cnt - t1.cnt;
  }
}

and cnt is the token count.

Ludovic.

2011/4/7 Frederico Azeiteiro:

> Well, at this point I'm more dedicated to the Deduplication issue.
>
> Using a MIN_TOKEN_LEN of 4 I'm getting nice comparison results. MLT returns
> a lot of similar docs that I don't consider similar - even tuning the
> parameters.
>
> Finishing this issue, I found out that the signature also contains the
> field name, meaning that if you wish to signature both title and text fields,
> your signature will be a hash of ("text"+"text value"+"title"+"title
> value").
>
> In any case, I found that the HashMap used in the hash algorithm inserts
> the tokens by some HashMap-internal sort method that I can't understand :),
> and so, impossible to copy to a C# implementation.
>
> Thank you for all your help,
> Frederico

-----
Jouve, France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
Sent from the Solr - User mailing list archive at Nabble.com.
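The ordering problem Frederico describes comes from java.util.HashMap: its iteration order is driven by the keys' hash codes, not by when tokens were inserted, so a port to another language cannot reproduce it. A small illustrative sketch (class and token values here are made up for the example):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {
    // Inserts tokens into the given map and returns its key iteration order.
    // With java.util.HashMap that order depends on hash codes, not insertion
    // order; LinkedHashMap, by contrast, preserves insertion order.
    public static List<String> insertionOrder(Map<String, Integer> tokens, String... words) {
        for (String w : words) {
            tokens.merge(w, 1, Integer::sum); // count token frequency
        }
        return new ArrayList<>(tokens.keySet());
    }
}
```

A stable sort over the profile can only preserve whatever order the backing map hands back, so equal-frequency tokens end up ordered by HashMap internals rather than by their position in the text, which is exactly what the thread observes.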
RE: Using MLT feature
Well, at this point I'm more dedicated to the Deduplication issue.

Using a MIN_TOKEN_LEN of 4 I'm getting nice comparison results. MLT returns a lot of similar docs that I don't consider similar - even tuning the parameters.

Finishing this issue, I found out that the signature also contains the field name, meaning that if you wish to signature both title and text fields, your signature will be a hash of ("text"+"text value"+"title"+"title value").

In any case, I found that the HashMap used in the hash algorithm inserts the tokens by some HashMap-internal sort method that I can't understand :), and so, impossible to copy to a C# implementation.

Thank you for all your help,
Frederico

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 7 April 2011 04:09
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

A "fuzzy signature" system will not work here. You are right, you want to try MLT instead.

Lance

On Wed, Apr 6, 2011 at 9:47 AM, Frederico Azeiteiro wrote:

> Yes, I had already checked the code for it and used it to write a C# method
> that returns the same signature.
>
> But I have a strange issue. For instance, using minTokenLen=2 and the
> default QUANT_RATE, passing the text "frederico" (simple text, no big deal
> here):
>
> 1. Using my C# app I get "8b92e01d67591dfc60adf9576f76a055".
> 2. Using Solr, passing a doc with HeadLine "frederico", I get
> "8d9a5c35812ba75b8383d4538b91080f" in my signature field.
> 3. I created a Java app (I'm not a Java expert...) using the code from the
> Solr SignatureUpdateProcessorFactory class (please check the code below)
> and I get "8b92e01d67591dfc60adf9576f76a055".
>
> Java app code:
>
>     TextProfileSignature textProfileSignature = new TextProfileSignature();
>     NamedList params = new NamedList();
>     params.add("", "");
>     SolrParams solrParams = SolrParams.toSolrParams(params);
>     textProfileSignature.init(solrParams);
>     textProfileSignature.add("frederico");
>
>     byte[] signature = textProfileSignature.getSignature();
>     char[] arr = new char[signature.length << 1];
>     for (int i = 0; i < signature.length; i++) {
>       int b = signature[i];
>       int idx = i << 1;
>       arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
>       arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
>     }
>     String sigString = new String(arr);
>     System.out.println(sigString);
>
> Here are my processor configs:
>
>     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>       <bool name="enabled">true</bool>
>       <str name="signatureField">sig</str>
>       <bool name="overwriteDupes">false</bool>
>       <str name="fields">HeadLine</str>
>       <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
>       <str name="minTokenLen">2</str>
>     </processor>
>
> So both my apps (Java and C#) return the same signature, but Solr returns
> a different one... Can anyone understand what I could be doing wrong?
>
> Thank you once again.
>
> Frederico
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Tuesday, 5 April 2011 15:20
> To: solr-user@lucene.apache.org
> Cc: Frederico Azeiteiro
> Subject: Re: Using MLT feature
>
> If you check the code for TextProfileSignature [1] you'll notice the init
> method reading params. You can set those params as you did. Reading the
> Javadoc [2] might help as well. But what's not documented in the Javadoc
> is how QUANT is computed; it rounds.
>
> [1]: http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/src/java/org/apache/solr/update/processor/TextProfileSignature.java?view=markup
> [2]: http://lucene.apache.org/solr/api/org/apache/solr/update/processor/TextProfileSignature.html
>
> On Tuesday 05 April 2011 16:10:08 Frederico Azeiteiro wrote:
>> Thank you, I'll try to create a C# method to create the same sig as Solr,
>> and then compare both sigs before indexing the doc. This way I can avoid
>> indexing existing docs.
>>
>> If anyone needs to use this parameter (as this info is not on the wiki),
>> you can add the option
>>
>> <str name="minTokenLen">5</str>
>>
>> on the processor tag.
>>
>> Best regards,
>> Frederico
>>
>> -----Original Message-----
>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>> Sent: Tuesday, 5 April 2011 12:01
RE: Using MLT feature
Yes, I had already checked the code for it and used it to write a C# method that returns the same signature.

But I have a strange issue. For instance, using minTokenLen=2 and the default QUANT_RATE, passing the text "frederico" (simple text, no big deal here):

1. Using my C# app I get "8b92e01d67591dfc60adf9576f76a055".
2. Using Solr, passing a doc with HeadLine "frederico", I get "8d9a5c35812ba75b8383d4538b91080f" in my signature field.
3. I created a Java app (I'm not a Java expert...) using the code from the Solr SignatureUpdateProcessorFactory class (please check the code below) and I get "8b92e01d67591dfc60adf9576f76a055".

Java app code:

    TextProfileSignature textProfileSignature = new TextProfileSignature();
    NamedList params = new NamedList();
    params.add("", "");
    SolrParams solrParams = SolrParams.toSolrParams(params);
    textProfileSignature.init(solrParams);
    textProfileSignature.add("frederico");

    byte[] signature = textProfileSignature.getSignature();
    char[] arr = new char[signature.length << 1];
    for (int i = 0; i < signature.length; i++) {
      int b = signature[i];
      int idx = i << 1;
      arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
      arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
    }
    String sigString = new String(arr);
    System.out.println(sigString);

Here are my processor configs:

    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">false</bool>
      <str name="fields">HeadLine</str>
      <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
      <str name="minTokenLen">2</str>
    </processor>

So both my apps (Java and C#) return the same signature, but Solr returns a different one... Can anyone understand what I could be doing wrong?

Thank you once again.

Frederico

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Tuesday, 5 April 2011 15:20
To: solr-user@lucene.apache.org
Cc: Frederico Azeiteiro
Subject: Re: Using MLT feature

If you check the code for TextProfileSignature [1] you'll notice the init method reading params. You can set those params as you did. Reading the Javadoc [2] might help as well. But what's not documented in the Javadoc is how QUANT is computed; it rounds.

[1]: http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/src/java/org/apache/solr/update/processor/TextProfileSignature.java?view=markup
[2]: http://lucene.apache.org/solr/api/org/apache/solr/update/processor/TextProfileSignature.html

On Tuesday 05 April 2011 16:10:08 Frederico Azeiteiro wrote:
> Thank you, I'll try to create a C# method to create the same sig as Solr,
> and then compare both sigs before indexing the doc. This way I can avoid
> indexing existing docs.
>
> If anyone needs to use this parameter (as this info is not on the wiki),
> you can add the option
>
> <str name="minTokenLen">5</str>
>
> on the processor tag.
>
> Best regards,
> Frederico
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Tuesday, 5 April 2011 12:01
> To: solr-user@lucene.apache.org
> Cc: Frederico Azeiteiro
> Subject: Re: Using MLT feature
>
> On Tuesday 05 April 2011 12:19:33 Frederico Azeiteiro wrote:
> > Sorry, the reply I made yesterday was directed to Markus and not the
> > list...
> >
> > Here are my thoughts on this. At this point I'm a little confused about
> > whether Solr is a good option to find near-duplicate docs.
> >
> > >> Yes there is, try set overwriteDupes to true and documents yielding
> > >> the same signature will be overwritten
> >
> > The problem is that I don't want to overwrite the doc, I need to
> > maintain the original version (because the doc has other fields I need
> > to maintain).
> >
> > >> If you need both fuzzy and exact matching then add a second
> > >> update processor inside the chain and create another signature field.
> >
> > I just need the fuzzy search, but in the quick tests I made, different
> > signatures are returned for what I consider duplicate docs:
> >
> > "Army deploys as clan war kills 11 in Philippine south"
> > "Army deploys as clan war kills 11 in Philippine south."
> >
> > The same sig for the above 2 strings; that's ok.
> >
> > But a different sig was created for:
> >
> > "Army deploys as clan war kills 11 in Philippine south the."
> >
> > Is there a way to set up the TextProfileSignature parameters to adjust
> > the "sensibility" on SOLR (QUANT_RATE or MIN_TOKEN_LEN)?
RE: Using MLT feature
Thank you, I'll try to create a c# method to create the same sig of SOLR, and then compare both sigs before index the doc. This way I can avoid the indexation of existing docs. If anyone needs to use this parameter (as this info is not on the wiki), you can add the option 5 On the processor tag. Best regards, Frederico -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: terça-feira, 5 de Abril de 2011 12:01 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature On Tuesday 05 April 2011 12:19:33 Frederico Azeiteiro wrote: > Sorry, the reply I made yesterday was directed to Markus and not the > list... > > Here's my thoughts on this. At this point I'm a little confused if SOLR > is a good option to find near duplicate docs. > > >> Yes there is, try set overwriteDupes to true and documents yielding > > the same signature will be overwritten > > The problem is that I don't want to overwrite the doc, I need to > maintain the original version (because the doc has others fields I need > to maintain). > > >>If you have need both fuzzy and exact matching then add a second > > update processor inside the chain and create another signature field. > > I just need the fuzzy search but the quick tests I made, return > different signatures for what I consider duplicate docs. > "Army deploys as clan war kills 11 in Philippine south" > "Army deploys as clan war kills 11 in Philippine south." > > Same sig for the above 2 strings, that's ok. > > But a different sig was created for: > "Army deploys as clan war kills 11 in Philippine south the." > > Is there a way to setup the TextProfileSignature parameters to adjust > the "sensibility" on SOLR (QUANT_RATE or MIN_TOKEN_LEN)? > > Do you think that these parameters can help creating the same sig for > the above example? You can only fix this by increasing minTokenLen to 4 to prevent `the` from being added to the list of tokens but this may affect other signatures. 
Possibly more documents will then get the same signature. Messing around with quantRate won't do much good because all your tokens have the same frequency (1) so quant will always be 1 in this short text. That's why TextProfileSignature works less well for short texts. http://nutch.apache.org/apidocs-1.2/org/apache/nutch/crawl/TextProfileSignature.html > > Is anyone using the TextProfileSignature with success? > > Thank you, > Frederico > > > -Original Message- > From: Markus Jelsma [mailto:markus.jel...@openindex.io] > Sent: segunda-feira, 4 de Abril de 2011 16:47 > To: solr-user@lucene.apache.org > Cc: Frederico Azeiteiro > Subject: Re: Using MLT feature > > > Hi again, > > I guess I was wrong on my early post... There's no automated way to > > avoid > > > the indexation of the duplicate doc. > > Yes there is, try set overwriteDupes to true and documents yielding the > same > signature will be overwritten. If you have need both fuzzy and exact > matching > then add a second update processor inside the chain and create another > signature field. > > > I guess I have 2 options: > > > > 1. Create a temp index with signatures and then have an app that for > > each > > > new doc verifies if sig exists on my primary index. If not, add the > > article. > > > > 2. Before adding the doc, create a signature (using the same algorithm > > that > > > SOLR uses) on my indexing app and then verify if signature exists > > before > > > adding. > > > > I'm way thinking the right way here? :) > > > > Thank you, > > Frederico > > > > > > > > -Original Message- > > From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] > > Sent: segunda-feira, 4 de Abril de 2011 11:59 > > To: solr-user@lucene.apache.org > > Subject: RE: Using MLT feature > > > > Thank you Markus it looks great. > > > > But the wiki is not very detailed on this. > > Do you mean if I: > > > > 1. 
Create:
> >
> > <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >   <bool name="enabled">true</bool>
> >   <bool name="overwriteDupes">false</bool>
> >   <str name="signatureField">signature</str>
> >   <str name="fields">headline,body,medianame</str>
> >   <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
> > </processor>
> >
> > 2. Add the request as the default update request >
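Markus's point about minTokenLen can be sketched with a simplified, hypothetical re-implementation of the TextProfileSignature idea (the real Nutch class is more elaborate; this only illustrates why a short token such as "the" flips the signature, and why quantRate does nothing when every token occurs once):

```python
import hashlib
import re

def text_profile_signature(text, min_token_len=2, quant_rate=0.01):
    """Simplified sketch of Nutch's TextProfileSignature, not the exact algorithm."""
    # Tokenize: lowercase alphanumeric runs, drop tokens shorter than min_token_len.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= min_token_len]
    freqs = {}
    for t in tokens:
        freqs[t] = freqs.get(t, 0) + 1
    max_freq = max(freqs.values())
    # quant derives from the highest frequency; in short texts every token
    # occurs once, so quant is always 1 and every surviving token counts.
    quant = max(round(max_freq * quant_rate), 1) if max_freq > 1 else 1
    profile = sorted((t, f // quant * quant) for t, f in freqs.items() if f >= quant)
    return hashlib.md5(" ".join(f"{t} {f}" for t, f in profile).encode()).hexdigest()

a = "Army deploys as clan war kills 11 in Philippine south"
b = "Army deploys as clan war kills 11 in Philippine south the."
```

With minTokenLen=2 the extra "the" (3 chars) enters the profile and the signatures differ; with minTokenLen=4 it is dropped and both texts hash identically, at the cost of also dropping "war", "as", "in" and "11".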
RE: Using MLT feature
Sorry, the reply I made yesterday was directed to Markus and not the list... Here's my thoughts on this. At this point I'm a little confused if SOLR is a good option to find near duplicate docs. >> Yes there is, try set overwriteDupes to true and documents yielding the same signature will be overwritten The problem is that I don't want to overwrite the doc, I need to maintain the original version (because the doc has others fields I need to maintain). >>If you have need both fuzzy and exact matching then add a second update processor inside the chain and create another signature field. I just need the fuzzy search but the quick tests I made, return different signatures for what I consider duplicate docs. "Army deploys as clan war kills 11 in Philippine south" "Army deploys as clan war kills 11 in Philippine south." Same sig for the above 2 strings, that's ok. But a different sig was created for: "Army deploys as clan war kills 11 in Philippine south the." Is there a way to setup the TextProfileSignature parameters to adjust the "sensibility" on SOLR (QUANT_RATE or MIN_TOKEN_LEN)? Do you think that these parameters can help creating the same sig for the above example? Is anyone using the TextProfileSignature with success? Thank you, Frederico -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: segunda-feira, 4 de Abril de 2011 16:47 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature > Hi again, > I guess I was wrong on my early post... There's no automated way to avoid > the indexation of the duplicate doc. Yes there is, try set overwriteDupes to true and documents yielding the same signature will be overwritten. If you have need both fuzzy and exact matching then add a second update processor inside the chain and create another signature field. > > I guess I have 2 options: > > 1. 
Create a temp index with signatures and then have an app that for each > new doc verifies if sig exists on my primary index. If not, add the > article. > > 2. Before adding the doc, create a signature (using the same algorithm that > SOLR uses) on my indexing app and then verify if signature exists before > adding. > > I'm way thinking the right way here? :) > > Thank you, > Frederico > > > > -Original Message- > From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] > Sent: segunda-feira, 4 de Abril de 2011 11:59 > To: solr-user@lucene.apache.org > Subject: RE: Using MLT feature > > Thank you Markus it looks great. > > But the wiki is not very detailed on this. > Do you mean if I: > > 1. Create: > > class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory" > > true > false > signature > headline,body,medianame > name="signatureClass">org.apache.solr.update.processor.Lookup3Signature< /s > tr> > > > > > 2. Add the request as the default update request > 3. Add a "signature" indexed field to my schema. > > Then, > When adding a new doc to my index, it is only added of not considered a > duplicate using a Lookup3Signature on the field defined? All duplicates > are ignored and not added to my index? > Is it so simple as that? > > Does it works even if the medianame should be an exact match (not similar > match as the headline and bodytext are)? > > Thank you for your help, > > > Frederico Azeiteiro > Developer > > > > -Original Message- > From: Markus Jelsma [mailto:markus.jel...@openindex.io] > Sent: segunda-feira, 4 de Abril de 2011 10:48 > To: solr-user@lucene.apache.org > Subject: Re: Using MLT feature > > http://wiki.apache.org/solr/Deduplication > > On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote: > > Hi, > > > > The ideia is don't index if something similar (headline+bodytext) for > > the same exact medianame. 
> > > > Do you mean I would need to index the doc first (maybe in a temp index) > > and then use the MLT feature to find similar docs before adding to final > > index? > > > > Thanks, > > Frederico > > > > > > -Original Message- > > From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] > > Sent: segunda-feira, 4 de Abril de 2011 10:22 > > To: solr-user@lucene.apache.org > > Subject: Re: Using MLT feature > > > > Do you want to not index if something similar? Or don't index if exact. > > I would look into a hash code of the document if you don't want to index > &
RE: Using MLT feature
Hi again, I guess I was wrong on my early post... There's no automated way to avoid the indexation of the duplicate doc. I guess I have 2 options: 1. Create a temp index with signatures and then have an app that for each new doc verifies if sig exists on my primary index. If not, add the article. 2. Before adding the doc, create a signature (using the same algorithm that SOLR uses) on my indexing app and then verify if signature exists before adding. I'm way thinking the right way here? :) Thank you, Frederico -Original Message----- From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] Sent: segunda-feira, 4 de Abril de 2011 11:59 To: solr-user@lucene.apache.org Subject: RE: Using MLT feature Thank you Markus it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create: true false signature headline,body,medianame org.apache.solr.update.processor.Lookup3Signature 2. Add the request as the default update request 3. Add a "signature" indexed field to my schema. Then, When adding a new doc to my index, it is only added of not considered a duplicate using a Lookup3Signature on the field defined? All duplicates are ignored and not added to my index? Is it so simple as that? Does it works even if the medianame should be an exact match (not similar match as the headline and bodytext are)? Thank you for your help, ____ Frederico Azeiteiro Developer -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: segunda-feira, 4 de Abril de 2011 10:48 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature http://wiki.apache.org/solr/Deduplication On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote: > Hi, > > The ideia is don't index if something similar (headline+bodytext) for > the same exact medianame. > > Do you mean I would need to index the doc first (maybe in a temp index) > and then use the MLT feature to find similar docs before adding to final > index? 
> > Thanks, > Frederico > > > -Original Message- > From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] > Sent: segunda-feira, 4 de Abril de 2011 10:22 > To: solr-user@lucene.apache.org > Subject: Re: Using MLT feature > > Do you want to not index if something similar? Or don't index if exact. > I would look into a hash code of the document if you don't want to index > exact.Similar though, I think has to be based off a document in the > index. > > On Apr 4, 2011, at 5:16, Frederico Azeiteiro > > wrote: > > Hi, > > > > > > > > I would like to hear your opinion about the MLT feature and if it's a > > good solution to what I need to implement. > > > > > > > > My index has fields like: headline, body and medianame. > > > > What I need to do is, before adding a new doc, verify if a similar doc > > exists for this media. > > > > > > > > My idea is to use the MorelikeThisHandler > > (http://wiki.apache.org/solr/MoreLikeThisHandler) in the following > > way: > > For each new doc, perform a MLT search with q= medianame and > > stream.body=headline+bodytext. > > > > If no similar docs are found than I can safely add the doc. > > > > > > > > Is this feasible using the MLT handler? Is it a good approach? Are > > there > > > a better way to perform this comparison? > > > > > > > > Thank you for your help. > > > > > > > > Best regards, > > > > > > > > Frederico Azeiteiro -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Using MLT feature
Thank you Markus it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create: true false signature headline,body,medianame org.apache.solr.update.processor.Lookup3Signature 2. Add the request as the default update request 3. Add a "signature" indexed field to my schema. Then, When adding a new doc to my index, it is only added of not considered a duplicate using a Lookup3Signature on the field defined? All duplicates are ignored and not added to my index? Is it so simple as that? Does it works even if the medianame should be an exact match (not similar match as the headline and bodytext are)? Thank you for your help, ____ Frederico Azeiteiro Developer -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: segunda-feira, 4 de Abril de 2011 10:48 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature http://wiki.apache.org/solr/Deduplication On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote: > Hi, > > The ideia is don't index if something similar (headline+bodytext) for > the same exact medianame. > > Do you mean I would need to index the doc first (maybe in a temp index) > and then use the MLT feature to find similar docs before adding to final > index? > > Thanks, > Frederico > > > -Original Message- > From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] > Sent: segunda-feira, 4 de Abril de 2011 10:22 > To: solr-user@lucene.apache.org > Subject: Re: Using MLT feature > > Do you want to not index if something similar? Or don't index if exact. > I would look into a hash code of the document if you don't want to index > exact.Similar though, I think has to be based off a document in the > index. > > On Apr 4, 2011, at 5:16, Frederico Azeiteiro > > wrote: > > Hi, > > > > > > > > I would like to hear your opinion about the MLT feature and if it's a > > good solution to what I need to implement. > > > > > > > > My index has fields like: headline, body and medianame. 
> > > > What I need to do is, before adding a new doc, verify if a similar doc > > exists for this media. > > > > > > > > My idea is to use the MorelikeThisHandler > > (http://wiki.apache.org/solr/MoreLikeThisHandler) in the following > > way: > > For each new doc, perform a MLT search with q= medianame and > > stream.body=headline+bodytext. > > > > If no similar docs are found than I can safely add the doc. > > > > > > > > Is this feasible using the MLT handler? Is it a good approach? Are > > there > > > a better way to perform this comparison? > > > > > > > > Thank you for your help. > > > > > > > > Best regards, > > > > > > > > Frederico Azeiteiro -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Using MLT feature
Hi, The idea is to not index if something similar (headline+bodytext) already exists for the same exact medianame. Do you mean I would need to index the doc first (maybe in a temp index) and then use the MLT feature to find similar docs before adding it to the final index? Thanks, Frederico -Original Message- From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] Sent: Monday, 4 April 2011 10:22 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature Do you want to not index if something similar? Or don't index if exact? I would look into a hash code of the document if you don't want to index exact. Similar, though, I think has to be based off a document in the index. On Apr 4, 2011, at 5:16, Frederico Azeiteiro wrote: > Hi, > > I would like to hear your opinion about the MLT feature and if it's a > good solution to what I need to implement. > > My index has fields like: headline, body and medianame. > > What I need to do is, before adding a new doc, verify if a similar doc > exists for this media. > > My idea is to use the MoreLikeThisHandler > (http://wiki.apache.org/solr/MoreLikeThisHandler) in the following way: > > For each new doc, perform an MLT search with q=medianame and > stream.body=headline+bodytext. > > If no similar docs are found then I can safely add the doc. > > Is this feasible using the MLT handler? Is it a good approach? Is there > a better way to perform this comparison? > > Thank you for your help. > > Best regards, > > Frederico Azeiteiro
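Chris's exact-match suggestion — hashing the document fields client-side before indexing — could be sketched like this (the function names and the in-memory `seen` set are illustrative, not an existing API; a real pipeline would query the index for the signature instead):

```python
import hashlib

def doc_signature(headline, body, medianame):
    """Hash the concatenated fields; identical field values give an identical signature."""
    # A NUL separator avoids accidental collisions between field boundaries.
    joined = "\x00".join((headline, body, medianame))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

seen = set()  # stand-in for "signature already present in the index"

def should_index(doc):
    """Skip a doc whose exact signature was already indexed."""
    sig = doc_signature(doc["headline"], doc["body"], doc["medianame"])
    if sig in seen:
        return False
    seen.add(sig)
    return True
```

This only catches byte-for-byte duplicates; near-duplicates (the headline with a trailing period, say) still need a fuzzy signature such as TextProfileSignature.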
Using MLT feature
Hi, I would like to hear your opinion about the MLT feature and whether it's a good solution for what I need to implement. My index has fields like: headline, body and medianame. What I need to do is, before adding a new doc, verify if a similar doc exists for this media. My idea is to use the MoreLikeThisHandler (http://wiki.apache.org/solr/MoreLikeThisHandler) in the following way: For each new doc, perform an MLT search with q=medianame and stream.body=headline+bodytext. If no similar docs are found then I can safely add the doc. Is this feasible using the MLT handler? Is it a good approach? Is there a better way to perform this comparison? Thank you for your help. Best regards, ____ Frederico Azeiteiro
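The request described above might be assembled like this (the `/solr/mlt` endpoint path and the field names are assumptions based on the thread; the parameter names are from the MoreLikeThisHandler wiki page, and `stream.body` requires remote streaming to be enabled):

```python
from urllib.parse import urlencode

def build_mlt_query(medianame, text):
    """Query string for a MoreLikeThisHandler request checking a new doc
    against already-indexed docs from the same media."""
    params = {
        "q": f"medianame:{medianame}",  # restrict candidates to the same media
        "mlt.fl": "headline,body",      # fields used for similarity
        "mlt.mintf": 1,                 # minimum term frequency in the source text
        "stream.body": text,            # the new doc's text, instead of a stored doc
        "rows": 1,                      # one similar doc is enough to reject
    }
    return "/solr/mlt?" + urlencode(params)

url = build_mlt_query("Reuters", "Army deploys as clan war kills 11")
```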
Strange query behaviour using splitOnCaseChange=1
Hi all, I had indexed a text containing the word "InterContinental" with fieldType text (with the default filters, just removing solr.SnowballPorterFilterFactory). As far as I understand, using the filter solr.WordDelimiterFilterFactory with splitOnCaseChange="1", this word is indexed as: term text inter continental intercontinental When I search for "continental" the article is returned. When searching for "intercontinental" the article is returned. When searching for "Inter Continental" the article is returned. When searching for "Inter AND Continental" the article is returned. When searching for "InterContinental" the article is NOT returned. Can anyone explain why the last search didn't return the article? Thank you, Frederico Azeiteiro
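The splitOnCaseChange behaviour described above can be imitated with a one-line sketch (a rough stand-in for a single rule of WordDelimiterFilterFactory, not the real filter, which also handles digits, delimiters, and catenation options):

```python
import re

def split_on_case_change(token):
    """Split where a lowercase letter is immediately followed by an uppercase
    one, mimicking splitOnCaseChange=1."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()

parts = split_on_case_change("InterContinental")
```

So the query side also turns "InterContinental" into two tokens; whether that phrase lines up with the indexed positions depends on the other WordDelimiterFilterFactory options (catenateWords, preserveOriginal) on each side, which the Analysis page can show for both index and query.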
RE: wildcard and proximity searches
Hi Mark, unfortunately it's still on my ToDo list... :(. I don't know if it allows "solr mail*"~10 . I hope so, as I'll need that in the future too. Frederico From: Mark N [mailto:nipen.m...@gmail.com] Sent: Tue 05-10-2010 11:29 To: solr-user@lucene.apache.org Subject: Re: wildcard and proximity searches Hi, were you successful in trying SOLR-1604 to allow wildcard queries in phrases? Also, does this plugin allow us to use proximity with wildcards: "solr mail*"~10 ? Is this the right approach to go ahead to support these functionalities? thanks Mark On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > Thanks for your idea. > > At this point I'm logging each query time. My idea is to divide my > queries into "normal queries" and "heavy queries". I have some heavy > queries that take 1 or 2 minutes to get results. But they have, for > instance, (*word1* AND *word2* AND word3*). I guess these will > always be slower (could be a little faster with > "ReversedWildcardFilterFactory") but they'll never be ready in a few > seconds. For now, I just increased the timeout for those :) (using > solrnet). > > My priority at the moment is the query phrases like "word1* word2* > word3". After this is working, I'll try to optimize the "heavy queries". > > Frederico > > -Original Message- > From: Jonathan Rochkind [mailto:rochk...@jhu.edu] > Sent: Wednesday, 4 August 2010 01:41 > To: solr-user@lucene.apache.org > Subject: Re: wildcard and proximity searches > > Frederico Azeiteiro wrote: > >>> But it is unusual to use both leading and trailing * operator. Why are > >>> you doing this? > > > > Yes I know, but I have a few queries that need this. I'll try the > > "ReversedWildcardFilterFactory". > > ReverseWildcardFilter will help leading wildcard, but will not help > trying to use a query with BOTH leading and trailing wildcard. It'll > still be slow.
Solr/lucene isn't good at that; I didn't even know Solr > would do it at all in fact. > > If you really needed to do that, the way to play to solr/lucene's way of > > doing things, would be to have a field where you actually index each > _character_ as a seperate token. Then leading and trailing wildcard > search is basically reduced to a "phrase search", but where the words > are actually characters. But then you're going to get an index where > pretty much every token belongs to every document, which Solr isn't that > > great at either, but then you can apply "commongram" stuff on top to > help that out a lot too. Not quite sure what the end result will be, > I've never tried it. I'd only use that weird special "char as token" > field for queries that actually required leading and trailing wildcards. > > Figuring out how to set up your analyzers, and what (if anything) you're > > going to have to do client-app-side to transform the user's query into > something that'll end up searching like a "phrase search where each > 'word' is a character is left as an exersize for the reader. :) > > Jonathan > -- Nipen Mark
RE: timestamp field
Hi Jan, Dah, I didn't know that :( I always thought it used the server time. Anyway, just out of curiosity, the hour is UTC but NOT the time in London right now. London is UTC+1 (same as here in Portugal) :). So, London solr users should have the same "problem". Well, I must be careful when using this field. Thanks for your answer, Frederico -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Wednesday, 11 August 2010 12:17 To: solr-user@lucene.apache.org Subject: Re: timestamp field Hi, Which time zone are you located in? Do you have DST? Solr uses UTC internally for dates, which means that "NOW" will be the time in London right now :) Does that appear to be right 4 u? Also see this thread: http://search-lucene.com/m/hqBed2jhu2e2/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 11. aug. 2010, at 13.02, Frederico Azeiteiro wrote: > Hi, > > I have on my schema > default="NOW" /> > > This field is returned as > 2010-08-11T10:11:03.354Z > > For an article added at 2010-08-11T11:11:03.354Z! > > And the server has the time of 2010-08-11T11:11:03.354Z... > > This is a w2003 server using solr 1.4. > > Any guess of what could be wrong here? > > Thanks, > Frederico
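Since Solr stores and returns dates in UTC, a client that wants wall-clock time has to convert explicitly. A sketch using the timestamps from this thread (the UTC+1 offset for Portugal in August is hard-coded for illustration; a real client would use its local zone):

```python
from datetime import datetime, timezone, timedelta

# Solr date fields are UTC; parse the returned value and attach the UTC zone.
raw = "2010-08-11T10:11:03.354Z"
utc_time = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# Convert to the local offset: Portugal in August observes DST, so UTC+1.
lisbon_summer = timezone(timedelta(hours=1))
local_time = utc_time.astimezone(lisbon_summer)
```

The one-hour gap in the thread (article added at 11:11 local, stored as 10:11Z) is exactly this conversion.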
timestamp field
Hi, I have on my schema a date field with default="NOW". This field is returned as 2010-08-11T10:11:03.354Z for an article added at 2010-08-11T11:11:03.354Z! And the server has the time 2010-08-11T11:11:03.354Z... This is a w2003 server using solr 1.4. Any guess of what could be wrong here? Thanks, Frederico
RE: wildcard and proximity searches
Thanks for you ideia. At this point I'm logging each query time. My ideia is to divide my queries into "normal queries" and "heavy queries". I have some heavy queries with 1 minute or 2mintes to get results. But they have for instance (*word1* AND *word2* AND word3*). I guess that this will be always slower (could be a little faster with "ReversedWildcardFilterFactory") but they never be ready in a few seconds. For now, I just increased the timeout for those :) (using solrnet). My priority at the moment is the queries phrases like "word1* word2* word3". After this is working, I'll try to optimize the "heavy queries" Frederico -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: quarta-feira, 4 de Agosto de 2010 01:41 To: solr-user@lucene.apache.org Subject: Re: wildcard and proximity searches Frederico Azeiteiro wrote: > >>> But it is unusual to use both leading and trailing * operator. Why are >>> > you doing this? > > Yes I know, but I have a few queries that need this. I'll try the > "ReversedWildcardFilterFactory". > > > ReverseWildcardFilter will help leading wildcard, but will not help trying to use a query with BOTH leading and trailing wildcard. it'll still be slow. Solr/lucene isn't good at that; I didn't even know Solr would do it at all in fact. If you really needed to do that, the way to play to solr/lucene's way of doing things, would be to have a field where you actually index each _character_ as a seperate token. Then leading and trailing wildcard search is basically reduced to a "phrase search", but where the words are actually characters. But then you're going to get an index where pretty much every token belongs to every document, which Solr isn't that great at either, but then you can apply "commongram" stuff on top to help that out a lot too. Not quite sure what the end result will be, I've never tried it. 
I'd only use that weird special "char as token" field for queries that actually required leading and trailing wildcards. Figuring out how to set up your analyzers, and what (if anything) you're going to have to do client-app-side to transform the user's query into something that'll end up searching like a "phrase search where each 'word' is a character is left as an exersize for the reader. :) Jonathan
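The ReversedWildcardFilterFactory trick discussed in this thread — turning a slow leading-wildcard scan into a fast prefix query over reversed terms — can be sketched as follows (a toy model; the real filter marks reversed tokens specially and only reverses terms worth reversing):

```python
def index_token(token):
    """Index both the token and its reversal, roughly what
    ReversedWildcardFilterFactory arranges at index time."""
    return {token, token[::-1]}

def matches_leading_wildcard(indexed_terms, pattern):
    """A leading-wildcard query like *ing becomes a prefix query on the
    reversed form (gni*), which the term dictionary can answer cheaply
    instead of scanning every term."""
    suffix = pattern.lstrip("*")
    return any(term.startswith(suffix[::-1]) for term in indexed_terms)

indexed = index_token("mailing")  # {"mailing", "gniliam"}
```

Note this helps only the leading wildcard; a query with both leading and trailing wildcards (*word*) remains slow, as Jonathan says.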
RE: wildcard and proximity searches
Hi Ahmet, > a) I think wildcard search is by default "case sensitive"? > Is there a > way to make case insensitive? >>Wildcard searches are not analyzed. To case insensitive search you can lowercase query terms >>at client side. (with using lowercasefilter at index time) e.g. Mail* => mail* > > I discovered that the normal query type doesn't work with wildcards > and so I'm using the "Filter Query" to query these. >>I don't understand this. Wildcard search works with q parameter if you are asking that. >>&q=mail* For the 2 points above, my bad. I'm already using the "lowercasefilter" but I was not lowering the query with wildcards (the others are lowered by the analyser). So it's working fine now! On my tests yesterday probably I was testing &q=Mail* and &fq=mail* (and didn't notice the difference) and read somewhere that it wasn't possible (probably on older solr version) so I get the wrong conclusion that it wasn't working. >>But it is unusual to use both leading and trailing * operator. Why are you doing this? Yes I know, but I have a few queries that need this. I'll try the "ReversedWildcardFilterFactory". >>By default it is not supported. With SOLR-1604 is it possible. Ok then. I guess "SOLR-1604" is the answer for most of my problems. I'm going to give it a try and then I'll share some feedback. Thanks for your help and sorry for my newbie confusions. :) Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: sexta-feira, 30 de Julho de 2010 12:09 To: solr-user@lucene.apache.org Subject: RE: wildcard and proximity searches > a) I think wildcard search is by default "case sensitive"? > Is there a > way to make case insensitive? Wildcard searches are not analyzed. To case insensitive search you can lowercase query terms at client side. (with using lowercasefilter at index time) e.g. Mail* => mail* > I discovered that the normal query type doesn't work with wildcards > and so I'm using the "Filter Query" to query these. 
I don't understand this. Wildcard search works with q parameter if you are asking that. &q=mail* > field my > queries are much slower (I have some queries like *word* or > *word1* or > *word2* that take about one minute to perform) > Is there a way to optimize these queries (without removing > the wildcards > :))? It is normal for leading wildcard search to be slow. Using ReversedWildcardFilterFactory at index time can speedup it. But it is unusual to use both leading and trailing * operator. Why are you doing this? > c)Is there a way to do phrase queries with wildcards? Like > "This solr* > mail*"? Because the tests I made, when using quotes I think > the wildcards are ignored. By default it is not supported. With SOLR-1604 is it possible. > d)How exactly works the pf (phrase fields) and ps (phrase > slop) > parameters and what's the difference for the proximity > searches (ex: > "word word2"~20)? These parameters are specific to dismax query parser. http://wiki.apache.org/solr/DisMaxQParserPlugin
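The client-side lowercasing Ahmet recommends for wildcard terms could look like this (a hypothetical helper: only the literal characters are lowered, and the wildcard metacharacters pass through untouched, since Solr does not analyze wildcard queries):

```python
def lowercase_wildcard_query(query):
    """Lowercase a wildcard query on the client so it matches an index
    built with LowerCaseFilter; '*' and '?' are left as-is."""
    return "".join(ch if ch in "*?" else ch.lower() for ch in query)

q = lowercase_wildcard_query("Mail*")
```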
RE: wildcard and proximity searches
Hi Ahmet, Thank you. I'll be happy to test it if I manage to install it ok.. I'm a newbie at solr but I'm going to try the instructions in the thread to load it. Another doubts I have about wildcard searches: a) I think wildcard search is by default "case sensitive"? Is there a way to make case insensitive? b) I have about 6000 queries to run (could have widlcards, proximity searches or just normal queries). I discovered that the normal query type doesn't work with wildcards and so I'm using the "Filter Query" to query these. Is this field slower? I notice that using this field my queries are much slower (I have some queries like *word* or *word1* or *word2* that take about one minute to perform) Is there a way to optimize these queries (without removing the wildcards :))? c)Is there a way to do phrase queries with wildcards? Like "This solr* mail*"? Because the tests I made, when using quotes I think the wildcards are ignored. d)How exactly works the pf (phrase fields) and ps (phrase slop) parameters and what's the difference for the proximity searches (ex: "word word2"~20)? Sorry for the long email and thank you for your help... Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: sexta-feira, 30 de Julho de 2010 10:57 To: solr-user@lucene.apache.org Subject: Re: wildcard and proximity searches > What approach shoud I use to perform wildcard and proximity > searches? > > > > Like: "solr mail*"~10 > > > > For getting docs where solr is within 10 words of "mailing" > for > instance? You can do it with the plug-in described here: https://issues.apache.org/jira/browse/SOLR-1604 It would be great if you test it and give feedback.
wildcard and proximity searches
Hi, What approach should I use to perform wildcard and proximity searches? Like: "solr mail*"~10 For getting docs where solr is within 10 words of "mailing", for instance? Thanks, Frederico
java.lang.NullPointerException
Hi again, I changed the search options to decrease my query size and now I get past the "URI too long" error from the other thread. I already added: 819200 819200 to the Jetty config, but now I'm stuck again on: 13/Jul/2010 9:41:38 org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at java.io.StringReader.<init>(Unknown Source) My query string now has about 10 000 chars. Could this be a memory issue? Thank You, Frederico
RE: Query: URl too long
Ok, I posted on SOLRNet forum asking how can I reduce the URL string using POST method. But I'm giving a try to SOLRJ. Think should be the right way to do it maybe. -Original Message- From: Mauricio Scheffer [mailto:mauricioschef...@gmail.com] Sent: segunda-feira, 12 de Julho de 2010 14:31 To: solr-user@lucene.apache.org Subject: Re: Query: URl too long Frederico, This is indeed a SolrNet issue. You can switch to POST in queries by implementing a ISolrConnection decorator. In the Get() method you'd build a POST request instead of the standard GET. Please use the SolrNet forum for further questions about SolrNet. Cheers, Mauricio On Mon, Jul 12, 2010 at 9:33 AM, kenf_nc wrote: > > Frederico, > You should also pose your question on the SolrNet forum, > http://groups.google.com/group/solrnet?hl=en > Switching from GET to POST isn't a Solr issue, but a SolrNet issue. > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Query-URl-too-long-tp959990p960208.ht ml > Sent from the Solr - User mailing list archive at Nabble.com. >
RE: Query: URl too long
Not an option, because the query has other fields to query as well. The values are generated through a list of choices (which could grow to 5000 strings of 7 chars each...). I don't know if this could be considered off-topic (please advise...) but: I'm doing some tests with Lucene (Lucene.Net 2.9.2) and the results with date range queries are not similar (0 hits on Lucene, 900 with Solr). Does Lucene support date range queries? Thank you for your help. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, 12 July 2010 13:16 To: solr-user@lucene.apache.org Subject: RE: Query: URl too long > Yes, I guess I can't create a URI > object that long. > > Can someone suggest other options? You can shorten your string by not repeating OR and the field name. e.g. "fieldName: value1 OR fieldName: value2 OR fieldName: value3..." becomes q=value1 value2 value3&q.op=OR&df=fieldName By the way, how are you generating these value1 value2 etc.? If the above does not solve your problem you can embed this logic into a custom SearchHandler.
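Ahmet's shortened form can be compared against the repeated-fieldName form directly (fieldName and the generated values are placeholders matching the sizes mentioned in the thread):

```python
from urllib.parse import urlencode

# 2000 seven-char values, as described in the thread.
values = [f"val{i:04d}" for i in range(2000)]

# Repeating "fieldName:... OR " per value, as SolrNet's QueryInList does:
long_q = " OR ".join(f"fieldName:{v}" for v in values)

# Ahmet's shorter form: list the values once, set default field and operator.
short = urlencode({"q": " ".join(values), "q.op": "OR", "df": "fieldName"})
```

The OR/fieldName repetition roughly triples the query size here; dropping it may be enough to fit under a URI limit, though switching to POST removes the limit altogether.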
RE: Query: URl too long
Yes, I guess I can't create a URI object that long. Can someone suggest other options? I'm thinking about options that avoid the HTTP request... My best bet is using Lucene again but keeping Solr for indexing. Do you think this is a good approach? -Original Message- From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] Sent: Monday, 12 July 2010 12:10 To: solr-user@lucene.apache.org Subject: RE: Query: URl too long Hi, A closer look shows that the problem is not on the request but on the creation of the URI object. The exception is thrown when trying to access the URI object inside the URIBuilder. Trying to google it but without luck... -Original Message- From: Jon Poulton [mailto:jon.poul...@vyre.com] Sent: Monday, 12 July 2010 11:56 To: solr-user@lucene.apache.org Subject: Re: Query: URl too long Hi there, We had a similar issue. It's an easy fix, simply change the request type from GET to POST. Jon On 12 Jul 2010, at 11:18, Frederico Azeiteiro wrote: > Hi, > > I need to perform a search using a list of values (about 2000). > > I'm using SolrNET's QueryInList function that creates a search string > like: > > "fieldName: value1 OR fieldName: value2 OR fieldName: value3..." (2000 > values) > > This method created a string with about 100 000 chars and the Web > Request fails with URI too long (C#). > > I'm trying to update an old Lucene app that performs this kind of > search. > > How can I achieve this with Solr? > > What are my options here? > > Thank you, > > Frederico >
RE: Query: URL too long
Hi, A closer look shows that the problem is not in the request but in the creation of the URI object. The exception is thrown when trying to access the URI object inside the URIBuilder. Tried to google it but without luck... -Original Message- From: Jon Poulton [mailto:jon.poul...@vyre.com] Sent: Monday, 12 July 2010 11:56 To: solr-user@lucene.apache.org Subject: Re: Query: URL too long Hi there, We had a similar issue. It's an easy fix: simply change the request type from GET to POST. Jon On 12 Jul 2010, at 11:18, Frederico Azeiteiro wrote: > Hi, > > I need to perform a search using a list of values (about 2000). > > I'm using the SolrNET QueryInList function that creates a search string like: > > "fieldName: value1 OR fieldName: value2 OR fieldName: value3..." (2000 values) > > This method created a string of about 100 000 chars, and the web request fails with "URI too long" (C#). > > I'm trying to update an old Lucene app that performs this kind of search. > > How can I achieve this with Solr? > > What are my options here? > > Thank you, > > Frederico >
Query: URL too long
Hi, I need to perform a search using a list of values (about 2000). I'm using the SolrNET QueryInList function that creates a search string like: "fieldName: value1 OR fieldName: value2 OR fieldName: value3..." (2000 values) This method created a string of about 100 000 chars, and the web request fails with "URI too long" (C#). I'm trying to update an old Lucene app that performs this kind of search. How can I achieve this with Solr? What are my options here? Thank you, Frederico
RE: steps to improve search
Thanks Leonardo, I didn't know that tool, very good! So I see what is wrong: SnowballPorterFilterFactory and StopFilterFactory (both used on index and query). I tried removing the snowball filter and changing the stop filter to ignoreCase=false on QUERY, and restarted Solr. But now I get no results :(. On index analysis I get (result of filters): paying for it → paying → paying → paying → pay. For query analysis (result of filters): paying for it → paying for it → paying → paying → paying. This means that in the end the word indexed is "pay" and the word searched is "paying"? Is it necessary to reindex the data? Thanks -Original Message- From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] Sent: Friday, 2 July 2010 12:58 To: solr-user@lucene.apache.org Subject: Re: steps to improve search Most likely due to: EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, StopFilterFactory you get those "fake" matches. Try going into the admin, in the analysis section. There you can "simulate" the indexing/searching of a document and see how it's actually searched/indexed. It will give you some clues... On Fri, Jul 2, 2010 at 1:50 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > For the example given, I need the full expression "paying for it", so > yes, all the words. > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Friday, 2 July 2010 12:30 > To: solr-user@lucene.apache.org > Subject: RE: steps to improve search > > > I need to know how to achieve more accurate queries (like > > the example below...) using these filters. > > Do you want all the terms you search for to appear in returned > documents? > > You can change the default operator of the QueryParser to AND, either in > schema.xml or by appending &q.op=AND to your search URL. I am assuming you > are not using dismax.
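The index/query mismatch described above can be illustrated with a toy sketch (Python; the one-rule "stemmer" below is a stand-in for the real Porter/Snowball stemmer, not its actual algorithm):

```python
def toy_stem(token):
    # Stand-in for a real stemmer: just strips a trailing "ing".
    return token[:-3] if token.endswith("ing") else token

def index_analyze(text):
    # Index chain still stems (SnowballPorterFilter left in place).
    return [toy_stem(t) for t in text.lower().split()]

def query_analyze(text):
    # Query chain with the stemmer removed: tokens pass through unchanged.
    return text.lower().split()

indexed = index_analyze("paying for it")   # ['pay', 'for', 'it']
queried = query_analyze("paying for it")   # ['paying', 'for', 'it']

# "paying" was never written to the index, so the query term cannot match:
print("paying" in indexed)  # False
```

This is why changing only the query-side chain produced zero results: the stored terms are still the stemmed forms, and matching them again requires either restoring a symmetric query chain or reindexing.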
RE: steps to improve search
For the example given, I need the full expression "paying for it", so yes, all the words. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Friday, 2 July 2010 12:30 To: solr-user@lucene.apache.org Subject: RE: steps to improve search > I need to know how to achieve more accurate queries (like > the example below...) using these filters. Do you want all the terms you search for to appear in returned documents? You can change the default operator of the QueryParser to AND, either in schema.xml or by appending &q.op=AND to your search URL. I am assuming you are not using dismax.
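For reference, the schema.xml variant of Ahmet's suggestion looks like this in Solr 1.4-era schemas (a fragment, not a complete schema):

```xml
<!-- In schema.xml: make all query terms required by default
     (equivalent to appending &q.op=AND to each request). -->
<solrQueryParser defaultOperator="AND"/>
```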
RE: steps to improve search
I'm using " surrounding the text. My query: Headline:("paying for it") in the Solr admin interface. Some results: ...l stop paying tax until council pays for dam... "Why paying extra doesn't always pay!" "...pay cut as M&S investor pressure pays off" "Can't pay or won't pay: the debt collector call" What could be wrong here? Thanks. -Original Message- From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] Sent: Friday, 2 July 2010 12:30 To: solr-user@lucene.apache.org Subject: Re: steps to improve search No, you explained it alright, but then didn't understand the answer. Searching with the " surrounding the text you are searching for has exactly the effect you are looking for. Try it... On Fri, Jul 2, 2010 at 1:23 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > I'm sorry, maybe I didn't explain correctly. > > The issue is using the default text FIELD TYPE, not the default text FIELD. > The "text" field type uses a lot of filters at indexing time. > I need to know how to achieve more accurate queries (like the example > below...) using these filters. > > -Original Message- > From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] > Sent: Friday, 2 July 2010 12:07 > To: solr-user@lucene.apache.org > Subject: Re: steps to improve search > > Try > field:"text to search" > > On Fri, Jul 2, 2010 at 12:57 PM, Frederico Azeiteiro < > frederico.azeite...@cision.com> wrote: > > > Hi, > > > > I'm using the default text field type in my schema. > > > > Is there a quick way to do more accurate searches, so that searching for > > "paying for it" only returns docs with the full expression "paying for > > it", and does not return articles with the word "pay" as it does now? > > > > Thanks, > > > > Frederico > > >
RE: steps to improve search
I'm sorry, maybe I didn't explain correctly. The issue is using the default text FIELD TYPE, not the default text FIELD. The "text" field type uses a lot of filters at indexing time. I need to know how to achieve more accurate queries (like the example below...) using these filters. -Original Message- From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] Sent: Friday, 2 July 2010 12:07 To: solr-user@lucene.apache.org Subject: Re: steps to improve search Try field:"text to search" On Fri, Jul 2, 2010 at 12:57 PM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > Hi, > > I'm using the default text field type in my schema. > > Is there a quick way to do more accurate searches, so that searching for > "paying for it" only returns docs with the full expression "paying for > it", and does not return articles with the word "pay" as it does now? > > Thanks, > > Frederico > >
steps to improve search
Hi, I'm using the default text field type in my schema. Is there a quick way to do more accurate searches, so that searching for "paying for it" only returns docs with the full expression "paying for it", and does not return articles with the word "pay" as it does now? Thanks, Frederico
RE: Where to check optimize status
Thank you, but I didn't find anything like a "Merge thread" and I continued to have the lock file. The segments were not merged, so I stopped SOLR and restarted. The lock disappeared, but I guess the optimization didn't complete. I'll try again tomorrow. -Original Message- From: Alexander Rothenberg [mailto:a.rothenb...@fotofinder.net] Sent: Tuesday, 29 June 2010 12:20 To: solr-user@lucene.apache.org Subject: Re: Where to check optimize status To determine if the optimize is still in progress, you can look in the admin frontend on the "THREAD DUMP" page for something like "Lucene Merge Thread". If it's there, then the optimize is still running. Also, the index file size and the filenames in your index dir will be changing a lot... On Tuesday 29 June 2010 12:54:54 Frederico Azeiteiro wrote: > Hi, > I'm using the Solr 1.4.0 default installation. > Is there a place where I can find the optimization status? > I sent an optimize HTTP request and it should have finished by now, but I > still see the lock file in the index folder. > Can I see somewhere if the optimization is still running? > Thanks, > Frederico Azeiteiro -- Alexander Rothenberg Fotofinder GmbH USt-IdNr. DE812854514 Software Entwicklung Web: http://www.fotofinder.net/ Potsdamer Str. 96 Tel: +49 30 25792890 10785 Berlin Fax: +49 30 257928999 Geschäftsführer: Ali Paczensky Amtsgericht: Berlin Charlottenburg (HRB 73099) Sitz: Berlin
Where to check optimize status
Hi, I'm using the Solr 1.4.0 default installation. Is there a place where I can find the optimization status? I sent an optimize HTTP request and it should have finished by now, but I still see the lock file in the index folder. Can I see somewhere if the optimization is still running? Thanks, Frederico Azeiteiro
RE: run on reboot on windows
Hi Ahmed, I need to achieve that also. Did you manage to install it as a service and start Solr with Jetty? After installing and starting Jetty as a service, how do you start Solr? Thanks, Frederico -Original Message- From: S Ahmed [mailto:sahmed1...@gmail.com] Sent: Monday, 3 May 2010 01:05 To: solr-user@lucene.apache.org Subject: Re: run on reboot on windows Thanks, for some reason I was looking for a solution outside of jetty/tomcat, when that was the obvious way to get things restarted :) On Sun, May 2, 2010 at 7:53 PM, Dave Searle wrote: > Tomcat is installed as a service on Windows. Just go into the services > control panel and set the startup type to automatic > > Sent from my iPhone > > On 3 May 2010, at 00:43, "S Ahmed" wrote: > > > it's not tomcat/jetty that's the issue, it's how to get things to > > restart on a Windows server (tomcat and jetty don't run as native Windows > > services), so I am a little confused... thanks. > > > > On Sun, May 2, 2010 at 7:37 PM, caman > > wrote: > > > >> Ahmed, > >> > >> Best is if you take a look at the documentation of Jetty or Tomcat. SOLR > >> can run on any web container; it's up to you how you configure your web > >> container to run. > >> > >> Thanks > >> > >> Aboxy > >> > >> From: S Ahmed [via Lucene] > >> [mailto:ml-node+772174-2097041460-124...@n3.nabble.com] > >> Sent: Sunday, May 02, 2010 4:33 PM > >> To: caman > >> Subject: Re: run on reboot on windows > >> > >> By default it uses Jetty, so you're saying Tomcat on Windows Server 2008/ > >> IIS7 runs as a native Windows service?
> >> On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]> wrote: > >>> Set the tomcat6 service to auto start on boot (if running Tomcat) > >>> > >>> Sent from my iPhone > >>> > >>> On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote: > Hi, > I'm trying to get Solr to run on Windows, such that if it reboots > the Solr service will be running. > How can I do this? > >> -- > >> View this message in context: http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772178.html > >> Sent from the Solr - User mailing list archive at Nabble.com.
search within sentence or paragraph
Hi all, Is it possible to search for a combination of words within the same sentence or paragraph? Ex: American and McDonalds Returns: "McDonalds is an American company" Does not return: "...went to McDonalds. After that he saw the American flag..." Is this possible? Frederico Azeiteiro
RE: Can't commit on 125 GB index
Yes, the HTTP request is timing out, even when using timeout values of 10m. Normally the commit takes about 10s. I did an optimize (it took 6h) and it looks good for now... 59m? Well, I didn't wait that long; I restarted the Solr instance and tried again. I'll try to use autocommit in the near future. Using autocommit, how can I check how many commits are happening at a given moment and when they started? Is there a way to control and know what is happening behind the scenes in "real time"? I'm using Solr 1.4 with Jetty. From: Lance Norskog [mailto:goks...@gmail.com] Sent: Sat 13-03-2010 23:31 To: solr-user@lucene.apache.org Subject: Re: Can't commit on 125 GB index What is timing out? The external HTTP request? Commit times are a sawtooth and slowly increase. My record is 59 minutes, but I was doing benchmarking. On Thu, Mar 11, 2010 at 1:46 AM, Frederico Azeiteiro wrote: > Hi, > > I'm having timeouts committing on a 125 GB index with about 2200 > docs. > > I'm inserting new docs every 5m and committing after that. > > I would like to try the autocommit option and see if I can get better > results. I need the indexed docs available for searching within about 10 > minutes after the insert. > > I was thinking of using something like > > 5000 > 86000 > > I update about 4000 docs every 15m. > > Can you share your thoughts on this config? > > Do you think this will solve my commit timeout problem? > > Thanks, > Frederico > -- Lance Norskog goks...@gmail.com
Can't commit on 125 GB index
Hi, I'm having timeouts committing on a 125 GB index with about 2200 docs. I'm inserting new docs every 5m and committing after that. I would like to try the autocommit option and see if I can get better results. I need the indexed docs available for searching within about 10 minutes after the insert. I was thinking of using something like 5000 86000 I update about 4000 docs every 15m. Can you share your thoughts on this config? Do you think this will solve my commit timeout problem? Thanks, Frederico
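The two bare numbers above appear to be the values of an autocommit block whose XML tags were stripped by the mail archive. Assuming they are maxDocs and maxTime, the solrconfig.xml fragment would presumably look like this:

```xml
<!-- In solrconfig.xml (Solr 1.4): commit automatically after 5000
     pending docs or 86000 ms, whichever comes first. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>5000</maxDocs>
    <maxTime>86000</maxTime>
  </autoCommit>
</updateHandler>
```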
RE: search and count occurrences
Thanks Chris. Could something like that be implemented in C#? :) Does anyone have a link where I can start digging? This is not an urgent matter, just something to investigate and implement in the near future. Frederico -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, 9 March 2010 23:39 To: solr-user@lucene.apache.org Subject: Re: search and count occurrences : I need to implement a search where I count the number of times : the string appears in the search field, : : i.e.: only return articles that mention the word 'HP' at least 2x. ... : Is there a way that SOLR does this type of operation for me? you'd have to implement it in a custom QParser -- if all you are worried about is simple TermQuery-style matches, then this should be fairly trivial using SpanNearQuery. -Hoss
search and count occurrences
Hi, I need to implement a search where I count the number of times the string appears in the search field, i.e.: only return articles that mention the word 'HP' at least 2x. I'm currently doing this after the SOLR search with my own methods. Is there a way that SOLR does this type of operation for me? Thanks, Frederico
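The post-search workaround described above (counting occurrences client-side) can be sketched like this (a toy Python illustration with hypothetical documents; the author's real version is in C#):

```python
import re

def occurrence_count(text, term):
    # Count whole-word, case-sensitive occurrences of the term.
    return len(re.findall(r"\b%s\b" % re.escape(term), text))

def filter_by_min_occurrences(docs, term, minimum):
    # Keep only documents mentioning the term at least `minimum` times.
    return [d for d in docs if occurrence_count(d, term) >= minimum]

docs = [
    "HP released a new HP printer",   # 2 occurrences -> kept
    "HP quarterly results",           # 1 occurrence  -> dropped
    "nothing relevant here",          # 0 occurrences -> dropped
]
print(filter_by_min_occurrences(docs, "HP", 2))
```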
fieldType "text"
Hi, I'm using the default "text" field type that comes with the example. When searching for simple words such as 'HP' or 'TCS', Solr is returning results that contain 'HP1' or 'T&CS'. Is there a solution to avoid this? Thanks, Frederico
RE: Solrsharp
Hi Sachin, Yes, I had to make some patches too (range queries didn't work very well...) and yes, I thought about changing the ParameterJoin. I'm already using solrsharp for indexing without problems, but for searches I guess I'm going to give solrnet a try, mostly because of the lack of support/feedback with solrsharp... Does anyone around here use C# with Solr? What client do you use? Another idea is to perform the search directly on the SOLR index using the Lucene API. I previously used the C# Lucene API for a long time without any problems. Maybe it's the quickest solution... Thanks for your feedback. From: Sachin [mailto:sachinni...@aim.com] Sent: Sat 27-02-2010 12:04 To: solr-user@lucene.apache.org Subject: Re: Solrsharp solr# does not have built-in support for "NOT" searches; you would have to tweak the solr# library to do that (take a look at how the ParameterJoin is used, and add one for Not). I have faced quite a few issues with using solr# in the past, like unclosed TCP connections, no spellchecker, json support etc., and had to patch it quite frequently. I guess your best bet would be to take a look at some other client like solr.net: http://code.google.com/p/solrnet/. Disclaimer: I haven't evaluated solr.net myself, but it looks to be more robust than solr# and is more actively maintained. S -Original Message- From: Frederico Azeiteiro To: solr-user@lucene.apache.org Sent: Fri, Feb 26, 2010 9:54 pm Subject: Solrsharp Hi, I don't know if this list includes this kind of help, but I'm using Solrsharp with C# to operate SOLR. Please advise if this is off-topic. I'm having a little trouble making a search with excluded terms using the query parameters. Does anyone use Solrsharp around here? Did you manage to exclude terms in searches? Br Frederico
Solrsharp
Hi, I don't know if this list includes this kind of help, but I'm using Solrsharp with C# to operate SOLR. Please advise if this is off-topic. I'm having a little trouble making a search with excluded terms using the query parameters. Does anyone use Solrsharp around here? Did you manage to exclude terms in searches? Br Frederico
Reindex after changing defaultSearchField?
Hi, If I change the "defaultSearchField" in the core schema, do I need to recreate the index? Thanks, Frederico
RE: query all filled field?
I've analyzed my indexing application and checked the XML before executing the HTTP request, and the field is empty: it should be empty in SOLR. Probably something on the way between my application (.NET) and SOLR (Jetty on Ubuntu) adds the whitespace. Anyway, I'll try to remove the field, but as I validate each doc against the SOLR schema, I must make some adjustments and stop validating the doc. I don't know if that will be acceptable... Thanks for your help. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, 4 February 2010 17:22 To: solr-user@lucene.apache.org Subject: RE: query all filled field? > XML update. I'm serializing the doc > in .NET, and then using solsharp to > insert/update the doc in SOLR. > > The result is: > > Does this mean I'm adding a whitespace in the XML update? Yes, exactly. You can remove from your ... if the value of fieldX.trim() is equal to "" when preparing your xml.
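Ahmet's trim-before-serializing suggestion can be sketched as follows (a minimal Python illustration of the idea; the author's real code is C#/solrsharp, and the field names here are hypothetical):

```python
from xml.sax.saxutils import escape

def doc_to_update_xml(doc):
    # Emit a Solr <add><doc> update, skipping fields whose value is
    # empty or whitespace-only so no " " value ever reaches the index.
    fields = []
    for name, value in doc.items():
        if value is None or value.strip() == "":
            continue  # drop empty/whitespace-only fields entirely
        fields.append('<field name="%s">%s</field>' % (name, escape(value.strip())))
    return "<add><doc>%s</doc></add>" % "".join(fields)

print(doc_to_update_xml({"id": "1", "fieldX": " ", "title": "ok"}))
```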
RE: query all filled field?
XML update. I'm serializing the doc in .NET, and then using solsharp to insert/update the doc in SOLR. The result is: Does this mean I'm adding a whitespace in the XML update? Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, 4 February 2010 16:26 To: solr-user@lucene.apache.org Subject: RE: query all filled field? > Theoretically yes, it's correct, but I > have about 1/10 of the docs with > this field not empty and the rest is empty. > > Most of the articles have the field empty, as I can see when > I query *:*. How are you adding documents to solr? xml update, DIH? Probably you are adding a whitespace value to that field. When you query q=*:*&fl=fieldX what do you see? Do you see or
RE: query all filled field?
Theoretically yes, it's correct, but I have about 1/10 of the docs with this field not empty and the rest is empty. Most of the articles have the field empty, as I can see when I query *:*. So the queries don't make sense... -Original Message- From: Ankit Bhatnagar [mailto:abhatna...@vantage.com] Sent: Thursday, 4 February 2010 14:56 To: 'solr-user@lucene.apache.org' Subject: RE: query all filled field? That's correct. If you want to find "missing values", i.e. fields for which no value is present, then you will use - Ankit -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, February 04, 2010 9:41 AM To: solr-user@lucene.apache.org Subject: RE: query all filled field? > *:* AND -fieldX:[* TO *] - returns 0 docs > > fieldX:(a*) - returns docs, so I'm sure that there are docs > with this field filled. > > Any other ideas what could be wrong? There is nothing wrong in this scenario. If -fieldX:[* TO *] returns 0 docs, it means that all of your documents have that fieldX filled. Therefore fieldX:[* TO *] returns all of your dataset, the same as *:*
RE: query all filled field?
I tried another one: fieldX:["" TO *] and it returns articles with the field filled :), so I guess I'm getting there. But I also tried fieldX:[" " TO *] and got a few more results than the first one... Is there a real difference between these two, and are the results really all the docs with the field not empty? Thanks again, Frederico -Original Message- From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] Sent: Thursday, 4 February 2010 10:55 To: solr-user@lucene.apache.org Subject: RE: query all filled field? Thanks, but still no luck with that: *:* AND -fieldX:[* TO *] - returns 0 docs fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled. Any other ideas what could be wrong? Frederico -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 4 February 2010 05:38 To: solr-user@lucene.apache.org Subject: Re: query all filled field? Queries that start with minus or NOT don't work. You have to do this: *:* AND -fieldX:[* TO *] On Wed, Feb 3, 2010 at 5:04 AM, Frederico Azeiteiro wrote: > Hum, strange... I reindexed some docs with the field corrected. > > Now I'm sure the field is filled because: > > "fieldX:(*a*)" returns docs. > > But "fieldX:[* TO *]" is returning the same as "*:*" (all results) > > I tried with "-fieldX:[* TO *]" and I get no results at all. > > I wonder if someone has tried this before with success? > > The field is indexed as string, indexed=true and stored=true. > > Thanks, > Frederico > > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Wednesday, 3 February 2010 11:48 > To: solr-user@lucene.apache.org > Subject: Re: query all filled field? > > >> Is it possible to query some field in order to get only non-empty >> documents? >> >> All documents where field x is filled? > > Yes. q=x:[* TO *] will bring documents that have a non-empty x field. > > -- Lance Norskog goks...@gmail.com
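One observable difference between the two bounds is a matter of term ordering: string terms compare lexicographically, and the empty string sorts before a single space, so an inclusive ["" TO *] range is the wider of the two on paper. A minimal sketch of that ordering (plain Python string comparison standing in for Lucene's term ordering; the term list is hypothetical):

```python
# Terms a string field might contain, including empty and whitespace-only.
terms = ["", " ", " x", "a", "paying"]

# String terms order lexicographically, like Python's sorted().
assert sorted(terms) == ["", " ", " x", "a", "paying"]

# An inclusive ["" TO *] range admits every term, including "" itself;
# [" " TO *] starts one term later and excludes the empty string.
from_empty = [t for t in terms if t >= ""]
from_space = [t for t in terms if t >= " "]
print(from_empty)  # all five terms
print(from_space)  # all but ""
```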
RE: query all filled field?
Thanks, but still no luck with that: *:* AND -fieldX:[* TO *] - returns 0 docs fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled. Any other ideas what could be wrong? Frederico -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 4 February 2010 05:38 To: solr-user@lucene.apache.org Subject: Re: query all filled field? Queries that start with minus or NOT don't work. You have to do this: *:* AND -fieldX:[* TO *] On Wed, Feb 3, 2010 at 5:04 AM, Frederico Azeiteiro wrote: > Hum, strange... I reindexed some docs with the field corrected. > > Now I'm sure the field is filled because: > > "fieldX:(*a*)" returns docs. > > But "fieldX:[* TO *]" is returning the same as "*:*" (all results) > > I tried with "-fieldX:[* TO *]" and I get no results at all. > > I wonder if someone has tried this before with success? > > The field is indexed as string, indexed=true and stored=true. > > Thanks, > Frederico > > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Wednesday, 3 February 2010 11:48 > To: solr-user@lucene.apache.org > Subject: Re: query all filled field? > > >> Is it possible to query some field in order to get only non-empty >> documents? >> >> All documents where field x is filled? > > Yes. q=x:[* TO *] will bring documents that have a non-empty x field. > > -- Lance Norskog goks...@gmail.com
RE: query all filled field?
Hum, strange... I reindexed some docs with the field corrected. Now I'm sure the field is filled because: "fieldX:(*a*)" returns docs. But "fieldX:[* TO *]" is returning the same as "*:*" (all results). I tried with "-fieldX:[* TO *]" and I get no results at all. I wonder if someone has tried this before with success? The field is indexed as string, indexed=true and stored=true. Thanks, Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, 3 February 2010 11:48 To: solr-user@lucene.apache.org Subject: Re: query all filled field? > Is it possible to query some field in order to get only non-empty > documents? > > All documents where field x is filled? Yes. q=x:[* TO *] will bring documents that have a non-empty x field.
RE: query all filled field?
OK, in case anyone needs it: I tried fieldX:[* TO *]. I think this is correct. In my case I found out that I was not indexing this field correctly, because the values are all empty. :) -Original Message- From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] Sent: Wednesday, 3 February 2010 11:34 To: solr-user@lucene.apache.org Subject: query all filled field? Hi all, Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Thanks, Frederico
query all filled field?
Hi all, Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Thanks, Frederico
RE: Problem committing on 40GB index
The hanging hasn't happened again since yesterday, and I never ran out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky... Where can I see the garbage collection info? -Original Message- From: Marc Des Garets [mailto:marc.desgar...@192.com] Sent: Wednesday, 13 January 2010 17:20 To: solr-user@lucene.apache.org Subject: RE: Problem committing on 40GB index Just curious, have you checked whether the hanging you are experiencing is garbage collection related? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 13 January 2010 13:33 To: solr-user@lucene.apache.org Subject: Re: Problem committing on 40GB index That's my understanding. But fortunately disk space is cheap. On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > Sorry, my bad... I replied to a current mailing list message, only changing > the subject... Didn't know about this "hijacking" problem. It will not happen > again. > > Just to close this issue: if I understand correctly, for an index of 40G > I will need, to run an optimize: > - 40G if all activity on the index is stopped > - 80G if the index is being searched > - 120G if the index is being searched and a commit is performed. > > Is this correct? > > Thanks. > Frederico > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, 12 January 2010 19:18 > To: solr-user@lucene.apache.org > Subject: Re: Problem committing on 40GB index > > Huh? > > On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter > wrote: > > > > : Subject: Problem committing on 40GB index > > : In-Reply-To: < > > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com> > > > > http://people.apache.org/~hossman/#threadhijack > > Thread Hijacking on Mailing Lists > > > > When starting a new discussion on a mailing list, please do not reply to > > an existing message, instead start a fresh email.
Even if you change the > > subject line of your email, other mail headers still track which thread > > you replied to, and your question is "hidden" in that thread and gets less > > attention. It makes following discussions in the mailing list archives > > particularly difficult. > > See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking > > > > -Hoss > -- This transmission is strictly confidential, possibly legally privileged, and intended solely for the addressee. Any views or opinions expressed within it are those of the author and do not necessarily represent those of 192.com, i-CD Publishing (UK) Ltd or any of its subsidiary companies. If you are not the intended recipient then you must not disclose, copy or take any action in reliance of this transmission. If you have received this transmission in error, please notify the sender as soon as possible. No employee or agent is authorised to conclude any binding agreement on behalf of i-CD Publishing (UK) Ltd with another party by email without express written confirmation by an authorised employee of the Company. http://www.192.com (Tel: 08000 192 192). i-CD Publishing (UK) Ltd is incorporated in England and Wales, company number 3148549, VAT No. GB 673128728.
RE: Problem committing on 40GB index
Sorry, my bad... I replied to a current mailing list message, only changing the subject... Didn't know about this "hijacking" problem. It will not happen again. Just to close this issue: if I understand correctly, for an index of 40G I will need, to run an optimize: - 40G if all activity on the index is stopped - 80G if the index is being searched - 120G if the index is being searched and a commit is performed. Is this correct? Thanks. Frederico -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 12 January 2010 19:18 To: solr-user@lucene.apache.org Subject: Re: Problem committing on 40GB index Huh? On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter wrote: > > : Subject: Problem committing on 40GB index > : In-Reply-To: < > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com> > > http://people.apache.org/~hossman/#threadhijack > Thread Hijacking on Mailing Lists > > When starting a new discussion on a mailing list, please do not reply to > an existing message, instead start a fresh email. Even if you change the > subject line of your email, other mail headers still track which thread > you replied to, and your question is "hidden" in that thread and gets less > attention. It makes following discussions in the mailing list archives > particularly difficult. > See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking > > -Hoss > >
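The 40G/80G/120G figures above follow a simple multiplier rule; a toy sketch (the 1x/2x/3x multipliers come from this thread's discussion, not from official Solr documentation):

```python
# Disk headroom needed to optimize, per the rule of thumb in this thread:
# up to 1x the index size normally, 2x with searchers open, and 3x with
# searchers open plus a concurrent commit.
def optimize_headroom_gb(index_gb, searchers_open=False, committing=False):
    multiplier = 1 + int(searchers_open) + int(searchers_open and committing)
    return index_gb * multiplier

print(optimize_headroom_gb(40))                                        # 40
print(optimize_headroom_gb(40, searchers_open=True))                   # 80
print(optimize_headroom_gb(40, searchers_open=True, committing=True))  # 120
```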
RE: Problem committing on 40GB index
I restarted Solr and stopped all searches. After that, the commit was normal (2 secs), and it's been working for 3h without problems (indexing and a few searches too)... I haven't done any optimize yet, mainly because I had no deletes on the index and the performance is OK, so no need to optimize, I think. I had tried this procedure a few times in the morning and the commit always hung, so... I have no explanation for it suddenly starting to work. I'm making a commit every 2m (because I need the results updated in searches), so probably when I have more searches at the same time the commit will hang again, right? Sorry for the newbie questions, and thanks for your help and explanation, Erick. BR, Frederico -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 12 January 2010 15:15 To: solr-user@lucene.apache.org Subject: Re: Problem committing on 40GB index Rebooting the machine certainly closes the searchers, but depending upon how you shut it down there may be stale files. After reboot (but before you start SOLR), how much space is on your disk? If it's 40G, you have no stale files. Yes, IR is IndexReader, which is a searcher. I'll have to leave it to others if you don't have stale files hanging around, although if you're optimizing while searchers are running, you'll use up to 3X the index size... Otherwise I'll have to leave it to others for additional insights. Best Erick On Tue, Jan 12, 2010 at 9:22 AM, Frederico Azeiteiro < frederico.azeite...@cision.com> wrote: > Hi Erick, > > I'm a newbie to solr... By IR, you mean searcher? Is there a place where I > can check the open searchers? And shouldn't rebooting the machine have closed > those searchers?
> > Thanks,
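Frederico mentions above that he commits every 2 minutes from his indexing process. A minimal sketch of issuing such a commit over HTTP, assuming the stock Solr 1.x XML update handler at its default URL (the host, port, and path are assumptions; adjust them to your deployment):

```python
import urllib.request

def commit_request(base_url="http://localhost:8983/solr/update"):
    """Build the POST request that asks Solr to commit: an XML <commit/>
    body sent to the update handler. urllib infers POST when data is set."""
    return urllib.request.Request(
        base_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

req = commit_request()
print(req.get_method())  # POST
print(req.data)          # b'<commit/>'
# To actually send it: urllib.request.urlopen(req)
```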
RE: Problem comitting on 40GB index
Hi Erick,

I'm a newbie to Solr... By IR, you mean a searcher? Is there a place where I can check the open searchers? And shouldn't rebooting the machine have closed those searchers?

Thanks,

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, January 12, 2010 13:54
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

There are several possibilities:

1> You have some process holding open your indexes, probably other searchers. You *probably* are OK just committing new changes if there is exactly *one* searcher keeping your index open. If you have some process whereby you periodically open a new searcher but fail to close the old one, then you'll use up an extra 40G for every version of your index held open by your processes. In other words: if you open any number of IRs, you'll have 40G consumed. Then if you add some more documents and open *another* IR, you'll have another 40G consumed. They'll stay around until you close your readers.

2> If you optimize, up to 3X the index size can be consumed if you also have a previous reader open.

So I suspect that sometime recently you've opened another IR.

HTH
Erick
Problem comitting on 40GB index
Hi all,

I started working with Solr about a month ago, and everything was running well, both indexing and searching documents.

I have a 40GB index with about 10,000,000 documents. I index 3k docs every 10 minutes and commit after each insert.

Since yesterday, I can't commit any articles to the index. I can still search and index documents without committing, but when I start the commit it takes a long time and eats all of the available disk space left (60GB). The commit eventually stops with a full disk, and I have to restart Solr to get the 60GB returned to the system.

Before this, the commit was taking a few seconds to complete.

Can someone help me debug the problem? Where should I start? Should I try to copy the index to another machine with more free space and try to commit? Should I try an optimize?

Log for the last commit I tried:

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)

(Then, after a long time...)

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: No space left on device

I'm using Ubuntu 9.04 and Solr 1.4.0.

Thanks in advance,

Frederico
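A sketch of the kind of check that could catch this before a merge fills the disk: measure the index directory and compare it against the free space on the same filesystem, using the worst-case 3X factor discussed in this thread. The function names are hypothetical, and `shutil.disk_usage` requires Python 3.3+:

```python
import os
import shutil

def dir_size_bytes(path):
    """Total size of all files under path (e.g. a Solr index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def headroom_ok(index_bytes, free_bytes, factor=3):
    """True if free space is at least `factor` times the index size --
    the worst case described in this thread for an optimize/merge while
    an old reader is still open."""
    return free_bytes >= factor * index_bytes

def safe_to_merge(index_dir, factor=3):
    """Check the filesystem holding index_dir before triggering a commit
    or optimize that may kick off a large merge."""
    return headroom_ok(dir_size_bytes(index_dir),
                       shutil.disk_usage(index_dir).free,
                       factor)
```

With a 40GB index and only 60GB free, as in this thread, the 3X check would fail, flagging the risk before the `No space left on device` exception.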