Re: Can Master push data to slave
Regarding point b, I mean that when the slave server replicates from the master, it creates a lock file in its index directory. How do I avoid that? On Tue, Aug 9, 2011 at 2:56 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, I am using Solr 1.4 and doing a replication process where my slave pulls data from the master. I have two questions: a. Can the master push data to the slave? Not in current versions. Not sure about exotic patches for this. b. How do I make sure that a lock file is not created during replication? What do you mean? Please help. Thanks, Pawan
filtering non english text from my results
Hi All, I am looking for a solution to filter out text that contains non-English words. My goal is to present my English-speaking users with results in their language. Any ideas? Thanks, Omri
Re: sorting issue with solr 3.3
I have created an issue with a test attached. https://issues.apache.org/jira/browse/SOLR-2713 Will try to figure out what's going wrong. Regards Bernd http://www.base-search.net/ On 13.08.2011 16:20, Bernd Fehling wrote: The issue was located in a 31 million docs index and I have already reduced it to a reproducible 4-document index. It is stock Solr 3.3.0. Yes, the documents are also in the wrong order, as are the field sort values. I only added the field sort values to the email to keep it short. I will produce a test on Monday when I'm back in my office. Hang on... Regards Bernd http://www.base-search.net/ I've checked in an improved TestSort that adds deleted docs and randomizes things a lot more (and fixes the previous reliance on doc ids not being reordered). I still can't reproduce this error though. Is this stock Solr? Can you verify that the documents are in the wrong order as well (and not just the field sort values)? -Yonik http://www.lucidimagination.com
Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header
Hi Markus, thanks for your answer. I'm using Solr 4.0 and Jetty now and will observe the behavior and my error logs next week. Tomcat could be a reason, we will see; I'll report back. I'm indexing WITHOUT batches, one doc after another. But I will try out batch indexing as well as retrying faulty docs. If you index one batch and one doc in the batch is corrupt, what happens to the other 249 docs (250/batch total)? Are they indexed and updated when you retry the batch, or does the complete batch fail? The entire batch should fail, but I cannot confirm. Usually all fail if there is an error somewhere, such as an XML error. Regards Vadim 2011/8/11 Markus Jelsma markus.jel...@openindex.io Hi, We see these errors too once in a while, but there is no real answer on the mailing list here, except one user suspecting Tomcat is responsible (connection timeouts). Another user proposed to limit the number of documents per batch, but that, of course, increases the number of connections made. We do only 250 docs/batch to limit RAM usage on the client and started to see these errors very occasionally. There may be a coincidence... or not. Anyway, it's really hard to reproduce, if not impossible. It happens when connecting directly as well as when connecting through a proxy. What you can do is simply retry the batch and it usually works out fine. At least you don't lose a batch in the process. We retry all failures at least a couple of times before giving up an indexing job. Cheers, Hello folks, I use Solr 1.4.1 and every 2 to 6 hours I have indexing errors in my log files. On the client side: 2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing failed with SolrServerException. Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.: Stacktrace: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469) . . 
on the server side: INFO: [] webapp=/solr path=/update params={wt=javabin&version=1} status=0 QTime=3 04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 0 04.08.2011 12:01:18 org.apache.solr.common.SolrException log SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException: Invalid chunk header . . . I'm indexing ONE document per call, 15-20 documents per second, 24/7. What may be the problem? Best regards, Vadim
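Markus's retry-the-batch advice generalizes to a small helper. A minimal sketch in plain Java (the `withRetry` helper and its parameters are made up for illustration, not SolrJ API; in an indexing client you would wrap the `server.add(batch)` call in it):

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Runs the action up to maxAttempts times, sleeping backoffMillis
    // between failed attempts; rethrows the last failure if all fail.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMillis);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a batch send that fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException(
                "Unbuffered entity enclosing request can not be repeated");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

Give up only after a few attempts, as Markus describes, so a transient connection error doesn't cost you the whole batch.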
Re: Nutch related issue: URL Ignore
The Solr list is not the appropriate list to ask. Please try the Nutch user mailing list. Hi, I am using Nutch 1.2. In my crawl-urlfilter.txt, I am specifying URLs to be skipped. I am giving some patterns that need to be skipped, but it is not working, e.g. -^http://([a-z0-9]*\.)*domain.com +^http://([a-z0-9]*\.)*domain.com/([0-9-a-z])*.html -^http://([a-z0-9]*\.)*domain.com/([a-z/])* -^http://([a-z0-9]*\.)*domain.com/top-ads.php I want only the second URL pattern to be included while crawling, and all other patterns to be excluded, but it is crawling all of them. Please suggest where the issue might be. Thanks, Pawan
Migration from Autonomy IDOL to SOLR
Hello. We have a couple of applications running on half a dozen Autonomy IDOL servers. Currently, all the features we need are supported by Solr. We have done some internal testing and realized that Solr would do a better job. So, we are investigating all possibilities for a smooth migration from IDOL to Solr. I am looking for advice from people who went through something similar. Ideally, we would like to keep most of our legacy code unchanged and have a kind of query-translation layer plugged into our app if possible. - Is there a library available? - Any thoughts? Thanks. Arcadius.
Re: strip html from data
2011/8/11 Ahmet Arslan iori...@yahoo.com Is there a way to strip the HTML tags completely and not index them? If not, how do I retrieve the results without HTML tags? How do you push documents to Solr? You need to strip HTML tags before the analysis chain. For example, if you are using the Data Import Handler, you can use HTMLStripTransformer. http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer Thank you everybody for your help and all the detailed explanations. This solution fixed the problem. Best regards.
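For reference, the HTMLStripTransformer from the linked wiki page is enabled per entity in the DIH config; a minimal sketch (entity name, query, and column names are made up):

```xml
<entity name="page" transformer="HTMLStripTransformer"
        query="SELECT id, body_html FROM pages">
  <!-- stripHTML="true" removes markup from this column before indexing -->
  <field column="body_html" name="body" stripHTML="true"/>
</entity>
```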
Invalid Date String for highlighting any date field match
I must be missing something... It appears to me that with Solr 3.2 and 3.3, if you highlight on a date field (e.g. by searching on *:*) the application blows up with: ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Invalid Date String:'1306406051000' at org.apache.solr.schema.DateField.parseMath(DateField.java:165) at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106) at org.apache.solr.analysis.TrieTokenizer.<init>(TrieTokenizerFactory.java:76) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41) at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385) at org.apache.solr.highlight.DefaultSolrHighlighter.createAnalyzerTStream(DefaultSolrHighlighter.java:550) I am using SolrJ beans to save Date objects to a schema type of 'date' or 'tdate' - makes no difference. From what I can see, this code will never work, as the DefaultSolrHighlighter passes the date as a millisecond string all the way down to the TrieTokenizer, which calls DateField.parseMath(), and this immediately rejects anything not formatted as a date string. -- View this message in context: http://lucene.472066.n3.nabble.com/Invalid-Date-String-for-highlighting-any-date-field-match-tp3255469p3255469.html Sent from the Solr - User mailing list archive at Nabble.com.
A strange Exception in Solr 1.4
java.lang.NullPointerException Hi. I hit a NullPointerException in Solr 1.4. The params are params={q=s_id:112511+AND+b_id:332133&defType=lucene} status=500 QTime=1 2011-08-15 10:31:24,968 ERROR [org.apache.solr.core.SolrCore] - java.lang.NullPointerException at sun.nio.ch.Util.free(Util.java:199) at sun.nio.ch.Util.offerFirstTemporaryDirectBuffer(Util.java:176) at sun.nio.ch.IOUtil.read(IOUtil.java:181) at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161) at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136) at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80) at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64) at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129) at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179) at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975) at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627) at org.apache.lucene.index.FilterIndexReader.docFreq(FilterIndexReader.java:194) at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:344) at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308) at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147) at org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765) at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:46) at 
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146) at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:184) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415) at org.apache.lucene.search.Query.weight(Query.java:99) at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230) at org.apache.lucene.search.Searcher.search(Searcher.java:171) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at com.taobao.terminator.core.realtime.DefaultSearchService.query(DefaultSearchService.java:197) at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222) at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174) at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41) at com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Thank you. allen.Fu
RE: filtering non english text from my results
1. Find a dictionary with the English words you find acceptable. 2. Use the KeepWordFilterFactory (documented on the AnalyzersTokenizersTokenFilters wiki page). -----Original Message----- From: Omri Cohen [mailto:omri...@gmail.com] Sent: Monday, August 15, 2011 1:23 AM To: solr-user@lucene.apache.org Subject: filtering non english text from my results Hi All, I am looking for a solution to filter out text that contains non-English words. My goal is to present my English-speaking users with results in their language. Any ideas? Thanks, Omri
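A sketch of that suggestion as a schema.xml field type (the type name and word-list filename are made up; KeepWordFilterFactory with its words/ignoreCase attributes is standard Solr):

```xml
<fieldType name="text_english_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drops every token not present in the dictionary file -->
    <filter class="solr.KeepWordFilterFactory"
            words="english-words.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

A document whose text survives this filter with few or no tokens is likely non-English, which you can use to steer what gets indexed into the English-facing field.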
Re: parsing many documents takes too long
Sounds like you aren't using SolrJ, which will return a Java object back to you natively. Give that a try and let us know how it fares against the JAXB method. Erik On Aug 12, 2011, at 02:58 , Tri Nguyen wrote: Hi, My query to Solr returns about 982 documents and I use JAXB to parse them into Java objects, which takes about 469 ms — over my 150-200 ms threshold. Is there a solution around this? Can I store the Java objects in the index, return them in the Solr response, and then deserialize them back into Java objects? Would this take less time? Any other ideas? Thanks, Tri
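For what it's worth, the SolrJ route Erik suggests can bind responses straight to beans. A sketch (not runnable standalone — it assumes a SolrJ 1.4-era client on the classpath, a running server at the URL shown, and a made-up `Item` bean):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import java.util.List;

public class BeanQuery {
    // Hypothetical bean; @Field maps index fields onto members.
    public static class Item {
        @Field String id;
        @Field String name;
    }

    public static void main(String[] args) throws Exception {
        // SolrJ speaks the binary javabin format by default,
        // so there is no XML-parsing / JAXB step at all.
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        QueryResponse rsp = server.query(new SolrQuery("*:*").setRows(1000));
        List<Item> items = rsp.getBeans(Item.class); // direct document-to-bean binding
    }
}
```

Skipping the XML round-trip is usually where the time goes, so this is worth benchmarking against the 469 ms JAXB figure.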
RE: ideas for indexing large amount of pdf docs
Note on i: Solr replication provides pretty good clustering support out of the box, including replication of multiple cores. Read the wiki page on replication (Google +solr +replication if you don't know where it is). In my experience, the problem with indexing PDFs is that it takes a lot of CPU on the document-parsing side (client), not on the Solr server side. So make sure you do that part on the client and not the server. Avoiding iii: I suggest that you write yourself a multi-threaded performance test so that you aren't guessing what your performance will be. We wrote one in Perl. It handles an individual thread (we were testing inquiry), and we wrote a little batch file / shell script to start up the desired number of threads. The main statement in our batch file (the rest just sets the variables); a shell script would be even easier:

for /L %%i in (1,1,%THREADS%) DO start /B perl solrtest.pl -h %SOLRHOST% -c %COUNT% -u %1 -p %2 -r %SOLRREALM% -f %SOLRLOC%\firstsynonyms.txt -l %SOLRLOC%\lastsynonyms.txt -z %FUZZ%

The Perl:

#!/usr/bin/perl
#
# Perl program to run a thread of solr testing
#
use Getopt::Std;           # For options processing
use POSIX;                 # For time formatting
use XML::Simple;           # For processing of XML config file
use Data::Dumper;          # For debugging XML config file
use HTTP::Request::Common; # For HTTP request to Solr
use HTTP::Response;
use LWP::UserAgent;        # For HTTP request to Solr

$host = "YOURHOST:8983";
$realm = "YOUR AUTHENTICATION REALM";
$firstlist = "firstsynonyms.txt";
$lastlist = "lastsynonyms.txt";
$fuzzy = "";
$me = $0;

sub usage() {
    print "perl $me -c iterations [-d] [-h host:port] [-u user [-p password]]\n";
    print "\t\t[-f firstnamefile] [-l lastnamefile] [-z fuzzy] [-r realm]\n";
    exit(8);
}

#
# Process the command line options, and open the output file.
#
getopts('dc:u:p:f:l:h:r:z:') || usage();
if(!$opt_c) { usage(); }
$count = $opt_c;
if($opt_u) { $user = $opt_u; }
if($opt_p) { $password = $opt_p; }
if($opt_h) { $host = $opt_h; }
if($opt_f) { $firstlist = $opt_f; }
if($opt_l) { $lastlist = $opt_l; }
if($opt_r) { $realm = $opt_r; }
if($opt_z) { $fuzzy = "~" . $opt_z; }
$debug = $opt_d;

#
# If the host string does not include a ":", add ":80"
#
if($host !~ /:/) { $host = $host . ":80"; }

#
# Read the lists of first and last names
#
open(SYNFILE, $firstlist) || die "Can't open first name list $firstlist\n";
while(<SYNFILE>) {
    @newwords = split /,/;
    for($i = 0; $i <= $#newwords; ++$i) {
        $newwords[$i] =~ s/^\s+//;
        $newwords[$i] =~ s/\s+$//;
        $newwords[$i] = lc($newwords[$i]);
    }
    push @firstnames, @newwords;
}
close(SYNFILE);

open(SYNFILE, $lastlist) || die "Can't open last name list $lastlist\n";
while(<SYNFILE>) {
    @newwords = split /,/;
    for($i = 0; $i <= $#newwords; ++$i) {
        $newwords[$i] =~ s/^\s+//;
        $newwords[$i] =~ s/\s+$//;
        $newwords[$i] = lc($newwords[$i]);
    }
    push @lastnames, @newwords;
}
close(SYNFILE);

print "$#firstnames First Names, $#lastnames Last Names\n";
print "User: $user\n";

my $userAgent = LWP::UserAgent->new(agent => 'solrtest.pl');
$userAgent->credentials($host, $realm, $user, $password);
$uri = "http://$host/solr/select";
$starttime = time();
for($c = 0; $c < $count; ++$c) {
    $fname = $firstnames[rand $#firstnames];
    $lname = $lastnames[rand $#lastnames];
    $response = $userAgent->request(
        POST $uri, [ q => "lnamesyn:$lname AND fnamesyn:$fname$fuzzy", rows => 25 ]);
    if($debug) {
        print "Query: lnamesyn:$lname AND fnamesyn:$fname$fuzzy";
        print $response->content();
    }
    print "POST for $fname $lname completed, HTTP status=" . $response->code . "\n";
}
$elapsed = time() - $starttime;
$average = $elapsed / $count;
print "Time: $elapsed s ($average/request)\n";

-----Original Message----- From: Rode Gonzalez (libnova) [mailto:r...@libnova.es] Sent: Saturday, August 13, 2011 3:50 AM To: solr-user@lucene.apache.org Subject: ideas for indexing large amount of pdf docs Hi all, I want to ask about the best way to implement a solution for indexing a large amount of PDF documents, between 10-60 MB each, with 100 to 1000 users connected simultaneously. I currently have 1 core of Solr 3.3.0 and it works fine for a small number of PDF docs, but I'm worried about the moment we enter production. Some possibilities: i. clustering. I have no experience with this, so it may be a bad idea to venture into it. ii. multicore solution: make some kind of hash to choose one core for each query (exact queries) and thus reduce
Re: Exception DirectSolrSpellChecker when using spellcheck.q
what Subversion revision are you using? I think you just need to svn up; from the line number I can tell it's before I fixed this bug in trunk :) On Fri, Aug 12, 2011 at 11:36 AM, O. Klein kl...@octoweb.nl wrote: The spellchecker works fine, but when using spellcheck.q it gives the following exception (queryAnalyzerFieldType is defined, if that matters). Is it a bug or am I doing something wrong? 2011-08-12 17:30:54,368 java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1401) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) 
at java.lang.Thread.run(Thread.java:619) -- lucidimagination.com
Solr + Arabic Search
I am trying to search an Arabic keyword in Solr, but am just unable to do so. I have successfully indexed Arabic, but the search doesn't seem to be working. Search URL: http://localhost:8080/solr/tw/select/?q=%D8%AA%D8%A3%D8%AC%D9%8A%D8%B1%20%D8%A7%D9%84%D8%A7%D9%87%D9%84%D9%8A The response: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">18</int> <lst name="params"> <str name="q">تأجير الاهلي</str> </lst> </lst> <result name="response" numFound="0" start="0"/> </response> Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg
SolrJ and ContentStreams
Hi, I'm considering using SolrJ to run queries in an MLT fashion against my Solr server. I saw that there is already an open bug filed in Jira (https://issues.apache.org/jira/browse/SOLR-1085). My question is: is it possible to use content streams to pass a data stream to the MLT handler in SolrJ? Ideally I'd like to do something like http://localhost:8983/solr/mlt?stream.body=electronics%20memory&mlt.fl=manu,cat&mlt.interestingTerms=list&mlt.mintf=0 in SolrJ. Currently I'm defining most of the MLT-specific parameters in solrconfig.xml. Is that possible in SolrJ? Thanks, Marcus
Re: Solr + Arabic Search
I am trying to search an Arabic keyword in Solr, but am just unable to do so. I have successfully indexed Arabic, but the search doesn't seem to be working. Could it be the URI encoding of your servlet container? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config Does the 'match all docs query' q=*:*&defType=lucene return something?
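The linked wiki fix amounts to setting URIEncoding on Tomcat's HTTP connector in server.xml so that GET parameters are decoded as UTF-8 (port and protocol below are stock defaults; adjust to your install):

```xml
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Without it, Tomcat decodes query strings as ISO-8859-1 and the Arabic terms never match what was indexed.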
Minimum score filter
Is there a way to set a minimum score requirement so that matches below a given score are not returned/included in facet counts?
RE: Solr + Arabic Search
Thanks Ahmet, this was the problem, I guess. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 15 August 2011 22:20 To: solr-user@lucene.apache.org Subject: Re: Solr + Arabic Search I am trying to search an Arabic keyword in Solr, but am just unable to do so. I have successfully indexed Arabic, but the search doesn't seem to be working. Could it be the URI encoding of your servlet container? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config Does the 'match all docs query' q=*:*&defType=lucene return something?
Re: Migration from Autonomy IDOL to SOLR
This might be a long shot, but... Adobe is deprecating Verity in the ColdFusion engine. Version 9 ships both engines, but I believe CF10 will only have Solr bundled. IDOL is the new Verity, since Autonomy acquired Verity. Although Adobe wraps Solr to work like the old Verity, there might be some info from people who migrated from Verity to Solr a few years ago. Sorry for not helping much, but sometimes these little bits of information lead to something. 2011/8/15 Arcadius Ahouansou arcad...@menelic.com Hello. We have a couple of applications running on half a dozen Autonomy IDOL servers. Currently, all the features we need are supported by Solr. We have done some internal testing and realized that Solr would do a better job. So, we are investigating all possibilities for a smooth migration from IDOL to Solr. I am looking for advice from people who went through something similar. Ideally, we would like to keep most of our legacy code unchanged and have a kind of query-translation layer plugged into our app if possible. - Is there a library available? - Any thoughts? Thanks. Arcadius. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: Minimum score filter
The absolute value of a relevance score doesn't have a lot of meaning and the range of scores can vary a lot depending on any boost you may apply. Even if you normalize them (say on a 1-100 scale where 100 is the max relevance) you can't really draw any valid conclusions from those values. It would help if you described exactly what problem you're trying to solve. -Simon On Mon, Aug 15, 2011 at 1:02 PM, Donald J. Organ IV dor...@donaldorgan.comwrote: Is there a way to set a minimum score requirement so that matches below a given score are not return/included in facet counts.
Re: Minimum score filter
OK, I am doing a search using the following fields: name^2.0 code^1.8 cat_search^1.5 description^0.8 I am searching for: free range dog nips I am getting back 2 documents. The first is the document I am looking for, and contains those words in the name field, as the name field is Free Range Dog Nip Chicken Breast Wraps. The second looks like it's matching because those words are contained within the description. - Original Message - From: simon mtnes...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, August 15, 2011 1:59:17 PM Subject: Re: Minimum score filter The absolute value of a relevance score doesn't have a lot of meaning and the range of scores can vary a lot depending on any boost you may apply. Even if you normalize them (say on a 1-100 scale where 100 is the max relevance) you can't really draw any valid conclusions from those values. It would help if you described exactly what problem you're trying to solve. -Simon On Mon, Aug 15, 2011 at 1:02 PM, Donald J. Organ IV
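For what it's worth: if, despite the caveats above about raw scores, a hard cutoff is still wanted, one common workaround is to filter on the score of the main query with the frange parser (untested sketch; the threshold 2.0 is arbitrary and would need per-query tuning, which is exactly the problem Simon describes):

```text
q=free range dog nips&fq={!frange l=2.0}query($q)
```

Because fq is applied before faceting, documents below the cutoff drop out of the facet counts as well. For this particular case, though, raising the boost on the name field relative to description is usually the more robust fix.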
Re: Tomcat7 with Solr closes at fixed hours, every time another hour
: : I'm having a Solr running within Tomcat7 and Tomcat is closing at : fixed hours, every time a different hour. catalina.log doesn't show : anything other than a clean tomcat shutdown (no exception or : anything). I would really appreciate some advice on how to debug this. : Tomcat doesn't run anything other than Solr. This doesn't appear to be related to Solr. You can see from your logs that the command originates from outside of Solr -- I suspect you would see the same problem if you ran a tomcat instance on this port w/o using Solr at all. My guess is you have a rogue cron command either running on the local machine or using the remote shutdown port telling tomcat to shut down. (Perhaps it's looking for tomcat ports whose logs suggest they aren't getting a lot of traffic? Or aren't registered with a load balancer?) You might want to start by making sure you have remote shutdown support disabled... https://tomcat.apache.org/tomcat-7.0-doc/security-howto.html#Server ...and checking the crontab on the local machine to see what runs on the hour. -Hoss
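For reference, the shutdown listener mentioned above is the top-level Server element in Tomcat's server.xml; setting its port to -1 disables the command port entirely (standard Tomcat behavior; SHUTDOWN is the stock magic string, which you should change anyway if you keep the port open):

```xml
<!-- port="-1" disables the remote shutdown port -->
<Server port="-1" shutdown="SHUTDOWN">
  ...
</Server>
```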
Re: Why is boost not always listed in explain when debug is on?
: using Solr Specification Version: 4.0.0.2011.08.09.11.02.13 : : While trying to understand scoring I noticed that boost is intermittently : displayed in the explain. For example, using edismax and the query string is Hmmm... that output is strange. It's not just the boost that's missing; all of the details about the queryWeight part of the score from the name:starbucks clause are missing (and only the fieldWeight is listed)... : 8.609147 = (MATCH) weight(name:starbucks^20.0 in 163) [DefaultSimilarity], : result of: : 8.609147 = fieldWeight in 163, product of: : 1.0 = tf(freq=1.0), with freq of: : 1.0 = termFreq=1 : 8.609147 = idf(docFreq=8644, maxDocs=17433139) : 1.0 = fieldNorm(doc=163) ...I honestly have no idea what would cause that... my best guess is that maybe with the boost that high the queryWeight winds up being 1.0 and the score Explanation code leaves it out since it doesn't affect things? -Hoss
Re: Migration from Autonomy IDOL to SOLR
Hi Alexei. I had a quick look and it seems that Adobe provides their CF tag as a wrapper around the Verity/Solr API; therefore, the application code is not polluted with client-specific API calls. This makes app migration easier. Thanks for the input. Arcadius. On Mon, Aug 15, 2011 at 6:46 PM, Alexei Martchenko ale...@superdownloads.com.br wrote: This might be a long shot, but... Adobe is deprecating Verity in the ColdFusion engine. Version 9 ships both engines, but I believe CF10 will only have Solr bundled. IDOL is the new Verity, since Autonomy acquired Verity. Although Adobe wraps Solr to work like the old Verity, there might be some info from people who migrated from Verity to Solr a few years ago. Sorry for not helping much, but sometimes these little bits of information lead to something. 2011/8/15 Arcadius Ahouansou arcad...@menelic.com Hello. We have a couple of applications running on half a dozen Autonomy IDOL servers. Currently, all the features we need are supported by Solr. We have done some internal testing and realized that Solr would do a better job. So, we are investigating all possibilities for a smooth migration from IDOL to Solr. I am looking for advice from people who went through something similar. Ideally, we would like to keep most of our legacy code unchanged and have a kind of query-translation layer plugged into our app if possible. - Is there a library available? - Any thoughts? Thanks. Arcadius. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Indexing from a database via SolrJ
Is there a simple way to get all the fields from a JDBC ResultSet into a bunch of SolrJ documents, which I will then send to be indexed in Solr? I would like to avoid the looping required to copy the data one field at a time. Copying it one document at a time would be acceptable, but it would be nice if there were a way to copy them all at once. Another idea that occurred to me is to add the dataimporter jar to my project and leverage it to do the heavy lifting, but I would need some pointers about which objects and methods to research. Is that a reasonable idea, or is it too integrated into the server code to be used with SolrJ? Can anyone point me in the right direction? Thanks, Shawn
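SolrJ has no single-call ResultSet copy, but the per-field loop only needs to be written once, generically, using ResultSetMetaData. A sketch in plain JDK Java (`rowToFields` and the proxy-based demo row are made up for illustration; with SolrJ on the classpath you would put each entry into a SolrInputDocument via addField and send batches with server.add):

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowMapper {
    // Copies every column of the current row into a field map keyed by
    // column label -- no per-field code needed in the caller.
    static Map<String, Object> rowToFields(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            fields.put(meta.getColumnLabel(i), rs.getObject(i));
        }
        return fields;
    }

    // Builds a dynamic-proxy stand-in for a one-row ResultSet so this
    // sketch runs without a database; a real caller just passes the
    // ResultSet from its JDBC query and loops while rs.next().
    static Map<String, Object> demoRow() throws SQLException {
        final Object[] row = {42, "hello"};
        final String[] labels = {"id", "title"};
        final ResultSetMetaData meta = (ResultSetMetaData) Proxy.newProxyInstance(
                RowMapper.class.getClassLoader(),
                new Class<?>[]{ResultSetMetaData.class},
                (p, m, a) -> m.getName().equals("getColumnCount")
                        ? row.length : labels[(Integer) a[0] - 1]);
        ResultSet rs = (ResultSet) Proxy.newProxyInstance(
                RowMapper.class.getClassLoader(),
                new Class<?>[]{ResultSet.class},
                (p, m, a) -> m.getName().equals("getMetaData")
                        ? meta : row[(Integer) a[0] - 1]);
        return rowToFields(rs);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demoRow()); // {id=42, title=hello}
    }
}
```

This keeps field names in sync with column labels automatically, which is essentially what the DIH does internally without requiring its jar.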
Re: defType argument weirdness
: Huh, I'm still not completely following. I'm sure it makes sense if you : understand the underlying implementation, but I don't understand how 'type' : and 'defType' don't mean exactly the same thing, just need to be expressed : differently in different location. ... : prefixing def to type is not making it very clear what the difference is! : What's def supposed to stand for anyway?) def == default. type and defType both select a QParser, but they select the QParser for parsing different levels of sub-queries. type can only be used as a localparam, and it is how you instruct Solr as to which QParser you want it to use when parsing *that* specific query string. defType can be used as either a top-level param or as a localparam to specify the default value for the type of QParser you want used for the main query string at that level. Here's an example I just used last week in a project (isfdb-solr) that shows what I mean... q={!boost b=sum(views,annualviews) defType=dismax v=$qq} ...that's just syntactic sugar for... q={!type=boost b=sum(views,annualviews) defType=dismax v=$qq} The type localparam (of q) says that the q param should be parsed using the boost QParser (which is what knows to parse the b param as a function and how to use it) regardless of whatever top-level defType param might be specified. The defType localparam then says that when parsing the main sub query (the qq param in this case) the default value assumed for the type localparam should be dismax. So if I have this... q={!boost b=sum(M,N) defType=dismax v=$qq}&qq=XXX that will result in XXX being parsed using the dismax QParser. ...but if I have this... q={!boost b=sum(M,N) defType=dismax v=$qq}&qq={!type=lucene}XXX ...then the defType localparam is ignored and XXX is parsed using the lucene QParser (type overrides defType). But defType only applies the default for the main query one level down ... 
it doesn't recurse forever (and it doesn't apply to secondary query string parsing like fq or facet.query, or the b function in the boost QParser), so if you have something like this... q={!boost b=sum(M,N) defType=XXX v=$qq}&qq={!lucene v=$zz}&zz=CCC that defType=XXX won't be used when parsing CCC (because it's one level removed) -Hoss
Product data schema question
I'm working on an online eCommerce project and am having difficulties building the core / index schema. Here is how we organize our product information in a normalized database: A product model has many SKUs (called colorways). A SKU has many sizes (called variants). A SKU size has associated inventory (called variant inventory). When we set up our product core, we have the following field information: Doc * brand * model name * SKU * color name Sample records are as follows: * Haynes, Undershirt, 1234, white * Haynes, Undershirt, 1235, grey * Fruit of the Loom, Undershirt, 1236, white * Fruit of the Loom, Underwear, 1237, grey The issue I'm having is that I want to add inventory to each size of each SKU for faceting. For example, SKU 1234 has sizes small, medium, large. Size small has 5 in stock, size medium 10, and size large 25. In a normalized data table I would have a separate table just for inventory and relate it back to the SKU with a foreign key. How do I store size and inventory information effectively with Solr? -- Steve
hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters
I'm having some trouble trying to upgrade my old highlighter from the highlighting/fragmenter/formatter format (the 1.4-era default config from the Solr website) to the new FastVectorHighlighter. I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion> in the config.

In my solrconfig.xml I added these lines in the default request handler:

  <bool name="hl.useFastVectorHighlighter">true</bool>
  <bool name="hl.usePhraseHighlighter">true</bool>
  <bool name="hl.highlightMultiTerm">true</bool>
  <str name="hl.fragmentsBuilder">colored</str>

and

  <fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

All I get is ('GRAVE' is this locale's word for SEVERE):

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder: colored
  at org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The docs at http://wiki.apache.org/solr/HighlightingParameters say:

  hl.fragmentsBuilder
  Specify the name of a SolrFragmentsBuilder (http://wiki.apache.org/solr/SolrFragmentsBuilder).
  Solr3.1 (http://wiki.apache.org/solr/Solr3.1). This parameter makes sense for
  FastVectorHighlighter (http://wiki.apache.org/solr/FastVectorHighlighter) only.
  SolrFragmentsBuilder respects hl.tag.pre/post parameters:

  <!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

*Alexei*
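One possible explanation of the "Unknown fragmentsBuilder: colored" error (an assumption, not confirmed anywhere in this thread): DefaultSolrHighlighter looks up named fragmentsBuilders in the highlighting configuration of the "highlight" search component in solrconfig.xml, so a <fragmentsBuilder> declared inside a request handler would not be found. A hedged sketch of that placement:

  <!-- Assumed placement: register the fragmentsBuilder under the highlight
       search component, not in the request handler's defaults. -->
  <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting>
      <fragmentsBuilder name="colored"
                        class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
        <lst name="defaults">
          <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">,<b style="background:magenta">]]></str>
          <str name="hl.tag.post"><![CDATA[</b>]]></str>
        </lst>
      </fragmentsBuilder>
    </highlighting>
  </searchComponent>

The request handler would then keep only the str name="hl.fragmentsBuilder" parameter referring to "colored" by name.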
Re: Indexing from a database via SolrJ
Hi Shawn. Unless you are doing complex pre-processing before indexing, you may want to have a look at: http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS

That should take care of it without any coding. You may need to periodically do an HTTP GET to trigger the import.

Arcadius.

On Mon, Aug 15, 2011 at 11:25 PM, Shawn Heisey s...@elyograg.org wrote: Is there a simple way to get all the fields from a JDBC result set into a bunch of SolrJ documents, which I will then send to be indexed in Solr? I would like to avoid the looping required to copy the data one field at a time. Copying it one document at a time would be acceptable, but it would be nice if there was a way to copy them all at once. Another idea that occurred to me is to add the dataimporter jar to my project and leverage it to do the heavy lifting, but I will need some pointers about what objects and methods to research. Is that a reasonable idea, or is it too integrated into the server code to be used with SolrJ? Can anyone point me in the right direction? Thanks, Shawn

-- W: www.menelic.com ---
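SolrJ has no bulk ResultSet-to-document copy, so the usual pattern for avoiding hand-written per-field code is a loop driven by the result set's metadata. A language-agnostic sketch of that pattern in Python (the column names and rows below are invented; with JDBC the column list would come from ResultSetMetaData, and the inner assignment would be SolrInputDocument.addField):

```python
# Sketch: copy every column of every row into a document without naming
# fields one by one -- drive the loop from the result set's metadata.
# Data here is invented; in JDBC/SolrJ you'd read ResultSetMetaData
# and call SolrInputDocument.addField(name, value) instead.

columns = ["id", "title", "price"]   # stand-in for ResultSetMetaData
rows = [
    (1, "Undershirt", 9.99),
    (2, "Underwear", 4.99),
]

def rows_to_docs(columns, rows):
    docs = []
    for row in rows:
        # one generic loop replaces a block of per-field copying code
        doc = {name: value for name, value in zip(columns, row)}
        docs.append(doc)
    return docs

print(rows_to_docs(columns, rows))
```

Because the loop is metadata-driven, adding a column to the SELECT automatically adds a field to every document, with no client code change.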
Score
How do I change the score to scale it between 0 and 100, regardless of the raw score?

q.alt=*:*&bq=lang:Spanish&defType=dismax

Bill Bell
Sent from mobile
Re: Score
https://wiki.apache.org/lucene-java/ScoresAsPercentages

On Mon, Aug 15, 2011 at 8:13 PM, Bill Bell billnb...@gmail.com wrote: How do I change the score to scale it between 0 and 100, regardless of the raw score? q.alt=*:*&bq=lang:Spanish&defType=dismax Bill Bell Sent from mobile
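The linked page argues that raw Lucene scores are not comparable across queries, so presenting them as percentages is misleading. If a 0-100 display value is required anyway, the common client-side workaround (cosmetic only, and not endorsed by that page) is to normalize each hit's score by the response's maxScore. A minimal sketch:

```python
def scores_as_percent(scores, max_score):
    """Scale raw scores to 0-100 relative to the response's maxScore.
    Purely cosmetic: values are NOT comparable across different queries."""
    if max_score <= 0:
        return [0.0 for _ in scores]
    return [round(100.0 * s / max_score, 1) for s in scores]

# e.g. a response with maxScore=2.5 and three hits
print(scores_as_percent([2.5, 1.25, 0.5], max_score=2.5))  # [100.0, 50.0, 20.0]
```

The top hit always shows 100, which is exactly the distortion the wiki page warns about: a "100%" match against a poor query can be a far worse document than a "40%" match against a good one.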
Re: Indexing from a database via SolrJ
On 8/15/2011 5:55 PM, Arcadius Ahouansou wrote: Hi Shawn. Unless you are doing complex pre-processing before indexing, you may want to have a look at: http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS That should take care of it without any coding. You may need to periodically do an HTTP GET to trigger the import.

I'm aware of this, and my current build system written in Perl works this way. When I need to do a full index rebuild, I will still use the DIH, but it has become too limiting for regular indexing needs. It will be inadequate for things that we have in development. We need more flexibility, so I want to handle the interface to the DB myself and index directly with SolrJ.

Thanks,
Shawn