Exception while processing: attach document
Hello all, I'm getting stuck when trying to import an Oracle DB into a Solr index. Could any one of you give a hand? Thanks a million. Below is some short info that frames the question.

My Solr: 1.4.1

LOG
INFO: Starting Full Import
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity attach with URL: jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: attach document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from A.B Processing Document # 1

where A is a schema and B is a table.

dataSource
===
    <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22"
                user="abc" password="xyz" readOnly="true" autoCommit="false" batchSize="1"/>
    <document>
      <entity dataSource="jdbc" name="attach" query="select * from A.B">
        <entity processor="SqlEntityProcessor" dataField="attach.TOPIC" format="text">
          <field column="text" name="text" />
        </entity>
      </entity>
    </document>

where TOPIC is a field of table B. Thanks again
Re: Looking for Developers
When I first saw this particular email, I wrote a reply intending to ask the sender to remove solr-user from its recipients, because I thought it should go to solr-dev. But then I thought again: it's about a job offer, not development of Solr, so I just deleted my draft. Maybe solr-job is a good suggestion. A selfish reason for supporting this suggestion is that I'm also looking for someone familiar with Solr to work for me in Taiwan, and I really don't know where to ask. Scott

----- Original Message ----- From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org; doh...@gmail.com Sent: Friday, October 29, 2010 4:28 AM Subject: Re: Looking for Developers

Hey! I represent those remarks! I was on that committee (really) because I am/was a: http://www.rhyolite.com/anti-spam/you-might-be.html#spam-fighter and about 20 other 'types' on that list. I'm a little bit more mature, but only a little. White lists are the only way to go. Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. From 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

--- On Thu, 10/28/10, Ken Stanley doh...@gmail.com wrote: On Thu, Oct 28, 2010 at 2:57 PM, Michael McCandless luc...@mikemccandless.com wrote: I don't think we should do this until it becomes a real problem. The number of job offers is tiny compared to dev emails, so far, as far as I can tell. Mike

By the time that it becomes a real problem, it would be too late to get people to stop spamming the -user mailing list; no? - Ken
Re: Looking for Developers
On Fri, Oct 29, 2010 at 12:23 PM, scott chu (朱炎詹) scott@udngroup.com wrote: When I first saw this particular email, I wrote a reply intending to ask the sender to remove solr-user from its recipients, because I thought it should go to solr-dev. But then I thought again: it's about a job offer, not development of Solr, so I just deleted my draft.

To add more with regard to the original mail that started this thread: we are based in India, and for the first mail, I replied to the person off-list offering our services, but never got a reply. So I wonder how serious this guy was in the first place.

Maybe solr-job is a good suggestion. A selfish reason for supporting this suggestion is that I'm also looking for someone familiar with Solr to work for me in Taiwan, and I really don't know where to ask.

In other lists with a broader audience, such as a local Linux users list, our practice has been that job offers are tolerated if posted once and marked as Commercial in the subject header. Given the low volume of such posts on this list, maybe that could be an acceptable solution? We would also be happy with a separate solr-jobs list. Regards, Gora
Maximum length of a Dismax Query?
Hi Everybody, It seems that the maximum query length supported by the Dismax query handler is 3534 characters. Is there any way I can raise this limit to around 12,000? If I fire a query beyond 3534 characters, I don't even get error messages in the catalina.XXX log files. Swapnonil Mukherjee +91-40092712 +91-9007131999
Re: QueryElevation Component is so slow
Anyone have some suggestions to improve the search? Thanks.

On 10/28/10, Chamnap Chhorn chamnapchh...@gmail.com wrote: Sorry for the very bad pasting. I'll paste it again.

    Slowest Components                                      Count  Exclusive         Total
    QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
    SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
    org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
    QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
    DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
    FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi, I'm using Solr 1.4 and the QueryElevationComponent for guaranteed search positions. I have around 700,000 documents with a 1 MB elevation file. It turns out to be quite slow according to the New Relic monitoring website (the same component timings as above). As you can see, QueryElevationComponent takes quite a lot of time. Any suggestion how to improve this?

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Newbie to Solr, LIKE:foo
I'm a Nutch user, but I'm considering using Solr for the following reason: I need a LIKE:foo query, which turns into a *foo* wildcard query. I saw the built-in prefix query parser, but it only looks for foo*, if I understand it well. So is there a query parser that does what I'm looking for? If not, how difficult would it be to build one with Solr? -- -MilleBii-
Re: Looking for Developers
For me, I simply deleted the original email, but I'm now quite enjoying the irony of the complaints causing more noise on the list than the original email! ;-) M -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Possible bug in query sorting
That's my schema XML:

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="example" version="1.2">
      <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
        <fieldType name="uuid" class="solr.UUIDField" indexed="true" required="true" omitNorms="true"/>
        <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
        <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ISOLatin1AccentFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ISOLatin1AccentFilterFactory"/>
          </analyzer>
        </fieldType>
      </types>
      <fields>
        <field name="text" type="text" indexed="true" stored="false" required="false" multiValued="false" omitNorms="false"/>
        <field name="icms_collection" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false"/>
        <field name="link" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false"/>
        <field name="title" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false"/>
        <field name="contributor" type="text" indexed="false" stored="false" required="false" multiValued="false" omitNorms="false"/>
      </fields>
      <uniqueKey>link</uniqueKey>
      <defaultSearchField>text</defaultSearchField>
      <solrQueryParser defaultOperator="AND"/>
      <copyField source="title" dest="text"/>
      <copyField source="contributor" dest="text"/>
      ...
    </schema>

2010/10/28 Gora Mohanty g...@mimirtech.com: On Thu, Oct 28, 2010 at 5:18 PM, Michael McCandless luc...@mikemccandless.com wrote: Is it somehow possible that you are trying to sort by a multi-valued field? [...] Either that, or your field gets processed into multiple tokens via the analyzer/tokenizer path in your schema. The reported error is a consequence of the fact that different documents might result in a different number of tokens. Please show us the part of schema.xml that defines the field type for the field title. Regards, Gora
Natural string sorting
Just a quick question about natural sorting of strings. I have a simple dynamic field in my schema:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <field name="nameSort_en" type="string" indexed="true" stored="false" omitNorms="true"/>

There are 3 indexed strings, for example: string1, string2, string10. Executing a query and sorting by this field leads to an unnatural ordering:

string1
string10
string2

(Some time ago I used Lucene, and I was pretty sure that Lucene used a natural sort, so I expected the same from Solr.) Is there a way to sort in natural order? A config option? A plugin? The expected output would be:

string1
string2
string10

Thanks in advance.
org.tartarus package in lucene/solr?
Hi, How come $subject is present?? -- Regards, Tharindu
Re: Natural string sorting
I think string10 comes before string2 in lexicographic order?

On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote: Just a quick question about natural sorting of strings. [...] Executing a query and sorting by this field leads to an unnatural ordering: string1, string10, string2. [...] Is there a way to sort in natural order? A config option? A plugin? The expected output would be: string1, string2, string10. Thanks in advance.
Re: Possible bug in query sorting
On Fri, Oct 29, 2010 at 1:47 PM, Pablo Recio pre...@yaco.es wrote: That's my schema XML:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
      </analyzer>
    </fieldType>
    [...]
    <field name="title" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false"/>
    <field name="contributor" type="text" indexed="false" stored="false" required="false" multiValued="false" omitNorms="false"/>
    [...]

The issue is that you are using the WhitespaceTokenizerFactory as an analyzer for the field. This is resulting in a different number of tokens in different documents, which is causing the error. Use a field that is non-tokenized, e.g., change the type of the title field to string. If you need a tokenized title field, copy the field to another of type string, and sort on that field instead, as sketched below. Please see http://wiki.apache.org/solr/CommonQueryParameters#sort Regards, Gora
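A minimal schema.xml sketch of that copyField approach (the title_sort name is just an example, not from Pablo's schema):

    <field name="title_sort" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_sort"/>

Queries would then sort on the untokenized copy, e.g. sort=title_sort+asc, while still searching the analyzed title field.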
Re: Searching for terms on specific fields
Cheers Hoss. That did it for me. ~~Sent by an Android

On 29 Oct 2010 00:39, Chris Hostetter hossman_luc...@fucit.org wrote: The specifics of your overall goal confuse me a bit, but drilling down to your core question...

: I want to be able to use the dismax parser to search on both terms
: (assigning slops and tie breaks). I take it the 'fq' is a candidate for
: this, but can I add dismax capabilities to fq as well? Also my query would be

...you can use any parser you want for fq, using the localparams syntax... http://wiki.apache.org/solr/LocalParams ...so you could have something like...

    q=foo:bar&fq={!dismax qf='yak zak'}baz

...the one thing you have to watch out for when using localparams and dismax is that the outer params are inherited by the inner params by default -- so if you are using dismax for your main query 'q' (with defType) and you have global params for qf, pf, bq, etc... those are inherited by your fq={!dismax} query unless you override them with local params. -Hoss
Re: OutOfMemory and auto-commit
If the problem is autowarming queries running in the meantime, maybe you could consider changing the following setting to true:

    <useColdSearcher>false</useColdSearcher>

and/or changing this value:

    <maxWarmingSearchers>2</maxWarmingSearchers>

Another option would be lowering the value of autowarmCount inside the cache definitions, as in the sketch below. Hope this helps. Tommaso

2010/10/25 Jonathan Rochkind rochk...@jhu.edu: Yes, that's my question too. Anyone?

Dennis Gearon wrote: How is this avoided? Dennis Gearon

--- On Thu, 10/21/10, Lance Norskog goks...@gmail.com wrote: Yes. Indexing activity suspends until the commit finishes, then starts. Having both queries and indexing on the same Solr will have this memory problem. Lance

On Thu, Oct 21, 2010 at 1:16 PM, Jonathan Rochkind rochk...@jhu.edu wrote: If I do _not_ have any auto-commit enabled, and add 500k documents and commit at the end, no problem. If I instead set auto-commit maxDocs to 100,000 (a pretty large number) and try to add 500k docs, with autocommits theoretically happening every 100k... I run into an OutOfMemory error. Can anyone think of any reasons that would cause this, and how to resolve it? All I can think of is that in the first case, my newSearcher and firstSearcher warming queries don't run until the 'document add' is completely done. In the second case, there are newSearcher and firstSearcher warming queries happening at the same time another process is continuing to stream 'add's to Solr. Although at a maxDocs of 100,000 I shouldn't (I think) get _overlapping_ warming queries; the warming queries should be done before the next commit. I think. But nonetheless, just the fact that warming queries are happening at the same time 'add's are continuing to stream -- could that be enough to somehow increase memory usage enough to run into OOM?

-- Lance Norskog goks...@gmail.com
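For reference, autowarmCount lives on the cache declarations in solrconfig.xml; a sketch of a lowered setting (the sizes here are illustrative, not from the original config):

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With autowarmCount="0" a new searcher is registered without replaying cached entries, which shortens the warming window that overlaps with the ongoing adds.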
Re: Maximum length of a Dismax Query?
I am using the SolrJ client to post my query. The query length is roughly 10,000 characters. I am using GET like this:

    int page = 1;
    int resultsPerPage = 24;
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", query);
    params.set("start", "" + (page - 1) * resultsPerPage);
    params.set("rows", resultsPerPage);
    try {
        QueryResponse response = QueryServerManager.getSolrServer().query(params, SolrRequest.METHOD.GET);
        assertNotNull(response);
    } catch (SolrServerException e) {
        e.printStackTrace();
    }

This hits the exception block with the following exception:

    org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:122)
        at com.getty.search.tests.DismaxQueryTestCase.testAssetQuery(DismaxQueryTestCase.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.textui.TestRunner.doRun(TestRunner.java:116)
        at com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
        at junit.textui.TestRunner.doRun(TestRunner.java:109)
        at com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
        at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
        at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Swapnonil Mukherjee

On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote: Hi Everybody, It seems that the maximum query length supported by the Dismax query handler is 3534 characters. Is there any way I can raise this limit to around 12,000? [...]
Re: Natural string sorting
On Fri, 2010-10-29 at 10:18 +0200, RL wrote: Executing a query and sorting by this field leads to an unnatural ordering: string1, string10, string2.

That's very much natural. Numbers are not treated any differently from words made up of letters. You have to use zero-padded alignment if you want plain lexicographic sorting to look natural:

string01
string02
string10

(Some time ago I used Lucene and I was pretty sure that Lucene used a natural sort, thus I expected the same from Solr.) Lucene sorts the same way if you just use the standard sort.

Is there a way to sort in a natural order? Config option? Plugin? I don't know how to do this in Solr, sorry. To do it in Lucene without changing the terms, you could use a custom comparator that tokenizes the strings into numbers vs. everything else and does the comparison token by token, alternating between natural sort and numeric sort depending on the token type.
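A rough standalone Java sketch of such a token-by-token comparator (an illustration of the idea only, not Lucene/Solr integration code):

    import java.util.Comparator;

    // Compares letter runs lexicographically and digit runs numerically,
    // so that "string2" sorts before "string10".
    public class NaturalOrderComparator implements Comparator<String> {
        public int compare(String a, String b) {
            int i = 0, j = 0;
            while (i < a.length() && j < b.length()) {
                char ca = a.charAt(i), cb = b.charAt(j);
                if (Character.isDigit(ca) && Character.isDigit(cb)) {
                    int si = i, sj = j;
                    while (i < a.length() && Character.isDigit(a.charAt(i))) i++;
                    while (j < b.length() && Character.isDigit(b.charAt(j))) j++;
                    // Strip leading zeros; a longer digit run is a bigger number.
                    String na = a.substring(si, i).replaceFirst("^0+(?=.)", "");
                    String nb = b.substring(sj, j).replaceFirst("^0+(?=.)", "");
                    if (na.length() != nb.length()) return na.length() - nb.length();
                    int cmp = na.compareTo(nb);
                    if (cmp != 0) return cmp;
                } else {
                    if (ca != cb) return ca - cb;
                    i++;
                    j++;
                }
            }
            return (a.length() - i) - (b.length() - j);
        }
    }

Sorting a list with Collections.sort(list, new NaturalOrderComparator()) then yields string1, string2, string10.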
Re: Overriding Tika's field processing
If you change 'title' to be single-valued, the Extracting thing may or may not override it. I remember a go-round on this problem, but the ExtractingRequestHandler has code that explicitly checks for single-valued vs. multi-valued, and this may all be different in different Solr versions. The DataImportHandler has Tika support in 3.x and trunk, and the DIH gives a lot more control over which field gets which value.

On Thu, Oct 28, 2010 at 8:53 AM, Tod listac...@gmail.com wrote: I'm reading my document data from a CMS and indexing it using calls to curl. The curl call includes 'stream.url' so Tika will also index the actual document pointed to by the CMS's stored URL. This works fine. On the presentation side I have a dropdown with the titles of all the indexed documents, such that when a user clicks one of them it opens in a new window. Using JS, I've been parsing the JSON returned from Solr to create the dropdown. The problem is I can't get the titles sorted alphabetically. If I use a facet.sort on the title field, I get back ALL the sorted titles in the facet block, but that doesn't include the associated URLs. A sorted query won't work because title is a multivalued field. The one option I can think of is to make the title single-valued so that I have a one-to-one relationship with the returned URL. To do that I'd need to be able to *not* index the Tika-returned values. If I read right, my understanding was that I could use 'literal.title' in the curl call to limit what would be included in the index from Tika. That doesn't seem to be working, as a test facet query returns more than I have in the CMS. Am I understanding the 'literal.title' processing correctly? Does anybody have experience/suggestions on how to handle this? Thanks - Tod

-- Lance Norskog goks...@gmail.com
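For reference, a literal.* call of the kind Tod describes looks roughly like this (the host, id, and document URL are made-up placeholders):

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.title=My+CMS+Title&stream.url=http://cms.example.com/doc.pdf"

On a multiValued title field, the literal value is added alongside whatever Tika extracts rather than replacing it, which would match the extra facet values Tod is seeing; whether a single-valued field keeps the literal or the extracted value is exactly the version-dependent point above.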
Re: RAM increase
Hi All, Thanks for your reply. I have a doubt: should I increase the machine's RAM, or the heap size of the Java/Tomcat process where Solr is running? Regards, satya
Re: Looking for Developers
On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote: For me, I simply deleted the original email, but I'm now quite enjoying the irony of the complaints causing more noise on the list than the original email! ;-)

He he. An old classic. Next in line is the meta-meta-discussion about whether meta-discussions belong on the list or if they should be moved to solr-user-meta. Repeat ad nauseam. Job postings are on-topic IMHO, and unless their volume grows significantly, I see no reason to create a new mailing list.
Re: Upgrading from Solr 1.2 to 1.4.1
Yes, from Solr 1.2 to 1.3 / Lucene 2.4.1 to 2.9 there was a change in the Porter stemmer for English. I don't know what it was. It may also affect the other language variants of the stemmer. If stemming is important for your users, you might want to try the Solr 3.x branch instead, or find Lucid's KStem implementation for 1.4.1. 3.x has a lot of work on better stemmers for many languages.

On Thu, Oct 28, 2010 at 2:23 PM, Robert Muir rcm...@gmail.com wrote: On Thu, Oct 28, 2010 at 4:44 PM, johnmu...@aol.com wrote: I'm using Solr 1.2. If I upgrade to 1.4.1, must I re-index because of LUCENE-1142? If so, how will this affect me if I don't re-index (I'm using EnglishPorterFilterFactory)? What about when I'm using non-English stemmers from Snowball? Besides the brief IMPORTANT UPGRADE NOTE about this in CHANGES.txt, where can I read more about it? I looked in JIRA, LUCENE-1142; there isn't much.

I haven't looked in detail at these changes, but Snowball was upgraded to revision 500 here. You can see the revisions/logs of the various algorithms here: http://svn.tartarus.org/snowball/trunk/snowball/algorithms/?pathrev=500 One problem being, I don't know the previous revision you were using... but since it had no Hungarian before LUCENE-1142, it couldn't possibly have been any *later* than revision 385:

Revision 385 - Added Mon Sep 4 14:06:56 2006 UTC (4 years, 1 month ago) by martin: New Hungarian stemmer

This means, for example, that you would certainly be affected by changes in the English stemmer such as revision 414, among others:

Revision 414 - Modified Mon Nov 20 10:49:29 2006 UTC (3 years, 11 months ago) by martin: 'arsen' as exceptional p1 position, to prevent 'arsenic' and 'arsenal' conflating

In my opinion, it would be best to re-index.

-- Lance Norskog goks...@gmail.com
Re: No response from Solr on complex request after several days
There are a few problems that can happen. This is usually a sign of garbage collection problems. You can monitor the Tomcat instance with JConsole or one of the other Java monitoring tools and see if there is a memory leak. Also, most people don't need to do it, but you can automatically restart it once a day.

On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hi, We are in a beta-testing phase, with several users a day. After several days of running, the Solr server stopped responding to requests that require a lot of processing time. I'm using Solr inside Tomcat. This is the request that got no response from the server:

    wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5
    &fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN
    &sort=score%20desc&fq=solrLangCode%3AFR&facet=true
    &facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId
    &facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade
    &facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId
    &facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId
    &facet.sort=count&f.studyDecade.facet.sort=lex
    &spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport
    &hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1
    &hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true
    &hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false

It involves highlighting on a multivalued field with more than 600 short values inside. It takes 200 or 300 ms because of the highlighting. After restarting Tomcat, all went fine again. I'm trying to understand why I had to restart Tomcat and Solr, and what I should do to have it working 24/7. Xavier

-- Lance Norskog goks...@gmail.com
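If a daily restart is not appealing, GC logging will usually confirm or rule out collection trouble before the server locks up; one common set of flags for the Tomcat JVM (the log path is just an example):

    export JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/tomcat/gc.log"

Long pauses or back-to-back full collections right before the hang point to an undersized heap rather than a leak.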
Re: Sorting and filtering on fluctuating multi-currency price data?
ExternalFileField can only be used for boosting. It is not a first-class field.

On Thu, Oct 28, 2010 at 11:07 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Another approach would be to use ExternalFileField and keep the price data,
: normalized to USD, outside of the index. Every time the currency rates
: changed, we would calculate new normalized prices for every document in the index.

...that is the approach I would normally suggest.

: Still another approach would be to do the currency conversion at IndexReader
: warmup time. We would index native price and currency code and create a
: normalized currency field on the fly. This would be somewhat like
: ExternalFileField in that it involved data from outside the index, but it
: wouldn't need to be scoped to the parent SolrIndexReader, but could be
: per-segment. Perhaps a custom poly-field could accomplish something like this?

...that is essentially what ExternalFileField should start doing; it just hasn't had anyone bite the bullet to implement it yet -- if you want to tackle that, then I would suggest/request/encourage you to look at doing it as a patch to ExternalFileField that could be contributed back and reused by all.

With all of that said: there has also been a recent contribution of a MoneyFieldType for dealing precisely with multicurrency sorting/filtering issues that you should definitely take a look at... https://issues.apache.org/jira/browse/SOLR-2202 -Hoss

-- Lance Norskog goks...@gmail.com
Re: Looking for Developers
Then, Godwin!

On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: [...] Job postings are on-topic IMHO, and unless their volume grows significantly, I see no reason to create a new mailing list.

-- Lance Norskog goks...@gmail.com
Re: No response from Solr on complex request after several days
On 29/10/2010 12:08, Lance Norskog wrote: There are a few problems that can happen. This is usually a sign of garbage collection problems. You can monitor the Tomcat instance with JConsole or one of the other Java monitoring tools and see if there is a memory leak. Also, most people don't need to do it, but you can automatically restart it once a day. [...]

Thanks for your response. Today I've increased the Tomcat JVM heap size from 128-256 MB to 1024-2048 MB. I will see if it helps.
Re: RAM increase
When you start the Tomcat app, you tell it how much memory to allocate to the JVM. I don't remember where, probably in catalina.sh. On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop satya.yada...@gmail.com wrote: Hi All, Thanks for your reply.I have a doubt whether to increase the ram or heap size to java or to tomcat where the solr is running Regards, satya -- Lance Norskog goks...@gmail.com
Re: QueryElevation Component is so slow
I do not know if this is accurate. There are direct tools to monitor these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these counts allot many things to one place that should be spread out.

On Fri, Oct 29, 2010 at 12:27 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Anyone have some suggestions to improve the search? Thanks. [...] As you can see, QueryElevationComponent takes quite a lot of time. Any suggestion how to improve this? -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/

-- Lance Norskog goks...@gmail.com
Influencing scores on values in multiValue fields
Hi All, We've got an index in which we have a multiValued field per document. Assume the multiValued field values in each document to be:

Doc 1: "bar lifters"
Doc 2: "truck tires", "back drops", "bar lifters"
Doc 3: "iron bar lifters"
Doc 4: "brass bar lifters", "iron bar lifters", "tire something", "truck something", "oil gas"

Now when we search for 'bar lifters', the expectation (based on the requirements) is that we get results in the order Doc 1, Doc 2, Doc 4, Doc 3:

Doc 1 - since there's an exact match (and only one) for the search terms
Doc 2 - since there's an exact match amongst the values
Doc 4 - since there's a partial match on the values, but the number of matches is more than in Doc 3
Doc 3 - since there's a partial match

However, the results come out as Doc 1, Doc 3, Doc 2, Doc 4. Looking at the explanation of the results, it appears Doc 2 is losing to Doc 3, and Doc 4 is losing to Doc 3, based on length normalisation. We think we can see the reason for that - the field length in Doc 2 is greater than in Doc 3, and Doc 4's is greater than Doc 3's. However, is there any mechanism by which we can force Doc 2 to beat Doc 3 and Doc 4 to beat Doc 3 with this structure? We did look at using omitNorms=true, but that messes up the scores for all docs. The result then comes out as Doc 4, Doc 1, Doc 2, Doc 3 (where Doc 1, Doc 2 and Doc 3 get the same score). This is because the fieldNorm is not taken into account anymore (as expected), and term frequency becomes the only contributing factor. So trying to avoid length normalisation through omitNorms is not helping. Is there any way we can make an exact match of a value in a multiValued field add to the overall score whilst keeping the length normalisation? Hope that makes sense. Cheers -- Imran
Re: Exception while processing: attach document
Could anyone shed some light, please? I saw in the log the message below, but I don't think it's the root cause, because in my dataSource readOnly is true:

    Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels

A newbie Solr user

On 10/29/2010 1:49 PM, Bac Hoang wrote: Hello all, I'm getting stuck when trying to import an Oracle DB into a Solr index. Could any one of you give a hand? [...] where TOPIC is a field of table B. Thanks again
RE: Influencing scores on values in multiValue fields
How about creating another field for doing exact matches (a string), searching both, and boosting the string match? A sketch follows below. -Mike

-Original Message- From: Imran [mailto:imranboho...@gmail.com] Sent: Friday, October 29, 2010 6:25 AM To: solr-user@lucene.apache.org Subject: Influencing scores on values in multiValue fields

Hi All, We've got an index in which we have a multiValued field per document. [...] Is there any way we can make an exact match of a value in a multiValued field add to the overall score whilst keeping the length normalisation? Hope that makes sense. Cheers -- Imran
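A minimal sketch of that idea (the field names are made up; the actual field name isn't given in the thread). In schema.xml, keep an untokenized copy of the multiValued field:

    <field name="phrases_exact" type="string" indexed="true" stored="false" multiValued="true"/>
    <copyField source="phrases" dest="phrases_exact"/>

Since a string field stores each value as a single term, an exact-value match can then be boosted from the application side, e.g. with a dismax boost query built from the raw user input:

    q=bar lifters&bq=phrases_exact:"bar lifters"^10

The quoted clause matches only documents where one whole value equals the query, so Doc 1 and Doc 2 get the boost while Doc 3 and Doc 4 are still ranked by the normal tokenized scoring with norms intact.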
Re: Reverse range query
I modified the text of this question, hopefully to make it clearer; I wasn't sure what I was asking was coming across well. (And I'm adding this comment in a shameless attempt to boost my question back to the top for people to see.) Before I write a messy workaround, I just wanted to check with the community to see whether this is already handled; it seems like a useful, common data type.
eDismax result differs from Dismax
We are launching a new version of our job board helping returning veterans find a civilian job, and we chose Solr and Sunspot [1] to power our search. We really didn't consider the power users in the HR world who are trained to use boolean search, for example: Engineer AND (Electrical OR Mechanical). Sunspot supports the Dismax request handler, which unfortunately does not handle the query above properly. So we read about eDismax and that it was baked into Solr 1.5. At the same time, Sunspot has switched from LocalSolr integration to storing a geohash in a full-text searchable field. We're having some problems with some complex queries that Sunspot generates:

    INFO: [] webapp=/solr path=/select params={fl=*+score&start=0&q=query:{!dismax+qf%3D'title_text+description_text'}Ruby+on+Rails+Developer+(location_details_s:dngythdb25fu^1.0+OR+location_details_s:dngythdb25f^0.0625+OR+location_details_s:dngythdb25*^0.00391+OR+location_details_s:dngythdb2*^0.000244+OR+location_details_s:dngythdb*^0.153+OR+location_details_s:dngythd*^0.00954+OR+location_details_s:dngyth*^0.000596+OR+location_details_s:dngyt*^0.373+OR+location_details_s:dngy*^0.0233+OR+location_details_s:dng*^0.00146)&wt=ruby&fq=type:Job&defType=edismax&rows=20} hits=1 status=0 QTime=13

Under Dismax no results are returned for this query; however, as you can see above, with eDismax a result is returned -- the only difference between the two queries is 'defType=edismax' vs 'defType=dismax'.

Debug output, Solr 1.5 eDismax: https://gist.github.com/32f3a52064ec300fdca0
Debug output, Solr 1.5 Dismax: https://gist.github.com/d82b82a026878ecce36b

My question is whether you have any ideas why the query above returns a record that doesn't match under eDismax. We are at a crossroads where we have to decide if we want to forge ahead with Sunspot 1.2rc4 and Solr 1.5, or fall back to Sunspot 1.1 and Solr 1.4 until Solr 3.1/4.0 comes out, hopefully with eDismax support and better location search support. I plan to do a blog posting on this issue when we figure it out; I'll give you props if you can help us out :)

Best regards, Ryan Walker Chief Experience Officer http://www.recruitmilitary.com 513.677.7078

[1] http://outoftime.github.com/sunspot/
Re: eDismax result differs from Dismax
On Fri, Oct 29, 2010 at 9:30 AM, Ryan Walker r...@recruitmilitary.com wrote: We are launching a new version of our job board helping returning veterans find a civilian job, and we chose Solr and Sunspot to power our search. [...] Under Dismax no results are returned for this query; however, with eDismax a result is returned -- the only difference between the two queries is 'defType=edismax' vs 'defType=dismax'.

That's to be expected. Dismax doesn't even support fielded queries (where you specify the field name in the query itself), so this clause is treated entirely as text:

    (location_details_s:dngythdb25fu^1.0

and the dismax QP will be looking for tokens like location_details_s and dngythdb25fu (assuming tokenization would split on the non-alphanumeric chars) in your text fields. -Yonik http://www.lucidimagination.com
Re: Maximum length of a Dismax Query?
Solved this issue by setting maxHttpHeaderSize to 65536 in the tomcat/conf/server.xml file; see the sketch below. Before that change, Tomcat was simply not responding. Swapnonil Mukherjee

On 29-Oct-2010, at 2:43 PM, Swapnonil Mukherjee wrote: I am using the SolrJ client to post my query. The query length is roughly 10,000 characters. I am using GET... [...] This hits the exception block with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset [...]

On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote: Hi Everybody, It seems that the maximum query length supported by the Dismax query handler is 3534 characters. Is there any way I can raise this limit to around 12,000? [...]
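The attribute goes on the HTTP connector; a sketch of the relevant server.xml line (the port and other attributes will differ per install):

    <Connector port="8080" protocol="HTTP/1.1" maxHttpHeaderSize="65536" ... />

An alternative that avoids the header limit altogether is to send the query as a POST from SolrJ, i.e. query(params, SolrRequest.METHOD.POST), since the parameters then travel in the request body rather than in the URL.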
Re: QueryElevation Component is so slow
Thanks for the reply. I'm looking for how to improve the speed of the search query. The QueryElevationComponent is taking too much time, which is unacceptable, and the elevation file is only 1 MB. I wonder whether other people are using this component without (speed-related) problems. Am I using it the wrong way, or is there a limit when using this component?

On 10/29/10, Lance Norskog goks...@gmail.com wrote: I do not know if this is accurate. There are direct tools to monitor these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these counts allot many things to one place that should be spread out. [...]

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
RE: Natural string sorting
Well, you could do a magnitude-notation approach. It depends on how complex the strings are, but based on your examples, this would work:

1) Identify each series of integers in the string. (This assumes lengths are no more than 9 for each series.)
2) Insert the number of digits into the string before the integer series itself.

So - for sorting - you would have:

string1  -- string11
string10 -- string210
string2  -- string12

which will then sort as string11, string12, string210; a Java sketch of the encoding follows below. Use the original strings as the displays you want.

Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com

-Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Friday, October 29, 2010 4:33 AM To: solr-user@lucene.apache.org Subject: Re: Natural string sorting

I think string10 comes before string2 in lexicographic order?

On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote: Just a quick question about natural sorting of strings. [...] Is there a way to sort in natural order? The expected output would be: string1, string2, string10. Thanks in advance.
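A small Java sketch of that encoding, assuming digit runs of at most 9 characters as stated above:

    // Prefix each digit run with its length, so plain lexicographic
    // order matches numeric order: "string2" -> "string12", "string10" -> "string210".
    public static String encodeForSort(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (Character.isDigit(s.charAt(i))) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) {
                    i++;
                }
                out.append(i - start).append(s, start, i);  // length prefix, then the digits
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

The encoded value would go into the sort field (e.g. built by the indexing application before posting), while the original string is stored for display.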
RE: spellchecker results not as desired
You should be building your spellcheck index on a field that creates tokens on whitespace, so your dictionary has iphone and case as separate terms instead of iphone case as one term. If you then query on something like iphole case, it will give suggestions for iphole but not for case, because the latter is in the dictionary. (The spellchecker always assumes a term is correctly spelled if it is in the dictionary.) If you set spellcheck.collate=true, then in addition to getting word-by-word suggestions, it will return a rewritten query (aka a collation). Solr 1.4 will always use the top suggestion for each word to form the collation; in this example, the collation would be iphone case. You can then requery Solr with the collation and hope to get better hits. While 1.4 doesn't check whether the collation is going to return any hits, an enhancement in 3.x and 4.0 lets you guarantee that collations will always give you hits if you requery them.

As for your second question, likely ipj is close enough to ipad to warrant a suggestion, but the others are not considered close enough. You can tweak this by setting spellcheck.accuracy; however, I do not believe this option is available in 1.4 — the wiki indicates it is 3.x/4.0 only. For more information, look at the SpellCheckComponent page on the wiki; a configuration sketch follows below.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: abhayd [mailto:ajdabhol...@hotmail.com] Sent: Thursday, October 28, 2010 4:34 PM To: solr-user@lucene.apache.org Subject: spellchecker results not as desired

hi, I added a spellchecker to my request handler. The spellchecker is index-based. Terms in the index are like: iphone, iphone 4, iphone case, phone, gophoe. When I set q=iphole, I get suggestions like: iphone, phone, gophone, ipad. Not sure how I would get iphone, iphone 4, iphone case, phone. Any thoughts? At the same time, when I type ipj I get ipad as the result — why not iphone, iphone 4, ipad?
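A rough sketch of the index-based setup described above (the spell field and textSpell type names are placeholders, not from abhayd's config). In solrconfig.xml:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

with spell being a whitespace-tokenized field that the source text is copied into. A request would then add parameters like spellcheck=true&spellcheck.collate=true&spellcheck.q=iphole case to get both the per-word suggestions and the collated iphone case rewrite.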
Re: RAM increase
Hello Lance, from the command line run:

    export JAVA_OPTS='-d64 -Xms128m -Xmx5g'

adjusting the values of Xms and Xmx as needed. Hope this helps. Tommaso

2010/10/29 Lance Norskog goks...@gmail.com: When you start the Tomcat app, you tell it how much memory to allocate to the JVM. I don't remember where, probably in catalina.sh.

On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop satya.yada...@gmail.com wrote: Hi All, Thanks for your reply. I have a doubt: should I increase the machine's RAM, or the heap size of the Java/Tomcat process where Solr is running? Regards, satya

-- Lance Norskog goks...@gmail.com
Something for the weekend - Lily 0.2 is OUT ! :)
Dear all, three months after the highly anticipated proof-of-architecture release, we're living up to our promises and are releasing Lily 'CR' 0.2 today - a fully distributed, highly scalable and highly available content repository, marrying best-of-breed database and search technology into a powerful, productive and easy-to-use solution for contemporary internet-scale content applications.

For whom: You're building content applications (content management, archiving, asset management, DMS, WCMS, portals, ...) that scale well, either as a product, a project or in the cloud. You need a trustworthy underlying content repository that provides a flexible and easy-to-use content model you can adapt to your requirements. You have a keen interest in NoSQL/HBase technology but need a higher-level API, and scalable indexing and search as well.

Foundations: Lily builds further upon Apache HBase and Apache SOLR. HBase is a faithful implementation of the Google BigTable database, and provides infinite elastic scaling and high-performance access to huge amounts of data. SOLR is the server version of Lucene, the industry-standard search library. Lily joins HBase and SOLR in a single, solidly packaged content repository product, with automated sharding (making use of multiple hardware nodes to provide scaling of volume and performance) and automatic index maintenance. Lily adds a sophisticated, yet flexible and surprisingly practical content schema on top of this, providing the structuredness of more classic databases, versioning, secondary indexing, queuing: all the stuff developers care about when fixing real-world problems.

Key features of this release:
- Fully distributed: Lily has a fully-distributed architecture making maximum use of all available hardware for scalability and availability. ZooKeeper is used for distributed process coordination, configuration and locking. Index maintenance is based on an HBase-backed RowLog mechanism allowing fast but reliable updating of SOLR indexes.
- Index maintenance: Lily offers all the features and functionality of SOLR, but makes index maintenance a breeze, both for interactive as-you-go updating and MapReduce-based full index rebuilds.
- Multi-indexers: for high-load situations, multiple indexers can work in parallel and talk to a sharded SOLR setup.
- REST interface: a flexible and platform-neutral access method for all Lily operations using HTTP and JSON.
- Improved content model: we added URI as a base Lily type as a (small) indication of our interest in semantic technology.

More importantly, we commit ourselves to take care of API compatibility and data format layout from this release onwards - as much as humanly possible. Lily 0.2 offers the API we want to support in the final release. Lily 0.2 is our contract with content application developers; upgrading to the final Lily release should require as few code or data changes as possible.

From where: Download Lily from www.lilyproject.org. It's Apache-licensed open source. No strings attached.

Enterprise support: Together with this release, we're rolling out our commercial support services (http://outerthought.org/site/services/lily.html) - and signed up a first customer, yay! - that allow you to use Lily with peace of mind. Also, this release has been fully tested against, and depends on, the latest Cloudera Distribution for Hadoop (http://www.cloudera.com/hadoop/, CDH3 beta3).

Next up: Lily 1.0 is planned for March 2011, with an interim release candidate in January.
We'll be working on performance enhancements and feature additions, and are happily - eagerly - awaiting your feedback and comments. We'll post a roadmap for Lily 0.3 and onwards by mid-November.

Follow us: If you want to keep track of Lily's ongoing development, join the Lily discussion list or follow our company Twitter @outerthought (http://twitter.com/#!/outerthought).

Thank you: I'd like to thank Bruno and Evert for their hard work so far, the HBase and SOLR communities for their help, the IWT government fund for their partial financial support, and all of our early Lily adopters and enthusiasts for their much-valued feedback. You guys rock! Steven.

-- Steven Noels http://outerthought.org/ Open Source Content Applications Makers of Kauri, Daisy CMS and Lily
Re: Exception while processing: attach document
I think this is a JDBC warning message, since some isolation levels may not be implemented by the actual (Oracle) driver (e.g. READ_UNCOMMITTED). Might your issue be related to transactions updating/inserting/deleting records on your Oracle DB while you try to run DIH? Regards, Tommaso 2010/10/29 Bac Hoang bac.ho...@axonactive.vn Could anyone shed some light, please? I saw in the log the message below, but I don't think it's the root cause, because in my dataSource, readOnly is true:

Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels

A newbie Solr user = On 10/29/2010 1:49 PM, Bac Hoang wrote: Hello all, I'm getting stuck when trying to import an Oracle DB into a Solr index; could any one of you give a hand? Thanks a million. Below is some short info that might explain the question. My Solr: 1.4.1

LOG:
INFO: Starting Full Import
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity attach with URL: jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: attach document: SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from A.B Processing Document # 1

where A: a schema, B: a table

dataSource ===

    <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
        url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22"
        user="abc" password="xyz" readOnly="true" autoCommit="false" batchSize="1"/>
    <document>
      <entity dataSource="jdbc" name="attach" query="select * from A.B">
        <entity processor="SqlEntityProcessor" dataField="attach.TOPIC" format="text">
          <field column="text" name="text"/>
        </entity>
      </entity>
    </document>

where TOPIC is a field of table B. Thanks again
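For what it's worth, the quoted SQLException appears to be the known symptom of DIH's readOnly flag: if I remember correctly, with readOnly="true" Solr 1.4's JdbcDataSource asks the connection for TRANSACTION_READ_UNCOMMITTED, which the Oracle driver rejects with exactly that "READ_COMMITTED and SERIALIZABLE" message. A minimal sketch of the usual workaround, assuming the rest of the configuration stays as posted - drop readOnly (or set it to false):

    <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
        url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22"
        user="abc" password="xyz" readOnly="false" autoCommit="false" batchSize="1"/>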
Re: Multiple indexes inside a single core
Here's the Jira issue for the distributed search issue. https://issues.apache.org/jira/browse/SOLR-1632 I tried applying this patch but get the same error that is posted in the discussion section for that issue. I will be glad to help too on this one. On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote: Ah, I should have read more carefully... I remember this being discussed on the dev list, and I thought there might be a Jira attached, but I sure can't find it. If you're willing to work on it, you might hop over to the solr dev list and start a discussion, maybe ask for a place to start. I'm sure some of the devs have thought about this... If nobody on the dev list says There's already a JIRA on it, then you should open one. Jira issues are generally preferred when you start getting into design because the comments are preserved for the next person who tries the idea or makes changes, etc. Best Erick On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote: Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it. I'm not sure if there is a clean place for it. Ben On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote: It seems to me that multiple cores are along the lines of what you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas, and are independently maintainable. This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin HTH Erick On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote: We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we had multiple indexes, each would have its own searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type.
If we could use multiple indexes, we would likely also be sharding only a small subset of them. Thanks in advance, Ben
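For reference, the multi-core setup Erick points at is driven by a solr.xml like the following (a minimal sketch; core names here are made up for illustration):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="fast-moving" instanceDir="fast-moving"/>
        <core name="slow-moving" instanceDir="slow-moving"/>
      </cores>
    </solr>

Each core keeps its own schema, searcher and caches, so committing to the fast-moving core every 20 minutes leaves the slow-moving core's caches warm - which addresses the cache concern in this thread, though not the cross-core score normalization.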
Re: Stored or indexed?
Hi Ron, In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options:

indexed=true stored=true: Use this for information you want to search on and also display in search results - for example, book title or author.

indexed=false stored=true: Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image.

indexed=true stored=false: Use this for fields you want to search on but don't need to retrieve in search results. Here are some of the common reasons you would want this: Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database. Ordering results: Say you define a field <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching. If you want to sort results based on book name, you could copy the field into a separate nonretrievable, nontokenized field that can be used just for sorting:

    <field name="bookSort" type="string" indexed="true" stored="false"/>
    <copyField source="bookName" dest="bookSort"/>

Easier searching: If you define the field <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> you can use it as a catch-all field that contains all of the other text fields. Since Solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field.

indexed=false stored=false: Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field rather than throwing an error by default:

    <fieldtype name="ignored" stored="false" indexed="false"/>
    <dynamicField name="*" type="ignored"/>

Elizabeth Murnane emurn...@architexa.com Architexa Lead Developer - www.architexa.com Understand Document Code In Seconds --- On Thu, 10/28/10, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Subject: Re: Stored or indexed? To: solr-user@lucene.apache.org Date: Thursday, October 28, 2010, 4:25 AM In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search. On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote: Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false, this is most often done when you are using analyzers/tokenizers on your field. This field is for search only; you would never retrieve its contents for display. It may in fact be an amalgam of several fields into one 'content' field. You have your display copy stored in another field marked indexed=false, stored=true and optionally compressed. I also have simple string fields set to lowercase so searching is case-insensitive, and have a duplicate field where the string is normal case. The first one is indexed/not stored, the second is stored/not indexed.
How can I disable fsync()?
Thanks to all - I made Solr work very well on a newer machine. Now I am setting up Solr on an older server with an IDE hard drive. Unfortunately, populating the index takes FOREVER because Solr/Lucene/Tomcat calls fsync() after every write. I would like to know how to disable fsync. I am very aware of the risks of not having fsync() and I DO NOT CARE ABOUT THEM AND DO NOT WANT TO BE REMINDED. I just want to know how I can disable fsync() when adding to the Solr index. Thanks, guys! Igor
Re: documentCache clarification
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The queryResultCache is keyed on Query,Sort,Start,Rows,Filters and the value is a DocList object ... http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html Unlike the Document objects in the documentCache, the DocLists in the queryResultCache never get modified (technically Solr doesn't actually modify the Documents either; the Document just keeps track of its fields and updates itself as lazy-load fields are needed). If a DocList containing results 0-10 is put in the cache, it's not going to be of any use for a query with start=50, but if it contains 0-50 it *can* be used if start < 50 and rows < 50 -- that's where the queryResultWindowSize comes in. If you use start=0&rows=10, but your window size is 50, SolrIndexSearcher will (under the covers) use start=0&rows=50 and put that in the cache, returning a slice from 0-10 for your query. The next query asking for 10-20 will be a cache hit. This makes sense but still doesn't explain what I'm seeing in my cache stats. When I issue a request with rows=10 the stats show an insert into the queryResultCache. If I send the same query, this time with rows=1000, I would not expect to see a cache hit, but I do. So it seems like there must be something useful in whatever gets cached on the first request for rows=10 for it to be re-used by the request for rows=1000. --jay
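For reference, the settings being discussed live in solrconfig.xml; a sketch with illustrative values (the element names are the standard ones, the numbers are not recommendations):

    <queryResultWindowSize>50</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>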
Custom Sorting in Solr
Hi all guys! I'm in a weird situation here. We have indexed a set of documents which are ordered using a linked list (each document has a reference to the previous and the next). Is there a way, when sorting in a Solr search, to use the linked list for the sort? If that is not possible, how can I use the DIH to access a service in WCF or a web service? Should I develop my own DIH? -- __ Ezequiel. Http://www.ironicnet.com
RE: Custom Sorting in Solr
There's no way I know of to make Solr use that kind of data to create the sort order you want. Generally for 'custom' sorts, you want to create a field in your Solr index with possibly artificially constructed values that will 'naturally' sort the way you want. How to do that with a linked list seems kind of tricky; before you index, you may have to write code to analyze your whole graph order and then just supply sort order keys (see the sketch after this message). And then if you sometimes update just a few documents, but not your whole thing... Geez, I'm not really sure. It's kind of a tricky problem. That kind of data is not really the expected use case for Solr sorting. Sorry, I'm not sure what this means or how it would help: use the DIH to access a Service in WCF or a Webservice? Maybe someone else will know exactly what you mean. Or maybe if you rephrase with more specificity as to how you think this will help you solve your problem, it will be more clear. Recall that you don't need to use DIH to index at all; it's just one of several methods, it simplifies things for common patterns, and it's possible you fall outside the common pattern and it would be simpler not to use DIH. Although even without DIH, I can't think of a particularly simple way to solve your problem. Just curious, but is your _entire_ corpus, your entire document set, part of a _single_ linked list? Or do you have several different linked lists in there? If several, what do you want to happen with sort if two documents in the result set aren't even part of the same linked list? This kind of thing is one reason translating the sort of data you have to a Solr sort order starts to seem kind of confusing to me. From: Ezequiel Calderara [ezech...@gmail.com] Sent: Friday, October 29, 2010 3:39 PM To: Solr Mailing List Subject: Custom Sorting in Solr Hi all guys! I'm in a weird situation here. We have indexed a set of documents which are ordered using a linked list (each document has a reference to the previous and the next). Is there a way, when sorting in a Solr search, to use the linked list for the sort? If that is not possible, how can I use the DIH to access a service in WCF or a web service? Should I develop my own DIH? -- __ Ezequiel. Http://www.ironicnet.com
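A minimal sketch of that precompute-the-keys idea, assuming the documents arrive as id/next-id pairs (all names here are hypothetical, not part of any Solr API):

    import java.util.HashMap;
    import java.util.Map;

    public class ListOrder {
        // Walk the linked list from its head and give each document an
        // ordinal; index that ordinal in a sortable int field and sort on it.
        public static Map<String, Integer> computeSortKeys(
                Map<String, String> nextIdByDocId, String headId) {
            Map<String, Integer> order = new HashMap<String, Integer>();
            int pos = 0;
            for (String id = headId; id != null; id = nextIdByDocId.get(id)) {
                order.put(id, pos++);
            }
            return order;
        }
    }

Re-running this after updates means recomputing keys for everything downstream of the change, which is exactly the awkwardness described above.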
Re: documentCache clarification
: This is a limitation in the SolrCache API. : The key into the cache does not contain rows, so the cache returns the : first 10 docs and increments its hit count. Then the cache user : (SolrIndexSearcher) looks at the entry and determines it can't use it. Wow, I never realized that. Why don't we just include the start & rows (modulo the window size) in the cache key? -Hoss
Re: Custom Sorting in Solr
On Fri, Oct 29, 2010 at 3:39 PM, Ezequiel Calderara ezech...@gmail.com wrote: Hi all guys! I'm in a weird situation here. We have index a set of documents which are ordered using a linked list (each documents has the reference of the previous and the next). Is there a way when sorting in the solr search, Use the linked list to sort? It seems like you should be able to encode this linked list as an integer instead, and sort by that? If there are multiple linked lists in the index, it seems like you could even use the high bits of the int to designate which list the doc belongs to, and the low order bits as the order in that list. -Yonik http://www.lucidimagination.com
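As a concrete illustration of that encoding (the bit split and names are assumptions, not anything Solr provides):

    public class SortKey {
        // Pack the list id into the high bits and the position within that
        // list into the low bits, so a single int field sorts by (list, position).
        public static int pack(int listId, int position) {
            assert listId < (1 << 11) && position < (1 << 20);
            return (listId << 20) | position; // 11 bits of list id, 20 of position
        }
    }

Sorting ascending on a field indexed with pack(listId, position) then groups documents by list and orders each list by its linked-list position.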
Re: documentCache clarification
On Fri, Oct 29, 2010 at 3:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : This is a limitation in the SolrCache API. : The key into the cache does not contain rows, so the cache returns the : first 10 docs and increments its hit count. Then the cache user : (SolrIndexSearcher) looks at the entry and determines it can't use it. Wow, I never realized that. Why don't we just include the start & rows (modulo the window size) in the cache key? The implementation of equals() would be rather difficult... actually impossible w/o abusing the semantics. It would also be impossible w/o the Map implementation guaranteeing what object was on the LHS vs the RHS when equals was called. Unless I'm missing something obvious? -Yonik http://www.lucidimagination.com
Re: documentCache clarification
: Why don't we just include the start & rows (modulo the window size) in : the cache key? : : The implementation of equals() would be rather difficult... actually : impossible w/o abusing the semantics. : It would also be impossible w/o the Map implementation guaranteeing : what object was on the LHS vs the RHS when equals was called. : : Unless I'm missing something obvious? You've totally confused me. What I'm saying is that SolrIndexSearcher should consult the window size before consulting the cache -- the start param should be rounded down to the nearest multiple of the window size, and start+rows (ie: end) should be rounded up to one less than the nearest multiple of the window size, and then that should be looked up in the cache. Equality on the cache key is straightforward... this.q==that.q this.start==that.start this.end==that.end this.sort==that.sort this.filters==that.filters so if the window size is 50 and SolrIndexSearcher gets a request like q=x&start=33&rows=10&sort=y&fq=... it should generate a cache key where start=0 and end=49. (if start=33&rows=42, then the key would contain start=0 and end=99 ... which could result in some overlap, but that's why people are supposed to pick a window size greater than the largest number of rows typically requested) -Hoss
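The rounding being described is simple integer arithmetic; a sketch (the windowSize value is illustrative):

    // Round [start, start+rows) out to windowSize boundaries before
    // consulting the queryResultCache.
    int windowSize = 50;
    int start = 33, rows = 10;
    int cacheStart = (start / windowSize) * windowSize;             // 0
    int cacheEnd = ((start + rows + windowSize - 1) / windowSize)
                   * windowSize - 1;                                // 49

With start=33&rows=42 the same arithmetic yields cacheStart=0 and cacheEnd=99, matching the overlap example above.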
Re: documentCache clarification
On Fri, Oct 29, 2010 at 4:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Why don't we just include the start & rows (modulo the window size) in : the cache key? : : The implementation of equals() would be rather difficult... actually : impossible w/o abusing the semantics. : It would also be impossible w/o the Map implementation guaranteeing : what object was on the LHS vs the RHS when equals was called. : : Unless I'm missing something obvious? You've totally confused me. What I'm saying is that SolrIndexSearcher should consult the window size before consulting the cache -- the start param should be rounded down to the nearest multiple of the window size, and start+rows (ie: end) should be rounded up to one less than the nearest multiple of the window size, and then that should be looked up in the cache. That's already done. For example, do q=*:*&rows=12 then q=*:*&rows=16 and you should see a queryResultCache hit, since queryResultWindowSize is 20 and both requests round up to that. *but* if you do this (with an index with more than 20 docs in it): q=*:*&rows=25 Currently that query will round up to 40, but since nResults (start+rows) isn't in the key, it will still get a cache hit but then not be usable. Now, if your proposal is to put nResults into the key, we then have a worse problem. Assume we're starting over with a clean cache. q=*:*&rows=25 // cached under a key including nResults=40 q=*:*&rows=15 // looked up under a key including nResults=20... not found! but that's why people are supposed to pick a window size greater than the largest number of rows typically requested Hmmm, I don't think so. If that were the case, there would be no need for two parameters (no need for queryResultWindowSize) since we would always just pick queryResultMaxDocsCached. -Yonik http://www.lucidimagination.com
SolrCore.getSearcher() and postCommit()
Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) in a SolrEventListener.postCommit() hook as long as I decrement it when I am done? I need to get a handle on an IndexReader so I can dump out a portion of the index to an external process. Thanks, Grant
Re: How can I disable fsync()?
On Oct 29, 2010, at 2:11 PM, Igor Chudov wrote: Thanks to all and I made Solr work very well on one newer machine. Now I am setting up Solr on an older server with an IDE hard drive. Unfortunately, populating the index takes FOREVER due to Solr/Lucene/Tomcat calling fsync() a lot after every write. I would like to know how to disable fsync. I am very aware of the risks of not having fsync() and I DO NOT CARE ABOUT THEM AND DO NOT WANT TO BE REMINDED. I just want to know how can I disable fsync() when adding to Solr index. Have a look at FSDirectory.fsync(). That's at least a starting point. YMMV.
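If patching source is on the table, one route is a Directory that swallows sync; an untested sketch against the Lucene 2.9.x API that Solr 1.4.1 ships with (method signatures may differ in other versions, and you would still need to patch or subclass Solr's DirectoryFactory to make Solr instantiate it):

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.store.NIOFSDirectory;

    // Turns sync() into a no-op so nothing ever fsyncs. All durability
    // guarantees on crash or power loss are gone - as requested.
    public class NoSyncDirectory extends NIOFSDirectory {
        public NoSyncDirectory(File path) throws IOException {
            super(path);
        }

        @Override
        public void sync(String name) {
            // deliberately do nothing
        }
    }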
Re: SolrCore.getSearcher() and postCommit()
On Fri, Oct 29, 2010 at 5:36 PM, Grant Ingersoll gsing...@apache.org wrote: Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) in a SolrEventListener.postCommit() hook as long as I decrement it when I am done? I need to get a handle on an IndexReader so I can dump out a portion of the index to an external process. Yes, just be aware that the searcher you will get will not contain the recently committed documents. If you want that, look at the newSearcher hook instead. -Yonik http://www.lucidimagination.com
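The borrow/decref pattern being asked about looks something like this (a sketch against the 1.4 API, assuming your listener keeps a reference to the SolrCore it was constructed with):

    import org.apache.lucene.index.IndexReader;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    public class IndexDumper {
        // Inside the postCommit() hook: borrow the searcher, use its
        // reader, and always decref in a finally block.
        void dumpIndex(SolrCore core) {
            RefCounted<SolrIndexSearcher> ref = core.getSearcher();
            try {
                IndexReader reader = ref.get().getReader();
                // ... hand the reader's contents to the external process ...
            } finally {
                ref.decref();
            }
        }
    }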
Re: NOT keyword - doesn't work with dismax?
I couldn't even get the bq= to work with negated queries, although with edismax, negated queries work with just q=-term. Works: /solr/select?qt=edismax&q=-red Here is the failed attempt with dismax: /solr/select?qt=dismax&rows=1&indent=true&q=-red&bq=*:*^0.001&echoParams=all&debugQuery=true

    { responseHeader:{ status:0, QTime:20, params:{
        mm:2<-1 5<-2 6<90%,
        pf:title^10.0 sbody^2.0,
        echoParams:all,
        tie:0.01,
        qf:title^10.0 sbody^2.0 tags^1.0 text^1.0,
        q.alt:*:*,
        hl.fl:body,
        wt:json,
        ps:100,
        defType:dismax,
        bq:*:*^0.001,
        echoParams:all,
        debugQuery:true,
        indent:true,
        q:-red,
        qt:dismax,
        rows:1}},
      response:{numFound:0,start:0,docs:[] },
      debug:{
        rawquerystring:-red,
        querystring:-red,
        parsedquery:+(-DisjunctionMaxQuery((tags:red | text:red | title:red^10.0 | sbody:red^2.0)~0.01)) DisjunctionMaxQuery((title:red^10.0 | sbody:red^2.0)~0.01) MatchAllDocsQuery(*:*^0.0010),
        parsedquery_toString:+(-(tags:red | text:red | title:red^10.0 | sbody:red^2.0)~0.01) (title:red^10.0 | sbody:red^2.0)~0.01 *:*^0.0010,
        explain:{},
        QParser:DisMaxQParser,
        altquerystring:null,
        boost_queries:[*:*^0.001],
        parsed_boost_queries:[MatchAllDocsQuery(*:*^0.0010)],
        boostfuncs:null,
        timing:{ time:20.0,
          prepare:{ time:19.0, org.apache.solr.handler.component.QueryComponent:{ time:19.0}, org.apache.solr.handler.component.FacetComponent:{ time:0.0}, org.apache.solr.handler.component.MoreLikeThisComponent:{ time:0.0}, org.apache.solr.handler.component.HighlightComponent:{ time:0.0}, org.apache.solr.handler.component.StatsComponent:{ time:0.0}, org.apache.solr.handler.component.DebugComponent:{ time:0.0}},
          process:{ time:1.0, org.apache.solr.handler.component.QueryComponent:{ time:0.0}, org.apache.solr.handler.component.FacetComponent:{ time:0.0}, org.apache.solr.handler.component.MoreLikeThisComponent:{ time:0.0}, org.apache.solr.handler.component.HighlightComponent:{ time:0.0}, org.apache.solr.handler.component.StatsComponent:{ time:0.0}, org.apache.solr.handler.component.DebugComponent:{ time:1.0}

On Wed, Apr 28, 2010 at 23:35, Chris Hostetter hossman_luc...@fucit.org wrote: : Ah, dismax doesn't support top-level NOT query. Hmm, yeah I don't think support for purely negated queries was ever added to dismax. I'm pretty sure that as a workaround you can add something like... bq=*:*^0.001 ...to your query. Based on the dismax structure, that should allow purely negative queries to work. -Hoss
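For what it's worth, a purely negative query can also be expressed on stock dismax by letting q.alt match everything and pushing the negation into a filter query; a sketch (handler path and field name are assumptions):

    /solr/select?defType=dismax&q.alt=*:*&fq=-title:red&rows=10

Solr treats a top-level negative fq as if it were subtracted from *:*, so this should return every document whose title does not match red, without relying on the bq workaround.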
Solr + Zookeeper Integration
Hi people, I'm trying to configure a little Solr cluster, but I need to shard the documents. I configured my Solr with core0 (/opt/solr/core0) and installed ZooKeeper (/opt/zookeeper). 1. In my solrconfig.xml I added the lines below:

    <zookeeper>
      <str name="zkhostPorts">host1:2181</str>
      <str name="me">http://host1:8983/solr/core0</str>
      <str name="timeout">5000</str>
      <str name="nodesDir">/solr_domain/nodes</str>
    </zookeeper>

2. My /opt/zookeeper/conf/zoo.cfg I configured this way:

    tickTime=2000
    dataDir=/var/zookeeper
    clientPort=2181

and started it with zkServer.sh. After starting ZooKeeper, my dir /solr_domain/nodes remains empty. Following the documentation I didn't find any extra steps to take, but nothing is working. Could somebody tell me what is missing or wrong, please? Thanks
Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields
I have some documents with a bunch of attachments (images, thumbnails for them, audio clips, word docs, etc.), and am currently dealing with them by just putting a filesystem path to them in Solr, and then jumping through hoops to keep them in sync with Solr. Would it be nuts to stick the image data itself in Solr? More specifically - if I have a bunch of large stored fields, would it significantly impact search performance in the cases when those fields aren't fetched? Searches are very common in this system, and it's very rare that someone actually opens up one of these attachments, so I'm not really worried about the time it takes to fetch them when someone does actually want one.
Re: replication not working between 1.4.1 and 3.1-dev
On 10/27/2010 8:34 PM, Shawn Heisey wrote: I started to upgrade my slave servers from 1.4.1 to 3.1-dev checked out this morning. Because of SOLR-2034 (new javabin version) the replication fails. Asking about it in comments on SOLR-2034 brought up the suggestion of switching to XML instead of javabin, but so far I have not been able to figure out how to do this. I filed a new Jira (SOLR-2204) on the replication failure. Is there any way (through either a config change or minor code changes) to make the replication handler use XML? If I have to make small edits to the 1.4.1 source as well as 3.1, that would be OK. Talking to yourself is probably a sign of mental instability, but I'm doing it anyway. There's been deafening silence from everyone else! The recommended method of safely upgrading Solr that I've read about is to upgrade slave servers, keeping your production application pointed either at another set of slave servers or your master servers. Then you test it with a dev copy of your application, and once you're sure it's working, you can switch production traffic over to the upgraded set. If it falls over, you just switch back to the old version. Once you're sure it's TRULY working, you upgrade everything else. To convert fully to the new index format, you have the option of reindexing or optimizing your existing indexes. I like this method, and this is the way I want to do it, except that the new javabin format makes it impossible. I need a viable way to replicate indexes from a set of 1.4.1 master servers to 3.1-dev slaves. Delving into the source and tackling the problem myself is something I would truly love to do, but I lack the necessary skills. I believe this will be a showstopper problem if 3.1 is released in its current state. Are there any clever workarounds that would let me proceed with my upgrade now? Thanks, Shawn
Re: Looking for Developers
LOL! We ARE programmers, and we do like absolutes :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Fri, 10/29/10, Lance Norskog goks...@gmail.com wrote: From: Lance Norskog goks...@gmail.com Subject: Re: Looking for Developers To: solr-user@lucene.apache.org, t...@statsbiblioteket.dk Date: Friday, October 29, 2010, 3:14 AM Then, Godwin! On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote: For me, I simply deleted the original email, but I'm now quite enjoying the irony of the complaints causing more noise on the list than the original email! ;-) He he. An old classic. Next in line is the meta-meta-discussion about whether meta-discussions belong on the list or if they should be moved to solr-user-meta. Repeat ad nauseam. Job postings are on-topic IMHO, and unless their volume grows significantly, I see no reason to create a new mailing list. -- Lance Norskog goks...@gmail.com
Re: Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields
On Fri, Oct 29, 2010 at 6:00 PM, Ron Mayer r...@0ape.com wrote: I have some documents with a bunch of attachments (images, thumbnails for them, audio clips, word docs, etc.), and am currently dealing with them by just putting a filesystem path to them in Solr, and then jumping through hoops to keep them in sync with Solr. Not sure why that is an issue. Keeping them in sync with Solr would be the same as keeping them in sync with a file system; why would storing within Solr be any different? Would it be nuts to stick the image data itself in Solr? More specifically - if I have a bunch of large stored fields, would it significantly impact search performance in the cases when those fields aren't fetched? Hard to say. I assume you mean storing them by converting to a base64 format. If you do not retrieve the field when fetching, AFAIK it should not affect things significantly, if at all. So if you manage your retrieval, you should be fine (see the sketch below). Searches are very common in this system, and it's very rare that someone actually opens up one of these attachments, so I'm not really worried about the time it takes to fetch them when someone does actually want one.
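A sketch of the stored-only setup being described (the field name is an assumption; enableLazyFieldLoading is a standard solrconfig option that keeps large stored fields from being loaded until actually requested):

    schema.xml:
    <field name="attachment_b64" type="string" indexed="false" stored="true"/>

    solrconfig.xml:
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

With that in place, queries whose fl= lists only the small fields never touch the attachment bytes; only a request that explicitly asks for attachment_b64 pays the cost of reading and decoding the base64 payload.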