Re: edismax, inconsistencies with implicit/explicit AND when used with explicit OR
Hi Mark, I suspect the issue you are facing is https://issues.apache.org/jira/browse/SOLR-2649. You can verify this by toggling the default operator between 'AND' and 'OR'.

--- On Wed, 8/10/11, Mark juszczec mark.juszc...@gmail.com wrote:

From: Mark juszczec mark.juszc...@gmail.com
Subject: edismax, inconsistencies with implicit/explicit AND when used with explicit OR
To: solr-user@lucene.apache.org
Date: Wednesday, August 10, 2011, 12:27 AM

Hello all

We've just switched from the default parser to the edismax parser, and a user has noticed some inconsistencies when using implicit/explicit ANDs, ORs and grouping search terms in parentheses. First, the default query operator is AND; I switched it from OR today.

The query:

http://cn-nyc1-ad-dev1.cnet.com:8983/solr/customersJoin/select?indent=on&version=3.3&q=CUSTOMER_NM:*IBM*%20CUSTOMER_NM:*Software*%20OR%20CUSTOMER_NM:*something*&fq=&start=0&rows=10&fl=*%2Cscore&defType=edismax&wt=&explainOther=&hl.fl=

returns 1053 results. Some have only IBM in CUSTOMER_NM, some have only Software in the name, some have both.

However, when I explicitly specify an AND between CUSTOMER_NM:*IBM* and CUSTOMER_NM:*Software*:

http://cn-nyc1-ad-dev1.cnet.com:8983/solr/customersJoin/select?indent=on&version=3.3&q=CUSTOMER_NM:*IBM*%20AND%20CUSTOMER_NM:*Software*%20OR%20CUSTOMER_NM:*something*&fq=&start=0&rows=10&fl=*%2Cscore&defType=edismax&wt=&explainOther=&hl.fl=

I only get 3 results, and all of them contain both IBM and Software.

I found this reference to inconsistencies with edismax, but I'm not sure it explains this situation 100%. http://lucene.472066.n3.nabble.com/edismax-inconsistency-AND-OR-td2131795.html

Have I found a bug or am I doing something terribly wrong?

Mark
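For anyone wanting to script the suggested verification, here is a minimal SolrJ sketch that runs Mark's query under each default operator and compares hit counts. The query and core name come from the thread; the host, class name and everything else are illustrative assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxOpCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/customersJoin");
        SolrQuery query = new SolrQuery("CUSTOMER_NM:*IBM* CUSTOMER_NM:*Software* OR CUSTOMER_NM:*something*");
        query.set("defType", "edismax");
        // Run the same query under each default operator; if the hit counts
        // differ the way Mark describes, SOLR-2649 is the likely culprit.
        for (String op : new String[] {"AND", "OR"}) {
            query.set("q.op", op);
            QueryResponse rsp = server.query(query);
            System.out.println("q.op=" + op + " -> " + rsp.getResults().getNumFound() + " hits");
        }
    }
}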
Re: Indexing tweet and searching @keyword OR #keyword
I tried tweaking WordDelimiterFilterFactory, but it won't accept # or @ symbols; they are ignored totally. I need a solution, please suggest.

On 4 August 2011 21:08, Jonathan Rochkind rochk...@jhu.edu wrote:

It's the WordDelimiterFilterFactory in your filter chain that's removing the punctuation entirely from your index, I think. Read up on what the WordDelimiter filter does and what its settings are; decide how you want things to be tokenized in your index to get the behavior you want; either get WordDelimiter to do it that way by passing it different arguments, or stop using WordDelimiter; come back with any questions after trying that!

On 8/4/2011 11:22 AM, Mohammad Shariq wrote:

I have indexed around 1 million tweets (using the text dataType). When I search the tweets with # or @ I don't get the exact result. E.g. when I search for #ipad or @ipad I get results where ipad is mentioned, skipping the # and @. Please suggest how to tune, or which filter factories to use, to get the desired result. I am indexing the tweets as text; below is the text fieldType from my schema.xml.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
  </analyzer>
</fieldType>

--
Thanks and Regards
Mohammad Shariq
Re: Is optimize needed on slaves if it replicates from optimized master?
From what I see on my slaves, yes. After replication has finished, the new index is in place, and a new reader has started, I always have a write.lock file in my index directory on the slaves, even though the index on the master is optimized.

Regards
Bernd

On 10.08.2011 09:12, Pranav Prakash wrote:

Do slaves need a separate optimize command if they replicate from an optimized master?

*Pranav Prakash*
temet nosce
Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
Re: Is optimize needed on slaves if it replicates from optimized master?
On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: From what I see on my slaves, yes. After replication has finished and new index is in place and new reader has started I have always a write.lock file in my index directory on slaves, even though the index on master is optimized. That is not true. Replication is roughly a copy of the diff between the master and the slave's index. An optimized index is a merged and re-written index so replication from an optimized master will give an optimized copy on the slave. The write lock is due to the fact that an IndexWriter is always open in Solr even on the slaves. -- Regards, Shalin Shekhar Mangar.
Re: Is optimize needed on slaves if it replicates from optimized master?
Sure, no optimizing is actually needed on the slave, but after calling optimize on the slave the write.lock is removed. So why doesn't the replication process do this?

Regards
Bernd

On 10.08.2011 10:57, Shalin Shekhar Mangar wrote:

On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

From what I see on my slaves, yes. After replication has finished, the new index is in place, and a new reader has started, I always have a write.lock file in my index directory on the slaves, even though the index on the master is optimized.

That is not true. Replication is roughly a copy of the diff between the master and the slave's index. An optimized index is a merged and re-written index, so replication from an optimized master will give an optimized copy on the slave. The write lock is due to the fact that an IndexWriter is always open in Solr, even on the slaves.
document indexing
Hello,

First of all, I am a beginner and I am trying to develop a sample application using SolrNet. I am struggling with the schema definition I need to use to meet my needs. In the database, I have Books(bookId, name) and Pages(pageId, bookId, text) tables. They have a master-detail relationship. I want to be able to search the Text field of Pages but list the Books. Should I use a schema for Pages (with pageId as unique key) or for Books (with bookId as unique key) in this scenario?

Thanks.
Re: Is optimize needed on slaves if it replicates from optimized master?
That is not true. Replication is roughly a copy of the diff between the master and the slave's index.

In my case, during replication the entire index is copied from master to slave, during which the size of the index goes a little over double. Then it shrinks back to its original size. Am I doing something wrong? How can I get the master to serve only a delta index instead of the whole index, with the slaves merging the new and old index?

*Pranav Prakash*
Re: document indexing
It really does depend on what you want to do in your app, but from the info given I'd go for denormalizing by repeating the least number of values. So in your case that would be one document per book page: PageID+BookID (uniqueKey), pageID, PageVal1..PageValn, BookID, BookName.

On 10 August 2011 09:46, directorscott dgul...@gmail.com wrote:

Hello, First of all, I am a beginner and I am trying to develop a sample application using SolrNet. I am struggling with the schema definition I need to use to meet my needs. In the database, I have Books(bookId, name) and Pages(pageId, bookId, text) tables. They have a master-detail relationship. I want to be able to search the Text field of Pages but list the Books. Should I use a schema for Pages (with pageId as unique key) or for Books (with bookId as unique key) in this scenario? Thanks.
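To make the denormalized shape concrete, here is a minimal SolrJ sketch of indexing one page-level document (SolrNet would look much the same). The field names follow Lee's suggestion; the server URL, class name and sample values are illustrative:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PageIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // One Solr document per page, carrying its book's fields along.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1-1");      // pageId + "-" + bookId as the uniqueKey
        doc.addField("pageId", 1);
        doc.addField("bookId", 1);
        doc.addField("text", "some text");
        doc.addField("bookName", "A Sample Book");
        server.add(doc);
        server.commit();
    }
}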
frange not working in query
Hi All, I am trying to sort the results on a unix timestamp using this query:

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

When I run this query, it says 'no field name specified in query and no defaultSearchField defined in schema.xml'. As soon as I remove the frange query and run this, it starts working fine:

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

Any pointers?

Thanks,
Amit
RE: Trying to index pdf docs - lazy loading error - ClassNotFoundException: solr.extraction.ExtractingRequestHandler
I have had a mistake with the config files. From the example directory all works correctly. Thanks to all.

---
Rode González
Libnova, SL
Paseo de la Castellana, 153 - Madrid
[t]91 449 08 94 [f]91 141 21 21
www.libnova.es

-----Original Message-----
From: Rode González [mailto:r...@libnova.es]
Sent: Tuesday, 9 August 2011 13:04
To: solr-user@lucene.apache.org
CC: Leo
Subject: Trying to index pdf docs - lazy loading error - ClassNotFoundException: solr.extraction.ExtractingRequestHandler

Hi all.

I've tried to index pdf documents using the libraries included in the example distribution of Solr 3.3.0. I've copied all the jars included in the /dist and /contrib directories into a common /lib directory and I've added this path to the solrconfig.xml file.

The request handler for binary docs has no changes from the example:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <!-- <str name="lowernames">true</str> -->
    <!-- <str name="uprefix">ignored_</str> -->
    <!-- capture link hrefs but ignore div attributes -->
    <!-- <str name="captureAttr">true</str> -->
    <!-- <str name="fmap.a">links</str> -->
    <!-- <str name="fmap.div">ignored_</str> -->
  </lst>
</requestHandler>

I've commented out all subnodes except fmap.content because I don't use the rest of them. ...BUT... :) When I try:

curl "http://myserver:8080/solr/update/extract/?literal.id=1000&commit=true" -F myfile=@myfile_.pdf

I get: Status HTTP 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
...
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler'
...

I've moved contrib/extraction/lib/* to my lib/*. Restarted the server, and I can see in the log that apache-solr-cell-3.3.0.jar was added to the classloader. But I get the same result :( ... lazy loading error, error loading class.

What am I forgetting? What am I missing?

Thanks
---
Rode González
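For reference, the same extract request can be sent from SolrJ instead of curl. A minimal sketch, using the file name and literal.id from Rode's message; the server URL and class name are illustrative:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://myserver:8080/solr");
        // Equivalent of: curl ".../update/extract/?literal.id=1000&commit=true" -F myfile=@myfile_.pdf
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("myfile_.pdf"));
        req.setParam("literal.id", "1000");
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}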
Re: Possible bug in FastVectorHighlighter
Worked fine. Thanks a lot!

Massimo

On 09/08/2011 11:58, Jayendra Patil wrote:

Try using -

<str name="hl.tag.pre"><![CDATA[<b>]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>

Regards,
Jayendra

On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon mschia...@volunia.com wrote:

In my Solr (3.3) configuration I specified these two params:

<str name="hl.simple.pre"><![CDATA[<b>]]></str>
<str name="hl.simple.post"><![CDATA[</b>]]></str>

When I do a simple search I correctly obtain highlighted results where matches are enclosed in the correct tag. If I do the same request with hl.useFastVectorHighlighter=true in the HTTP query string (or specifying the same parameter in the config file), the matches are enclosed in the <em> tag (the default value). Has anyone encountered the same issue?
Re: document indexing
Could you please tell me the schema.xml fields tag content for such a case? Currently the index data is something like this:

PageID  BookID  Text
1       1       some text
2       1       some text
3       1       some text
4       1       some text
5       2       some text
6       2       some text
7       2       some text
8       2       some text

When I make a simple query for the word "some" on the Text field, I will have all 8 rows returned, but I want to list only 2 items (Books with IDs 1 and 2). I am also considering concatenating the Text columns and having the index like this:

BookID  PageTexts
1       some text some text some text
2       some text some text some text

I wonder which index structure is better.

lee carroll wrote:

It really does depend on what you want to do in your app, but from the info given I'd go for denormalizing by repeating the least number of values. So in your case that would be one document per book page: PageID+BookID (uniqueKey), pageID, PageVal1..PageValn, BookID, BookName.

[original question quoted in full above; snipped]
Date faceting per last hour, three days and last week
Hi, I'm trying date faceting for the last 24 hours, three days and last week, but I don't know how to do it. I have a DateField and I want to set different ranges; is it possible? I understand the example from the Solr wiki http://wiki.apache.org/solr/SimpleFacetParameters#Date_Faceting:_per_day_for_the_past_5_days but I want to do more gaps with the same field_date. How do I do this? Thanks, Joan
paging size in SOLR
hi,
i want to retrieve all the data from solr (say 10,000 ids) and my page size is 1000. how do i get back the data (pages) one after the other? do i have to increment the start value each time by the page size, starting from 0, and iterate? in that case am i querying the index 10 times instead of once, or after the first query will the results be cached somewhere for the subsequent pages?

JAME VAALET
How come this query string starts with wildcard?
While going through my Solr error logs, I found that a user had fired a query - jawapan ujian bulanan thn 4 (bahasa melayu). This was converted to the following for autosuggest purposes - jawapan?ujian?bulanan?thn?4?(bahasa?melayu)* - by the javascript code. Solr threw the exception: Cannot parse 'jawapan?ujian?bulanan?thn?4?(bahasa?melayu)*': '*' or '?' not allowed as first character in WildcardQuery. How come this query string begins with a wildcard character? When I changed the query to remove the brackets, everything went smoothly. There were no results, probably because my search index didn't have any.

*Pranav Prakash*
temet nosce
Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
Re: Date faceting per last hour, three days and last week
I would use facet queries:

facet.query=date:[NOW-1DAY TO NOW]
facet.query=date:[NOW-3DAY TO NOW]
facet.query=date:[NOW-7DAY TO NOW]
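The same three windows expressed through SolrJ, as a minimal sketch. The field name date and the ranges come from the reply above; the server URL and class name are illustrative:

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DateRangeFacets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        // One facet.query per window; counts come back keyed by the query string.
        query.addFacetQuery("date:[NOW-1DAY TO NOW]");
        query.addFacetQuery("date:[NOW-3DAY TO NOW]");
        query.addFacetQuery("date:[NOW-7DAY TO NOW]");
        Map<String, Integer> counts = server.query(query).getFacetQuery();
        System.out.println(counts);
    }
}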
RE: How come this query string starts with wildcard?
I think this is because ')' is treated as a token delimiter. So '(foo)bar' is treated the same as '(foo) bar' (that is, 'bar' is treated as a separate word). So '(foo)*' is really parsed as '(foo) *', and thus the '*' is treated as the start of a new word. -Michael
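If the goal is to stop user-typed punctuation like '(' from reaching the query parser at all, SolrJ ships an escaping helper. A small sketch; the input string is taken from Pranav's log, and escaping before appending the trailing wildcard is an assumption about where it would fit in the autosuggest code:

import org.apache.solr.client.solrj.util.ClientUtils;

public class SuggestEscape {
    public static void main(String[] args) {
        String userInput = "jawapan ujian bulanan thn 4 (bahasa melayu)";
        // Backslash-escapes Lucene query syntax characters such as ( ) * ? :
        String escaped = ClientUtils.escapeQueryChars(userInput);
        System.out.println(escaped + "*");
    }
}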
[Help Wanted] Graphics and other help for new Lucene/Solr website
Hi,

We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance website. You can see a preview at http://lucene.staging.apache.org/lucene/. It is more or less a look and feel copy of the Mahout and Open For Biz websites. This new site, IMO, both looks better than the old one and will be a lot easier for us committers to maintain/update and for others to contribute to.

So, how can you help?

0. All of the code is at https://svn.apache.org/repos/asf/lucene/cms/trunk. Check it out the usual way using SVN. If you want to build locally, see https://issues.apache.org/jira/browse/LUCENE-2748 and the links to the ASF CMS guide.

1. If you have any graphic design skills:
- I'd love to have some mantle/slide images along the lines of http://lucene.staging.apache.org/lucene/images/mantle-lucene-solr.png. These are used in the slideshow at the top of the Lucene, Core and Solr pages and should be interesting, inviting, etc. and should give people warm fuzzy feelings about all of our software and the great community we have. (Think Marketing!)
- Help us coordinate the color selection on the various pages, especially in the slides and especially on the Solr page, as I'm not sure I like the green and black background contrasted with the orange of the Solr logo.

2. In a few more days or maybe a week or so, patches to fix content errors, etc. will be welcome. For now, we are still porting things, so I don't want to duplicate effort.

3. New, useful documentation is also, of course, always welcome.

4. Test with your favorite browser. In particular, I don't have IE handy. I've checked the site in Chrome, Firefox and Safari.

If you come up w/ images (I won't guarantee they will be accepted, but I am appreciative of the help) or other style fixes, etc., please submit all content/patches to https://issues.apache.org/jira/browse/LUCENE-2748 and please make sure to check the donation box when attaching the file.

-Grant
Re: unique terms and multi-valued fields
Well, it depends (tm). If you're talking about *indexed* terms, then the value is stored only once in both of the cases you mention below. There's really very little difference between a non-multi-valued field and a multi-valued field in terms of how it's stored in the searchable portion of the index, except for some position information. So, having an XML doc with a single-valued field

<field name="category">computers laptops</field>

is almost identical (except for position info such as positionIncrementGap) to

<field name="category">computers</field>
<field name="category">laptops</field>

multiValued refers to the *input*, not whether more than one word is allowed in that field.

Now, about *stored* fields. If you store the data, verbatim copies are kept in the storage-specific files in each segment, and the values will be on disk for each document. But you probably don't care much, because this data is only referenced when you assemble a document for return to the client; it's irrelevant for searching.

Best
Erick

On Tue, Aug 9, 2011 at 8:02 PM, Kevin Osborn osbo...@yahoo.com wrote:

Please verify my understanding. I have a field called category and it has a value computers. If I use this same field and value for all of my documents, it is really only stored on disk once, because category:computers is a unique term. Is this correct? But what about multi-valued fields? I have a field called category. For 100 documents, it has the values computers and laptops. For 100 other documents, it has the values computers and tablets. Is this stored as category:computers, category:laptops, category:tablets, meaning 3 unique terms? Or is it stored as category:computers,laptops and category:computers,tablets? I believe it is the first case (hopefully), but I am not sure. Thanks.
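In SolrJ terms, the input-side distinction Erick describes looks like this. A sketch; the field name and values are from the thread:

import org.apache.solr.common.SolrInputDocument;

public class MultiValuedExample {
    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        // Two addField calls on the same field name is multiValued input;
        // after analysis the indexed terms are essentially the same as for
        // the single value "computers laptops", apart from position gaps.
        doc.addField("category", "computers");
        doc.addField("category", "laptops");
        System.out.println(doc);
    }
}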
Re: document indexing
With the first option you can be page specific in your search results and searches. Field collapsing/grouping will help with your normalisation issue. (What you have listed is different from what I listed; you don't have a unique key.) Option 2 means you lose any ability to reference pages, but as you note, your documents are at the level you wish your search results to be returned. If you are not interested in pages, then option 2.

On 10 August 2011 12:22, directorscott dgul...@gmail.com wrote:

[message quoted in full above; snipped]
RE: Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?
Thanks for this quick and enlightening answer! I didn't consider that a Transformer can create new columns. In combination with dynamic fields it is exactly what I was looking for. Thanks James ^^

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Tuesday, 9 August 2011 16:03
To: solr-user@lucene.apache.org
Subject: RE: Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

Christian,

It looks like you should probably write a Transformer for your DIH script. I assume you have a child entity set up for PriceTable. Add a Transformer to this entity that will look at the value of currency and price, remove these from the row, then add them back in with the currency as the field name and the price as the field value.

By the way, it would likely be better if, instead of field names like EUR and CHF, you created a dynamic field entry in schema.xml like this:

<dynamicField name="CURRENCY_*" type="tfloat" indexed="true" stored="false"/>

Then have your DIH Transformer prepend CURRENCY_ in front of the field name. This way, should your company ever add a new currency, you wouldn't need to change your schema.

For more information on writing a DIH Transformer, see http://wiki.apache.org/solr/DIHCustomTransformer

If you would rather use a scripting language such as javascript instead of writing your Transformer in java, see http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
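A minimal Java Transformer along the lines James describes might look like the following. This is a sketch: the column names currency and price and the CURRENCY_ prefix are from the thread, while the class name is made up:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class CurrencyTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Turn the key/value pair (currency, price) into a dynamic field,
        // e.g. currency=EUR, price=9.99 becomes CURRENCY_EUR=9.99.
        Object currency = row.remove("currency");
        Object price = row.remove("price");
        if (currency != null && price != null) {
            row.put("CURRENCY_" + currency, price);
        }
        return row;
    }
}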
Re: Indexing tweet and searching @keyword OR #keyword
Please look more carefully at the documentation for WDFF, specifically: "split on intra-word delimiters (all non alpha-numeric characters)". WordDelimiterFilterFactory will always throw away non-alphanumeric characters; you can't tell it to do otherwise. Try some of the other tokenizers/analyzers to get what you want, and also look at the admin/analysis page to see the exact effects of your fieldType definitions. Here's a great place to start: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

You probably want something like WhitespaceTokenizerFactory followed by LowerCaseFilterFactory or some such... But I really question whether this is what you want either. Do you really want a search on ipad to *fail* to match input of #ipad? Or vice-versa? KeywordTokenizerFactory is probably not the place you want to start; the tokenization process doesn't break anything up. You happen to be getting separate tokens because of WDFF, which, as you see, can't process things the way you want.

Best
Erick

On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq shariqn...@gmail.com wrote:

[message and schema quoted in full above; snipped]
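To see the tokenization difference Erick describes, a quick Lucene 3.x sketch (illustrative only) that runs the same tweet through a whitespace analyzer and the standard analyzer:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TweetTokens {
    public static void main(String[] args) throws Exception {
        String tweet = "Loving my #ipad thanks @apple";
        // WhitespaceAnalyzer keeps "#ipad" and "@apple" intact;
        // StandardAnalyzer strips the # and @ during tokenization.
        for (Analyzer a : new Analyzer[] {
                new WhitespaceAnalyzer(Version.LUCENE_33),
                new StandardAnalyzer(Version.LUCENE_33) }) {
            TokenStream ts = a.tokenStream("text", new StringReader(tweet));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            System.out.print(a.getClass().getSimpleName() + ":");
            while (ts.incrementToken()) {
                System.out.print(" [" + term.toString() + "]");
            }
            System.out.println();
        }
    }
}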
Re: frange not working in query
Could you tell us what you're trying to achieve with the range query? It's not clear.

-Simon

On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney sawhney.a...@gmail.com wrote:

Hi All, I am trying to sort the results on a unix timestamp using this query:

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

When I run this query, it says 'no field name specified in query and no defaultSearchField defined in schema.xml'. As soon as I remove the frange query and run this, it starts working fine:

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

Any pointers?

Thanks,
Amit
Re: frange not working in query
I meant the frange query, of course.

On Wed, Aug 10, 2011 at 10:21 AM, simon mtnes...@gmail.com wrote:

Could you tell us what you're trying to achieve with the range query? It's not clear.

-Simon

[earlier message quoted in full above; snipped]
Re: Is optimize needed on slaves if it replicates from optimized master?
This is expected behavior. You might be optimizing your index on the master after every set of changes, in which case the entire index is copied. During this period, the space on disk will at least double, there's no way around that. If you do NOT optimize, then the slave will only copy changed segments instead of the entire index. Optimizing isn't usually necessary except periodically (daily, perhaps weekly, perhaps never actually). All that said, depending on how merging happens, you will always have the possibility of the entire index being copied sometimes because you'll happen to hit a merge that merges all segments into one. There are some advanced options that can control some parts of merging, but you need to get to the bottom of why the whole index is getting copied every time before you go there. I'd bet you're issuing an optimize. Best Erick On Wed, Aug 10, 2011 at 5:30 AM, Pranav Prakash pra...@gmail.com wrote: That is not true. Replication is roughly a copy of the diff between the master and the slave's index. In my case, during replication entire index is copied from master to slave, during which the size of index goes a little over double. Then it shrinks to its original size. Am I doing something wrong? How can I get the master to serve only delta index instead of serving whole index and the slaves merging the new and old index? *Pranav Prakash*
Re: paging size in SOLR
Well, if you really want to, you can specify start=0 and rows=10000 and get them all back at once. You can do page-by-page by incrementing the start parameter as you indicated. You can keep from re-executing the search by setting your queryResultCache appropriately, but this affects all searches so might be an issue.

Best
Erick

On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com wrote:

hi, i want to retrieve all the data from solr (say 10,000 ids) and my page size is 1000. how do i get back the data (pages) one after the other? do i have to increment the start value each time by the page size, starting from 0, and iterate? in that case am i querying the index 10 times instead of once, or after the first query will the results be cached somewhere for the subsequent pages?

JAME VAALET
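A minimal SolrJ sketch of the page-by-page loop. The 1000 page size is from the question; the server URL, class name and field name are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PageThroughResults {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        int pageSize = 1000;
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(pageSize);
        for (int start = 0; ; start += pageSize) {
            query.setStart(start);
            SolrDocumentList page = server.query(query).getResults();
            for (SolrDocument doc : page) {
                System.out.println(doc.getFieldValue("id"));
            }
            if (start + pageSize >= page.getNumFound()) break; // past the last page
        }
    }
}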
Re: paging size in SOLR
Worth remembering there are some performance penalties with deep paging if you use the page-by-page approach. May not be too much of a problem if you really are only looking to retrieve 10K docs.

-Simon

On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson erickerick...@gmail.com wrote:

Well, if you really want to, you can specify start=0 and rows=10000 and get them all back at once. You can do page-by-page by incrementing the start parameter as you indicated. You can keep from re-executing the search by setting your queryResultCache appropriately, but this affects all searches so might be an issue.

Best
Erick

[earlier message snipped]
RE: paging size in SOLR
I would imagine the performance penalties with deep paging will ALSO be there if you just ask for 10000 rows all at once, though, instead of in, say, 100-row paged batches. Yes? No?

-----Original Message-----
From: simon [mailto:mtnes...@gmail.com]
Sent: Wednesday, August 10, 2011 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: paging size in SOLR

Worth remembering there are some performance penalties with deep paging if you use the page-by-page approach. May not be too much of a problem if you really are only looking to retrieve 10K docs.

-Simon

[earlier messages snipped]
Building a facet query in SolrJ
Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the results I expect. I have a field, MyField, and I want to get facets for specific values of that field. That is, I want a FacetField if MyField is ABC, DEF, etc. (a specific list of values), but not if MyField is any other value. If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setFacet(true);
query.setFacetMinCount(1);
query.addFacetField(MYFIELD);
for (String fieldValue : desiredFieldValues) {
    query.addFacetQuery(MYFIELD + ":" + fieldValue);
}

queryResponse.getFacetFields returns facets for ALL values of MyField. I figured that was because setting the facet field with addFacetField caused Solr to examine all values. But if I take out that line, then getFacetFields returns an empty list. I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich
Re: paging size in SOLR
When you say queryResultCache, does it cache results for just the last query, or for more than one query?

On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:

Worth remembering there are some performance penalties with deep paging if you use the page-by-page approach. May not be too much of a problem if you really are only looking to retrieve 10K docs.

-Simon

[earlier messages snipped]

--
-JAME
Re: Solr 3.3 crashes after ~18 hours?
Okay, with this command it hangs. Also: I managed to get a thread dump (attached).

regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz a.s...@digiconcept.net wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good with more memory allocated to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)?

though throwing no errors, no 503's.. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr example server, what jetty configuration changes have you made?

-Yonik
http://www.lucidimagination.com

Full thread dump Java HotSpot(TM) Server VM (19.1-b02 mixed mode):

"DestroyJavaVM" prio=10 tid=0x6e32e800 nid=0x5aeb waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"Timer-2" daemon prio=10 tid=0x6e3ff800 nid=0x5b0b in Object.wait() [0x6e6e5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"pool-1-thread-1" prio=10 tid=0x6e32dc00 nid=0x5b0a waiting on condition [0x6dae]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xb02680e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Timer-1" daemon prio=10 tid=0x0874e000 nid=0x5b07 in Object.wait() [0x6eb6d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"8106640@qtp-25094328-9 - Acceptor0 SocketConnector@0.0.0.0:8985" prio=10 tid=0x0832dc00 nid=0x5b06 runnable [0x6ecc7000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        - locked <0xb0260288> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:99)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"9097070@qtp-25094328-8" prio=10 tid=0x0832c400 nid=0x5b05 in Object.wait() [0x6ed18000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
        - locked <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"4098499@qtp-25094328-7" prio=10 tid=0x0832ac00 nid=0x5b04 in Object.wait() [0x6ed69000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at
Error loading a custom request handler in Solr 4.0
Hi,

Apologies if this is really basic. I'm trying to learn how to create a custom request handler, so I wrote the minimal class (attached), compiled and jar'd it, and placed it in example/lib. I added this to solrconfig.xml:

<requestHandler name="/flaxtest" class="FlaxTestHandler"/>

When I started Solr with java -jar start.jar, I got this:

...
SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/handler/RequestHandlerBase
at java.lang.ClassLoader.defineClass1(Native Method)
...

So I copied all the dist/*.jar files into lib and tried again. This time it seemed to start ok, but browsing to http://localhost:8983/solr/ displayed this:

org.apache.solr.common.SolrException: Error Instantiating Request Handler, FlaxTestHandler is not a org.apache.solr.request.SolrRequestHandler
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
...

Any ideas?

thanks,
Tom
RE: Building a facet query in SolrJ
Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew there was something simple wrong.

From: Simon, Richard T
Sent: Wednesday, August 10, 2011 10:55 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: Building a facet query in SolrJ

[message quoted in full above; snipped]
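One more note for anyone who lands on this thread later: counts for queries registered with addFacetQuery are read back through getFacetQuery(), not getFacetFields(). A sketch, reusing the queryResponse from Rich's code:

// Facet query counts come back keyed by the literal query string.
java.util.Map<String, Integer> counts = queryResponse.getFacetQuery();
for (java.util.Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getKey() + " -> " + e.getValue());
}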
Re: Solr 3.3 crashes after ~18 hours?
On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz a.s...@digiconcept.net wrote:

Okay, with this command it hangs.

It doesn't look like a hang from this thread dump. It doesn't look like any Solr requests are executing at the time the dump was taken. Did you do this from the command line?

curl "http://localhost:8983/solr/update?commit=true"

Are you saying that the curl command just hung and never returned?

-Yonik
http://www.lucidimagination.com

[earlier messages quoted in full above; snipped]
Re: Cache replication
Consider putting a cache (memcached, redis, etc) *in front* of your solr slaves. Just make sure to update it when replication occurs. didier On Tue, Aug 9, 2011 at 6:07 PM, arian487 akarb...@tagged.com wrote: I'm wondering if the caches on all the slaves are replicated across (such as queryResultCache). That is to say, if I hit one of my slaves and cache a result, and I make a search later and that search happens to hit a different slave, will that first cached result be available for use? This is pretty important because I'm going to have a lot of slaves and if this isn't done, then I'd have a high chance of running a lot uncached queries. Thanks :) -- View this message in context: http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240708.html Sent from the Solr - User mailing list archive at Nabble.com.
Dates off by 1 day?
Hi all-

I apologize in advance if this turns out to be a problem between the keyboard and the chair, but I'm confused about why my date field is correct in the index but wrong in SolrJ. I have a field defined as a date in the index:

<field name="FILE_DATE" type="date" indexed="true" stored="true"/>

And if I use the admin site to query the data, I get the right date:

<date name="FILE_DATE">2002-05-13T00:00:00Z</date>

But in my SolrJ code:

Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
while (iter.hasNext()) {
    SolrDocument resultDoc = iter.next();
    System.out.println("-- " + resultDoc.getFieldValue("FILE_DATE"));
}

I get:

-- Sun May 12 19:00:00 CDT 2002

I've been searching around through the wiki and other places, but can't seem to find anything that either mentions this problem or talks about date handling in Solr/SolrJ that might refer to something like this.

Thanks for any info,
Ron
Re: Dates off by 1 day?
The date difference is coming from different time zones. In Solr the date is stored in the Zulu (UTC) time zone, and SolrJ is returning the date in the CDT timezone (the JVM picks up the system time zone).

<date name="FILE_DATE">2002-05-13T00:00:00Z</date>

I get:

-- Sun May 12 19:00:00 CDT 2002

You can convert the Date to different time zones using the java.util date classes if required. Hope it helps!

-param

On 8/10/11 11:20 AM, "Olson, Ron" rol...@lbpc.com wrote:

[message quoted in full above; snipped]
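If the printed value should match what the admin UI shows, format the returned Date in UTC explicitly. A minimal sketch, reusing the resultDoc from Ron's loop; the format string is an assumption:

// Format the java.util.Date from SolrJ in UTC rather than the JVM default zone.
java.text.SimpleDateFormat utc = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
utc.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));
java.util.Date fileDate = (java.util.Date) resultDoc.getFieldValue("FILE_DATE");
System.out.println("-- " + utc.format(fileDate)); // prints 2002-05-13T00:00:00Z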
RE: Dates off by 1 day?
Ah, great! I knew the problem was between the keyboard and the chair. Thanks!

-----Original Message-----
From: Sethi, Parampreet [mailto:parampreet.se...@teamaol.com]
Sent: Wednesday, August 10, 2011 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Dates off by 1 day?

[reply quoted in full above; snipped]
Re: Error loading a custom request handler in Solr 4.0
The attachment isn't showing up (in gmail, at least). Can you inline the relevant bits of code?

On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer t...@flax.co.uk wrote:

[message quoted in full above; snipped]
Re: Error loading a custom request handler in Solr 4.0
Sure -

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.handler.RequestHandlerBase;

public class FlaxTestHandler extends RequestHandlerBase {

    public FlaxTestHandler() { }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        rsp.add("FlaxTest", "Hello!");
    }

    public String getDescription() { return "Flax"; }
    public String getSourceId() { return "Flax"; }
    public String getSource() { return "Flax"; }
    public String getVersion() { return "Flax"; }
}

On 10 August 2011 16:43, simon mtnes...@gmail.com wrote:

The attachment isn't showing up (in gmail, at least). Can you inline the relevant bits of code?

[earlier message snipped]
Re: how to ignore case in solr search field?
You can use solr.LowerCaseFilterFactory in an analyser chain, for both indexing and queries. The schema.xml supplied with the example has several field types using this (including text_general).

Tom

On 10 August 2011 16:42, nagarjuna nagarjuna.avul...@gmail.com wrote:

Hi, please help me: how do I ignore case while searching in Solr? E.g. I need the same results for the keywords abc, ABC, aBc, AbC and all other casings. Thank you in advance.
Re: Is optimize needed on slaves if it replicates from optimized master?
Very well explained. Thanks. Yes, we do optimize the index before replication. I am not particularly worried about disk space usage; I was more curious about that behavior. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny On Wed, Aug 10, 2011 at 19:55, Erick Erickson erickerick...@gmail.com wrote: This is expected behavior. You might be optimizing your index on the master after every set of changes, in which case the entire index is copied. During this period, the space on disk will at least double; there's no way around that. If you do NOT optimize, then the slave will only copy changed segments instead of the entire index. Optimizing isn't usually necessary except periodically (daily, perhaps weekly, perhaps never actually). All that said, depending on how merging happens, you will always have the possibility of the entire index being copied sometimes, because you'll happen to hit a merge that merges all segments into one. There are some advanced options that can control some parts of merging, but you need to get to the bottom of why the whole index is getting copied every time before you go there. I'd bet you're issuing an optimize. Best Erick On Wed, Aug 10, 2011 at 5:30 AM, Pranav Prakash pra...@gmail.com wrote: That is not true. Replication is roughly a copy of the diff between the master and the slave's index. In my case, during replication the entire index is copied from master to slave, during which the size of the index goes a little over double. Then it shrinks to its original size. Am I doing something wrong? How can I get the master to serve only a delta of the index instead of the whole index, with the slaves merging the new and old index? *Pranav Prakash*
RE: [Help Wanted] Graphics and other help for new Lucene/Solr website
The site looks great. And thank you for including the ManifoldCF link. ;-) Karl -Original Message- From: ext Grant Ingersoll [mailto:gsing...@apache.org] Sent: Wednesday, August 10, 2011 10:09 AM To: solr-user@lucene.apache.org; java-u...@lucene.apache.org Subject: [Help Wanted] Graphics and other help for new Lucene/Solr website Hi, We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance website. You can see a preview at http://lucene.staging.apache.org/lucene/. It is more or less a look and feel copy of Mahout and Open For Biz websites. This new site, IMO, both looks better than the old one and will be a lot easier for us committers to maintain/update and for others to contribute to. So, how can you help? 0. All of the code is at https://svn.apache.org/repos/asf/lucene/cms/trunk. Check it out the usual way using SVN. If you want to build locally, see https://issues.apache.org/jira/browse/LUCENE-2748 and the links to the ASF CMS guide. 1. If you have any graphic design skills: - I'd love to have some mantle/slide images along the lines of http://lucene.staging.apache.org/lucene/images/mantle-lucene-solr.png. These are used in the slideshow at the top of the Lucene, Core and Solr pages and should be interesting, inviting, etc. and should give people warm fuzzy feelings about all of our software and the great community we have. (Think Marketing!) - Help us coordinate the color selection on the various pages, especially in the slides and especially on the Solr page, as I'm not sure I like the green and black background contrasted with the orange of the Solr logo. 2. In a few more days or maybe a week or so, patches to fix content errors, etc. will be welcome. For now, we are still porting things, so I don't want to duplicate effort. 3. New, useful documentation is also, of course, always welcome. 4. Test with your favorite browser. In particular, I don't have IE handy. I've checked the site in Chrome, Firefox and Safari. If you come up w/ images (I won't guarantee they will be accepted, but I am appreciative of the help) or other style fixes, etc., please submit all content/patches to https://issues.apache.org/jira/browse/LUCENE-2748 and please make sure to check the donation box when attaching the file. -Grant
Re: [Help Wanted] Graphics and other help for new Lucene/Solr website
Looks nice! Font seems too light to read with comfort though.
Re: Error loading a custom request handler in Solr 4.0
It's working for me. Compiled, inserted in solr/lib, added the config line to solrconfig. When I send a /flaxtest request I get this response: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">16</int> </lst> <str name="FlaxTest">Hello!</str> </response> I was doing this within a core defined in solr.xml -Simon On Wed, Aug 10, 2011 at 11:46 AM, Tom Mortimer t...@flax.co.uk wrote: Sure - import org.apache.solr.request.SolrQueryRequest; import org.apache.solr.response.SolrQueryResponse; import org.apache.solr.handler.RequestHandlerBase; public class FlaxTestHandler extends RequestHandlerBase { public FlaxTestHandler() { } public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception { rsp.add("FlaxTest", "Hello!"); } public String getDescription() { return "Flax"; } public String getSourceId() { return "Flax"; } public String getSource() { return "Flax"; } public String getVersion() { return "Flax"; } }
query time problem
Hi, I've noticed poor performance for my solr queries in the past few days. Queries of this type: http://server:5000/solr/select?q=story_search_field_en:(water boston) OR story_search_field_fr:(water boston)&rows=350&start=0&sort=r_modify_date desc&shards=shard1:5001/solr,shard2:5002/solr&fq=type:(cch_story OR cch_published_story) are slow (more than 10 seconds). I would like to know how I could investigate the problem. I tried to specify the parameters debugQuery=on&explainOther=on but this doesn't help much. I also monitored the shard logs; sometimes there are broken pipe errors in them. Also, is there a way I could monitor the cache statistics? For your information, every shard's master and slave machines have enough RAM and disk space. Charles-André Martin
Re: Error loading a custom request handler in Solr 4.0
Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow! What classpath did you use for compiling? And did you copy anything other than the new jar into lib/? thanks, Tom On 10 August 2011 18:07, simon mtnes...@gmail.com wrote: It's working for me. Compiled, inserted in solr/lib, added the config line to solrconfig. When I send a /flaxtest request I get this response: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">16</int> </lst> <str name="FlaxTest">Hello!</str> </response> I was doing this within a core defined in solr.xml -Simon
RE: Building a facet query in SolrJ
I take it back. I didn't find it. I corrected my values and the facet queries still don't find what I want. The values I'm looking for are URIs, so they look like: http://place.org/abc/def I add the facet query like so: query.addFacetQuery(MyField + ":" + "\"" + uri + "\""); I print the query, just to see what it is: Facet Query: MyField:"http://place.org/abc/def" But when I examine queryResponse.getFacetFields, it's an empty list if I do not set the facet field. If I set the facet field to MyField, then I get facets for ALL the values of MyField, not just the ones in the facet queries. Can anyone help here? Thanks. From: Simon, Richard T Sent: Wednesday, August 10, 2011 11:07 AM To: Simon, Richard T; solr-user@lucene.apache.org Subject: RE: Building a facet query in SolrJ Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew there was something simple wrong. From: Simon, Richard T Sent: Wednesday, August 10, 2011 10:55 AM To: solr-user@lucene.apache.org Cc: Simon, Richard T Subject: Building a facet query in SolrJ Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the results I expect. I have a field, MyField, and I want to get facets for specific values of that field. That is, I want a FacetField if MyField is ABC, DEF, etc. (a specific list of values), but not if MyField is any other value. If I build my query like this: SolrQuery query = new SolrQuery( luceneQueryStr ); query.setStart( request.getStartIndex() ); query.setRows( request.getMaxResults() ); query.setFacet(true); query.setFacetMinCount(1); query.addFacetField(MYFIELD); for (String fieldValue : desiredFieldValues) { query.addFacetQuery(MYFIELD + ":" + fieldValue); } then queryResponse.getFacetFields returns facets for ALL values of MyField. I figured that was because setting the facet field with addFacetField caused Solr to examine all values. But if I take out that line, then getFacetFields returns an empty list. I'm sure I'm doing something simple wrong, but I'm out of ideas right now. -Rich
Re: Error loading a custom request handler in Solr 4.0
This is in trunk (up to date). The compiler is 1.6.0_26. The classpath was dist/apache-solr-solrj-4.0-SNAPSHOT.jar:dist/apache-solr-core-4.0-SNAPSHOT.jar, built from trunk just prior by 'ant dist'. I'd try again with a clean trunk. -Simon On Wed, Aug 10, 2011 at 1:20 PM, Tom Mortimer t...@flax.co.uk wrote: Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow! What classpath did you use for compiling? And did you copy anything other than the new jar into lib/? thanks, Tom
Re: query time problem
Off the top of my head ... Can you tell if GC is happening more frequently than usual/expected? Is the index optimized - if not, how many segments? It's possible that one of the shards is behind a flaky network connection. Is the 10s performance just for the Solr query, or wallclock time at the browser? You can monitor cache statistics from the admin console 'statistics' page. Are you seeing anything untoward in the solr logs? -Simon On Wed, Aug 10, 2011 at 1:11 PM, Charles-Andre Martin charles-andre.mar...@sunmedia.ca wrote: Hi, I've noticed poor performance for my solr queries in the past few days. Queries of this type: http://server:5000/solr/select?q=story_search_field_en:(water boston) OR story_search_field_fr:(water boston)&rows=350&start=0&sort=r_modify_date desc&shards=shard1:5001/solr,shard2:5002/solr&fq=type:(cch_story OR cch_published_story) are slow (more than 10 seconds). I would like to know how I could investigate the problem. I tried to specify the parameters debugQuery=on&explainOther=on but this doesn't help much. I also monitored the shard logs; sometimes there are broken pipe errors in them. Also, is there a way I could monitor the cache statistics? For your information, every shard's master and slave machines have enough RAM and disk space. Charles-André Martin
How to start troubleshooting a content extraction issue
Hello So, I'm a newbie to Solr and Tika and whatnot, so please use simple words for me :P I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the search engine for a Drupal web site. Up until recently, everything has been fine - searching works, faceting works, etc. Recently a user uploaded a 5mb xltm file, which seems to be causing Tomcat to spike in CPU usage, and eventually error out. When the documents are submitted to be indexed, the tomcat process spikes up to use 100% of 1 available CPU, with the eventual error in Drupal of Exception occurred sending *sites/default/files/nodefiles/533/June 30, 2011.xltm* to Solr 0 Status: Communication Error. I am looking for some help in figuring out where to troubleshoot this. I assume it's this file, but I'd like to be sure - so how can I submit this file for content extraction manually to see what happens? Thanks, Tim
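One way to test the file in isolation, assuming the ExtractingRequestHandler is mapped at /update/extract as in the example solrconfig.xml (the host, port, and local file name are illustrative; copy the file to a simple local name first): with extractOnly=true, Solr runs Tika over the upload and returns the extracted content in the response without indexing anything, so you can watch what happens to just this one document.

curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@problem-file.xltm"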
Re: Error loading a custom request handler in Solr 4.0
Thanks Simon. I'll try again tomorrow. Tom On 10 August 2011 18:46, simon mtnes...@gmail.com wrote: This is in trunk (up to date). The compiler is 1.6.0_26. The classpath was dist/apache-solr-solrj-4.0-SNAPSHOT.jar:dist/apache-solr-core-4.0-SNAPSHOT.jar, built from trunk just prior by 'ant dist'. I'd try again with a clean trunk. -Simon
Re: Building a facet query in SolrJ
Try making your queries manually, to see this closer in action... q=MyField:uri and see what you get. In this case, because your URI contains characters that make the default query parser unhappy, do this sort of query instead: {!term f=MyField}uri That way the query is parsed properly into a single term query. I am a little confused below, since you're faceting on MyField entirely (addFacetField), where you'd get the values of each URI facet query in that list anyway. Erik On Aug 10, 2011, at 13:42 , Simon, Richard T wrote: I take it back. I didn't find it. I corrected my values and the facet queries still don't find what I want. The values I'm looking for are URIs, so they look like: http://place.org/abc/def I add the facet query like so: query.addFacetQuery(MyField + ":" + "\"" + uri + "\""); I print the query, just to see what it is: Facet Query: MyField:"http://place.org/abc/def" But when I examine queryResponse.getFacetFields, it's an empty list if I do not set the facet field. If I set the facet field to MyField, then I get facets for ALL the values of MyField, not just the ones in the facet queries. Can anyone help here? Thanks.
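In SolrJ terms, Erik's suggestion looks something like this (a sketch using the field name and value list from the thread):

// The {!term} parser treats the whole value as one raw term, so the ':' and
// '/' characters inside the URI need no escaping for the query parser.
for (String uri : desiredFieldValues) {
    query.addFacetQuery("{!term f=MyField}" + uri);
}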
RE: Building a facet query in SolrJ
Hi -- I do get facets for all the values of MyField when I specify the facet field, but that's not what I want. I just want facets for a subset of the values of MyField. That's why I'm trying to use the facet queries, to just get facets for those values. -Rich -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Wednesday, August 10, 2011 2:04 PM To: solr-user@lucene.apache.org Subject: Re: Building a facet query in SolrJ Try making your queries manually, to see this closer in action... q=MyField:uri and see what you get. In this case, because your URI contains characters that make the default query parser unhappy, do this sort of query instead: {!term f=MyField}uri That way the query is parsed properly into a single term query. I am a little confused below, since you're faceting on MyField entirely (addFacetField), where you'd get the values of each URI facet query in that list anyway. Erik
RE: query time problem
Thanks Simon for these tracks. Here's my answers : Can you tell if GC is happening more frequently than usual/expected ? GC is OK. Is the index optimized - if not, how many segments ? According to the statistics page from the admin : One shard (master/slave) has 10 segments The other shard (master/slave) has 13 segments Is this ok ? The optimize job is running each day during the night. It's possible that one of the shards is behind a flaky network connection. Will check ... Is the 10s performance just for the Solr query or wallclock time at the browser ? Both You can monitor cache statistics from the admin console 'statistics' page Thanks Are you seeing anything untoward in the solr logs ? I see stacktrace : Aug 10, 2011 1:49:13 PM org.apache.solr.common.SolrException log SEVERE: ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:325) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:381) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:370) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:183) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89) at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:740) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:764) at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:127) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:573) at org.apache.coyote.Response.doWrite(Response.java:560) at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353) ... 21 more Charles-André Martin 800 Square Victoria Montréal (Québec) H4Z 0A3 Tél : (514) 504-2703 -Message d'origine- De : simon [mailto:mtnes...@gmail.com] Envoyé : August-10-11 1:52 PM À : solr-user@lucene.apache.org Objet : Re: query time problem Off the top of my head ... Can you tell if GC is happening more frequently than usual/expected ? Is the index optimized - if not, how many segments ? It's possible that one of the shards is behind a flaky network connection. Is the 10s performance just for the Solr query or wallclock time at the browser ? You can monitor cache statistics from the admin console 'statistics' page Are you seeing anything untoward in the solr logs ? -Simon On Wed, Aug 10, 2011 at 1:11 PM, Charles-Andre Martin charles-andre.mar...@sunmedia.ca wrote: Hi, I've noticed poor performance for my solr queries in the past few days. Queries of that type : http://server:5000/solr/select?q=story_search_field_en:(water boston) OR story_search_field_fr:(water boston)rows=350start=0sort=r_modify_date
Can't mix Synonyms with Shingles?
I would like to combine the ShingleFilterFactory with a SynonymFilterFactory in a field type. I've looked at something like this using the analysis.jsp tool: <fieldType name="TestTerm" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" stemEnglishPosessive="1"/> <filter class="solr.ShingleFilterFactory" tokenSeparator=""/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.BusinessNames.txt" ignoreCase="true" expand="true"/> ... </analyzer> <analyzer type="query"> ... </analyzer> </fieldType> However, when a ShingleFilterFactory is applied first, the SynonymFilterFactory appears to do nothing. I haven't found any documentation or other warnings against this combination, and I don't want to apply shingles after synonyms (this works) because multi-word synonyms then cause severe term expansion. I don't really mind if the synonyms fail to match shingles, (although I'd prefer they succeed) but I'd at least expect that synonyms would continue to match on the original tokens, as they do if I remove the ShingleFilterFactory. I'm using Solr 3.3, any clarification would be appreciated. Thanks, -Jeff Wartes
Re: Error loading a custom request handler in Solr 4.0
: custom request handler, so I wrote the minimal class (attached), compiled : and jar'd it, and placed it in example/lib. I added this to solrconfig.xml: that's the crux of the issue. example/lib is where the jetty libraries live -- not solr plugins. you should either put your custom jars in the lib dir of your solr home (ie: example/solr/lib) or put them in a directory of your choice that you refer to from your solrconfig.xml file using a <lib/> directive. : So I copied all the dist/*.jar files into lib and tried again. This time it ouch ... make sure you remove *all* of those, or you will have no end of random obscure classpath issues at random times, as jars are sometimes loaded from the war and sometimes loaded from that directory. -Hoss
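For reference, a sketch of the <lib/> directive form Hoss mentions (the directory path is illustrative):

<!-- in solrconfig.xml: adds every jar in the named directory to Solr's plugin classpath -->
<lib dir="/path/to/my/plugins" regex=".*\.jar" />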
RE: Can't mix Synonyms with Shingles?
Hi Jeff, You have configured ShingleFilterFactory with a token separator of "" (the empty string), so e.g. International Corporation will output the shingle InternationalCorporation. If this is the form you want to use for synonym matching, it must exist in your synonym file. Does it? Steve -Original Message- From: Jeff Wartes [mailto:jwar...@whitepages.com] Sent: Wednesday, August 10, 2011 3:43 PM To: solr-user@lucene.apache.org Subject: Can't mix Synonyms with Shingles? I would like to combine the ShingleFilterFactory with a SynonymFilterFactory in a field type. I've looked at something like this using the analysis.jsp tool: <fieldType name="TestTerm" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" stemEnglishPosessive="1"/> <filter class="solr.ShingleFilterFactory" tokenSeparator=""/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.BusinessNames.txt" ignoreCase="true" expand="true"/> ... </analyzer> <analyzer type="query"> ... </analyzer> </fieldType> However, when a ShingleFilterFactory is applied first, the SynonymFilterFactory appears to do nothing. I haven't found any documentation or other warnings against this combination, and I don't want to apply shingles after synonyms (this works) because multi-word synonyms then cause severe term expansion. I don't really mind if the synonyms fail to match shingles, (although I'd prefer they succeed) but I'd at least expect that synonyms would continue to match on the original tokens, as they do if I remove the ShingleFilterFactory. I'm using Solr 3.3, any clarification would be appreciated. Thanks, -Jeff Wartes
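For instance, an illustrative synonyms.BusinessNames.txt entry in that shingled form (not from the thread):

# must match the single token the shingle filter produces, not the two-word phrase
IntlCorp, InternationalCorporation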
RE: Building a facet query in SolrJ
: query.addFacetQuery(MyField + ":" + "\"" + uri + "\""); ... : But when I examine queryResponse.getFacetFields, it's an empty list, if facet.query constraints+counts do not come back in the facet.field section of the response. they come back in the facet.query section of the response (look at the XML in your browser and you'll see what i mean)... https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html#getFacetQuery%28%29 -Hoss
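In SolrJ that section is read with getFacetQuery(), e.g. (a sketch against the queryResponse from the thread):

import java.util.Map;

// facet.query results come back as a map from each facet query string to its
// count; this is a separate section from getFacetFields().
Map<String, Integer> counts = queryResponse.getFacetQuery();
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getKey() + " => " + e.getValue());
}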
Re: Example Solr Config on EC2
If I were to build a master with multiple slaves, is it possible to promote a slave to be the new master if the original master fails? Will all the slaves pick up right where they left off, or any time the master fails will we need to completely regenerate all the data? If this is possible, are there any examples of this being automated? Especially on Win2k3. Matthew Shields Owner BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation, Managed Services www.beantownhost.com www.sysadminvalley.com www.jeeprally.com On Mon, Aug 8, 2011 at 5:34 PM, mboh...@yahoo.com wrote: Matthew, Here's another resource: http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/ Michael Bohlig Lucid Imagination - Original Message From: Matt Shields m...@mattshields.org To: solr-user@lucene.apache.org Sent: Mon, August 8, 2011 2:03:20 PM Subject: Example Solr Config on EC2 I'm looking for some examples of how to setup Solr on EC2. The configuration I'm looking for would have multiple nodes for redundancy. I've tested in-house with a single master and slave with replication running in Tomcat on Windows Server 2003, but even if I have multiple slaves the single master is a single point of failure. Any suggestions or example configurations? The project I'm working on is a .NET setup, so ideally I'd like to keep this search cluster on Windows Server, even though I prefer Linux. Matthew Shields Owner BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation, Managed Services www.beantownhost.com www.sysadminvalley.com www.jeeprally.com
Problem with xinclude in solrconfig.xml
Hi, Guys, Based on the document below, I should be able to include a file under the same directory by specifying a relative path via XInclude in solrconfig.xml: http://wiki.apache.org/solr/SolrConfigXml However I am getting the following error when I use a relative path (an absolute path works fine though): SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file Any ideas? Thanks, YH
Re: Problem with xinclude in solrconfig.xml
Sorry for the spam. I just figured it out. Thanks. On Wed, Aug 10, 2011 at 2:17 PM, Way Cool way1.wayc...@gmail.com wrote: Hi, Guys, Based on the document below, I should be able to include a file under the same directory by specifying relative path via xinclude in solrconfig.xml: http://wiki.apache.org/solr/SolrConfigXml However I am getting the following error when I use relative path (absolute path works fine though): SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file Any ideas? Thanks, YH
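The thread never says what the fix was; for reference, the XInclude syntax from the wiki page above looks like the following (handlers.xml is a hypothetical file name). When a relative href fails, it is worth checking what base URI the parser resolves it against, since relative XInclude hrefs are resolved relative to the including document's location.

<xi:include href="handlers.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>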
RE: Can't mix Synonyms with Shingles?
Hi Steven, The token separator was certainly a deliberate choice; are you saying that after applying shingles, synonyms can only match shingled terms? The term analysis suggests the original tokens still exist. You've made me realize that only certain synonyms seem to have problems, though, so it's not a blanket failure. Take this synonym definition: wamu, washington mutual bank, washington mutual Indexing wamu looks like it'll work fine - there are no shingles, and all three synonym expansions appear to get indexed. (expand=true) However, indexing washington mutual applies the shingles correctly (adds washingtonmutual to position 1), but the synonym expansion does not happen. I would still expect the synonym definition to match the original terms and index 'wamu' along with the other stuff. Thanks. -Original Message- From: Steven A Rowe [mailto:sar...@syr.edu] Sent: Wednesday, August 10, 2011 12:54 PM To: solr-user@lucene.apache.org Subject: RE: Can't mix Synonyms with Shingles? Hi Jeff, You have configured ShingleFilterFactory with a token separator of "" (the empty string), so e.g. International Corporation will output the shingle InternationalCorporation. If this is the form you want to use for synonym matching, it must exist in your synonym file. Does it? Steve
Solr 3.3: DIH configuration for Oracle
Hello, all! I want to create a good DIH configuration for my Oracle database with delta support. Unfortunately I am not able to do it, as DIH has a strange restriction. I want to explain the problem with a simple example; in reality my database has a much more complex structure. Initial conditions: two tables with the following simple structure: Table1 - ID_RECORD (primary key) - DATA_FIELD1 - .. - DATA_FIELD2 - LAST_CHANGE_TIME Table2 - ID_RECORD (primary key) - PARENT_ID_RECORD (foreign key to Table1.ID_RECORD) - DATA_FIELD1 - .. - DATA_FIELD2 - LAST_CHANGE_TIME For performance reasons it is necessary to select from both tables with a single query (via an inner join). My db-data-config.xml file: <?xml version="1.0" encoding="UTF-8"?> <dataConfig> <dataSource jndiName="jdbc/DB1" type="JdbcDataSource" user="" password=""/> <document> <entity name="ent" pk="T1_ID_RECORD, T2_ID_RECORD" query="select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD" deltaQuery="select t1.ID_RECORD T1_ID_RECORD, t1.ID_RECORD T2_ID_RECORD from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD where TABLE1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS') or TABLE2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')" deltaImportQuery="select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD where t1.ID_RECORD = ${dataimporter.delta.T1_ID_RECORD} and t2.ID_RECORD = ${dataimporter.delta.T2_ID_RECORD}" /> </document> </dataConfig> As a result I get the following error: java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='T1_ID_RECORD, T2_ID_RECORD' I have analyzed the source code of DIH and found that the DocBuilder class's collectDelta() method treats the value of the entity's pk attribute as a simple string, but in my case it is an array of two values: T1_ID_RECORD, T2_ID_RECORD What am I doing wrong? Thanks, Eugeny
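Since DIH treats pk as a single column name, one common workaround is to expose one composite key column and use it everywhere. A sketch only, untested here: COMPOSITE_ID is an illustrative alias (the Solr uniqueKey field would have to map to it), and in practice you would list the data columns explicitly instead of t1.*, t2.* to avoid duplicate column names from the join.

<entity name="ent" pk="COMPOSITE_ID"
  query="select t1.ID_RECORD || '_' || t2.ID_RECORD COMPOSITE_ID, t1.*, t2.* from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD"
  deltaQuery="select t1.ID_RECORD || '_' || t2.ID_RECORD COMPOSITE_ID from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD where t1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS') or t2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
  deltaImportQuery="select t1.ID_RECORD || '_' || t2.ID_RECORD COMPOSITE_ID, t1.*, t2.* from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD where t1.ID_RECORD || '_' || t2.ID_RECORD = '${dataimporter.delta.COMPOSITE_ID}'" />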
Increasing the highlight snippet size
Hi, I have been trying to increase the size of the highlight snippets using the hl.fragSize parameter, without much success. It seems that hl.fragSize is not making any difference at all in terms of snippet size. For example, compare the following two sets of query/results: http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2 </span><span id="w20422" class="werd">to</span><span id="w20423" class="werd"><em>write</em></span><span id="w20424" class="werd">a http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2 </span><span id="w20422" class="werd">to</span><span id="w20423" class="werd"><em>write</em></span><span id="w20424" class="werd">a Because of our particular needs, the content has been spanified, each word with its own span id. I do apply HTMLStrip during index time. What I would like to do is to increase the size of the snippet so that the highlighted snippets contain more surrounding words. Although hl.fragSize went from 10 to 1000, the result is the same. This leads me to believe that hl.fragSize might not be the correct parameter to achieve the effect I am looking for. If so, what parameter should I use? Thanks!
Re: Example Solr Config on EC2
Yes, you can promote a slave to be the master; refer to http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node In AWS one can use an elastic IP (http://aws.amazon.com/articles/1346) to refer to the master, and this IP can be assigned to a slave as it assumes the role of master (in case of failure). All slaves will then refer to this new master and there will be no need to regenerate data. Automation of this may be possible through CloudWatch alarm-actions. I don't know of any available example automation scripts. Cheers Akshay. On Wed, Aug 10, 2011 at 9:08 PM, Matt Shields m...@mattshields.org wrote: If I were to build a master with multiple slaves, is it possible to promote a slave to be the new master if the original master fails? Will all the slaves pick up right where they left off, or any time the master fails will we need to completely regenerate all the data? If this is possible, are there any examples of this being automated? Especially on Win2k3. Matthew Shields Owner BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation, Managed Services www.beantownhost.com www.sysadminvalley.com www.jeeprally.com
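For reference, a sketch of the enable/disable configuration described on that wiki page (the masterUrl is illustrative): every node carries both sections in solrconfig.xml, and the role is chosen at startup with -Denable.master=true or -Denable.slave=true, so promotion is a property flip plus repointing slaves (or moving the elastic IP) rather than a config rewrite.

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-elastic-ip:8983/solr/replication</str>
  </lst>
</requestHandler>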
Re: Increasing the highlight snippet size
an hl.fragsize of 1000 is problematical, as Solr parses that parameter as a 32 bit int... that's several bits more. -Simon On Wed, Aug 10, 2011 at 4:59 PM, Sang Yum sang...@gmail.com wrote: Hi, I have been trying to increase the size of the highlight snippets using the hl.fragSize parameter, without much success. It seems that hl.fragSize is not making any difference at all in terms of snippet size. For example, compare the following two sets of query/results: http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2 and http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2 Because of our particular needs, the content has been spanified, each word with its own span id. I do apply HTMLStrip during index time. What I would like to do is to increase the size of the snippet so that the highlighted snippets contain more surrounding words. Although hl.fragSize went from 10 to 1000, the result is the same. This leads me to believe that hl.fragSize might not be the correct parameter to achieve the effect I am looking for. If so, what parameter should I use? Thanks!
Re: Increasing the highlight snippet size
I was just trying to set it to a ridiculously large number to make it work. What I am seeing is that hl.fragsize doesn't seem to make any difference in terms of highlight snippet size... I just tried the query with hl.fragsize set to 1000. Same result as 10. On Wed, Aug 10, 2011 at 2:20 PM, simon mtnes...@gmail.com wrote: an hl.fragsize of 1000 is problematical, as Solr parses that parameter as a 32 bit int... that's several bits more. -Simon
Re: Cache replication
Thanks for the advice, Paul, but post-processing is a must for me given the nature of my application. I haven't had problems yet though.
RE: Can't mix Synonyms with Shingles?
After some further playing around, I think I understand what's going on. Because the SynonymFilterFactory pays attention to term position when it inserts a multi-word synonym, I had assumed it scanned for matches in a way that respected term position as well. (ie, for a two-word synonym, I assumed it would try to find the second word in position n+1 if it found the first word in position n) This does not appear to be the case. It appears to find multi-word synonym matches by simply walking the list of terms, exhausting all the terms in position one before looking at any terms in position two. The ShingleFilter adds terms to most positions, so that throws off the 'adjacency' of the flattened list of terms. Meaning, a two-word synonym can only match if the synonym consists of the original term (position 1) followed by the added shingle (also in position 1). Perhaps a better description: if you're looking at the analysis.jsp display, it does not scan for multi-word synonym tokens across then down; it scans down then across. It doesn't look like there's a way to do what I'm trying to do (index shingles AND multi-word synonyms in one field) without writing my own filter. -Original Message- From: Jeff Wartes [mailto:jwar...@whitepages.com] Sent: Wednesday, August 10, 2011 1:27 PM To: solr-user@lucene.apache.org Subject: RE: Can't mix Synonyms with Shingles? Hi Steven, The token separator was certainly a deliberate choice; are you saying that after applying shingles, synonyms can only match shingled terms? The term analysis suggests the original tokens still exist. You've made me realize that only certain synonyms seem to have problems, though, so it's not a blanket failure. Take this synonym definition: wamu, washington mutual bank, washington mutual Indexing wamu looks like it'll work fine - there are no shingles, and all three synonym expansions appear to get indexed. (expand=true) However, indexing washington mutual applies the shingles correctly (adds washingtonmutual to position 1), but the synonym expansion does not happen. I would still expect the synonym definition to match the original terms and index 'wamu' along with the other stuff. Thanks.
Re: Increasing the highlight snippet size
Well, only after I posted this question in a public forum, I found the cause of my problem. I was using hl.fragSize instead of hl.fragsize. After correcting the case, it worked as expected. Thanks. On Wed, Aug 10, 2011 at 3:19 PM, Sang Yum sang...@gmail.com wrote: I was just trying to set it to a ridiculously large number to make it work. What I am seeing is that hl.fragsize doesn't seem to make any difference in terms of highlight snippet size... I just tried the query with hl.fragsize set to 1000. Same result as 10.
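(The parameter name is case-sensitive, so the queries earlier in the thread wanted, for example, ...&hl=true&hl.fl=content&hl.snippets=100&hl.fragsize=1000&hl.maxAnalyzedChars=-1 rather than hl.fragSize=1000.)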
Re: Can't mix Synonyms with Shingles?
On Wed, Aug 10, 2011 at 7:10 PM, Jeff Wartes jwar...@whitepages.com wrote:

After some further playing around, I think I understand what's going on. Because the SynonymFilterFactory pays attention to term position when it inserts a multi-word synonym, I had assumed it scanned for matches in a way that respected term position as well (i.e., for a two-word synonym, I assumed it would try to find the second word in position n+1 if it found the first word in position n). This does not appear to be the case. It appears to find multi-word synonym matches by simply walking the list of terms, exhausting all the terms in position one before looking at any terms in position two.

This is correct, and I think it would cause some seriously bad performance otherwise: if you have a token stream like (A B C) (D E F) (G H I) ... and are matching multi-word synonyms, it can potentially explode, at least in terms of CPU time and all the state saving/restoring/copying and such. It would need to start treating the token stream as more of a token confusion network, and it gets worse if you consider position increments > 1.

At least recently in svn, the limitation is documented:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymFilter.java

--
lucidimagination.com
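Given that down-then-across scanning, one workaround along the lines of Steve's earlier suggestion is to put the shingled (separator-less) forms into the synonym file themselves, so that matching never has to look past a single position. A hand-built, untested sketch, assuming the empty token separator from Jeff's config:

# synonyms.BusinessNames.txt -- single-token shingled forms added by hand
# so the synonym match needs no positional scan (illustrative entries)
wamu, washingtonmutual, washingtonmutualbank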
Hudson build issues
Whenever I try to build this on our Hudson server, it says it can't find org.apache.lucene:lucene-xercesImpl:jar:4.0-SNAPSHOT. Is the Apache repo lacking this artifact?
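For reference, the coordinate the build is failing to resolve corresponds to a pom.xml dependency like the following (a sketch of what Hudson is looking for; whether the Apache snapshot repo actually publishes it is the open question):

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-xercesImpl</artifactId>
  <version>4.0-SNAPSHOT</version>
</dependency>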
LockObtainFailedException
Hi,

We are doing streaming updates to Solr for multiple users, and we are getting:

Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
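This exception usually means a second IndexWriter tried to open the same index while another writer held the lock - for example, two Solr cores or webapps pointing at the same data directory, or a stale write.lock left by an unclean shutdown. The relevant knobs live in solrconfig.xml; a sketch with illustrative values, not taken from this thread:

<indexDefaults>
  <!-- native OS locking; produces the NativeFSLock named in the exception -->
  <lockType>native</lockType>
  <!-- how long (ms) to wait for the lock before LockObtainFailedException -->
  <writeLockTimeout>1000</writeLockTimeout>
</indexDefaults>
<mainIndex>
  <!-- clears a leftover lock at startup; only safe if no other process
       writes to this index -->
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>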
Re: Indexing tweet and searching @keyword OR #keyword
Do you really want a search on ipad to *fail* to match input of #ipad? Or vice versa?

My requirement is: I want both '#ipad' and 'ipad' to match for q='ipad', BUT for q='#ipad' I want to match ONLY '#ipad', excluding plain 'ipad'.

On 10 August 2011 19:49, Erick Erickson erickerick...@gmail.com wrote:

Please look more carefully at the documentation for WDDF, specifically: "split on intra-word delimiters (all non alpha-numeric characters)". WordDelimiterFilterFactory will always throw away non-alphanumeric characters; you can't tell it to do otherwise. Try some of the other tokenizers/analyzers to get what you want, and also look at the admin/analysis page to see the exact effects of your fieldType definitions. Here's a great place to start:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

You probably want something like WhitespaceTokenizerFactory followed by LowerCaseFilterFactory or some such... But I really question whether this is what you want either. Do you really want a search on ipad to *fail* to match input of #ipad? Or vice versa?

KeywordTokenizerFactory is probably not the place you want to start; that tokenizer doesn't break anything up - you happen to be getting separate tokens because of WDDF, which, as you see, can't process things the way you want.

Best
Erick

On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq shariqn...@gmail.com wrote:

I tried tweaking WordDelimiterFactory, but it won't accept # or @ symbols and ignored them totally. I need a solution, please suggest.

On 4 August 2011 21:08, Jonathan Rochkind rochk...@jhu.edu wrote:

It's the WordDelimiterFactory in your filter chain that's removing the punctuation entirely from your index, I think. Read up on what the WordDelimiter filter does and what its settings are; decide how you want things to be tokenized in your index to get the behavior you want; either get WordDelimiter to do it that way by passing it different arguments, or stop using WordDelimiter; come back with any questions after trying that!

On 8/4/2011 11:22 AM, Mohammad Shariq wrote:

I have indexed around 1 million tweets (using the text dataType). When I search the tweets with # or @, I don't get the exact result. E.g., when I search for #ipad or @ipad, I get results where ipad is mentioned, skipping the # and @. Please suggest how to tune, or which filter factories to use, to get the desired result. I am indexing the tweets as text; below is the type from my schema.xml.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"
            language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"
            language="English"/>
  </analyzer>
</fieldType>

--
Thanks and Regards
Mohammad Shariq
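A minimal sketch of the direction Erick suggests - whitespace tokenization plus lowercasing, which leaves # and @ attached to their tokens (the field type name is illustrative, and this is untested against the requirement above):

<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits on whitespace only, so #ipad and @ipad survive as single tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that this alone does not give the asymmetric behavior asked for above (q=ipad matching both forms, q=#ipad matching only '#ipad'). One possible approach, purely a suggestion not taken from the thread, is to copyField into a second field whose analyzer strips a leading # or @ (e.g. with a PatternReplaceFilterFactory) and to query both fields: the bare term then matches either form via the stripped field, while the prefixed term is indexed verbatim only in the first field.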