Re: Trouble handling Unit symbol
Hi All, I tried to index with UTF-8 encoding but the issue is still not fixed. Please see my inputs below.
*Indexed XML:*
<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="ID">0.100</field>
    <field name="BODY">µ</field>
  </doc>
</add>
*Search Query -* BODY:µ
numFound: 0 results obtained.
*What can be the reason for this? How do I need to form the search query so that the above document is found?*
Thanks & Regards, Rajani
2012/4/2 Rajani Maski rajinima...@gmail.com Thank you for the reply. On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
: We have data having such symbols like : µ
: Indexed data has - Dose:0 µL
: Now, when it is searched as - Dose:0 µL ...
: Query Q value observed : <str name="q">S257:0 ÂµL/injection</str>
First off: your "when searched as" example does not match up to your "Query Q observed" value (ie: field queries, extra "/injection" text at the end), suggesting that you maybe cut/pasted something you didn't mean to -- so take the rest of this advice with a grain of salt. If I ignore your "when it is searched as" example and focus entirely on what you say you've indexed the data as, and the Q value you are seeing (in what looks like the echoParams output), then the first thing that jumps out at me is that it looks like your servlet container (or perhaps your web browser, if that's where you tested this) is not dealing with the Unicode correctly -- because although I see a µ in the first three lines I quoted above (UTF8: 0xC2 0xB5), in your "value observed" I'm seeing it preceded by a Â (UTF8: 0xC3 0x82) ... suggesting that perhaps the µ did not get URL encoded properly when the request was made to your servlet container? In particular, you might want to take a look at... https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config The example/exampledocs/test_utf8.sh script included with solr -Hoss
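In case it helps anyone hitting the same thing: one way to take the servlet-container URL-encoding question out of the picture is to issue the query through SolrJ, which encodes the request itself. A minimal sketch, assuming a Solr 3.x-era client; the BODY field name comes from the example above, while the core URL and class names are only illustrative assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MicroSignQuery {
    public static void main(String[] args) throws Exception {
        // SolrJ URL-encodes the request itself, so the micro sign reaches Solr as proper UTF-8
        // (HttpSolrServer is the equivalent on Solr 3.6+/4.x).
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("BODY:\u00B5");   // \u00B5 is the micro sign (µ)
        QueryResponse rsp = solr.query(q);
        System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
}

If this returns the document while the browser test does not, that points at the container or browser encoding rather than at the index itself.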
How to read SOLR cache statistics?
Can anyone explain what the following parameters in the SOLR cache statistics mean?
*name*: queryResultCache *class*: org.apache.solr.search.LRUCache *version*: 1.0 *description*: LRU Cache(maxSize=512, initialSize=512) *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 *warmupTime *: 0 *cumulative_lookups *: 98 *cumulative_hits *: 59 *cumulative_hitratio *: 0.60 *cumulative_inserts *: 39 *cumulative_evictions *: 0
AND also this
*name*: fieldValueCache *class*: org.apache.solr.search.FastLRUCache *version*: 1.0 *description*: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false) *stats*: *lookups *: 8 *hits *: 4 *hitratio *: 0.50 *inserts *: 2 *evictions *: 0 *size *: 2 *warmupTime *: 0 *cumulative_lookups *: 8 *cumulative_hits *: 4 *cumulative_hitratio *: 0.50 *cumulative_inserts *: 2 *cumulative_evictions *: 0 *item_ABC *: {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4} *item_BCD *: {field=BCD,memSize=341248,tindexSize=1952,time=1688,phase1=1688,nTerms=8075,bigTerms=0,termInstances=13510,uses=2}
Without understanding these terms I cannot configure the server for better cache usage. The point is that searches are very slow. These stats were taken just after the server was restarted. I just want to understand what these terms actually mean.
-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lexical analysis tools for German language data
On Thu, Apr 12, 2012 at 03:46:56PM +0000, Michael Ludwig wrote: Von: Walter Underwood German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the "s" in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme) [...] IANAL (I am not a linguist -- pun intended ;) but I've always read that as a genitive. Any pointers? Regards -- Tomás Zerolo Axel Springer AG Axel Springer media Systems BILD Produktionssysteme Axel-Springer-Straße 65 10888 Berlin Tel.: +49 (30) 2591-72875 tomas.zer...@axelspringer.de www.axelspringer.de Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998 Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita Vorstand: Dr. Mathias Döpfner (Vorsitzender) Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele
Re: Solr Scoring
another way is to use payloads http://wiki.apache.org/solr/Payloads The advantage of payloads is that you only need one field, which keeps the .frq file smaller than using two fields. The disadvantage is that payloads are stored in the .prx file, so I am not sure which approach is faster; maybe you can try them both.
On Fri, Apr 13, 2012 at 8:04 AM, Erick Erickson erickerick...@gmail.com wrote: GAH! I had my head in "make this happen in one field" when I wrote my response, without being explicit. Of course Walter's solution is pretty much the standard way to deal with this. Best Erick
On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wun...@wunderwood.org wrote: It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give text_exact a bigger weight than text_stem. wunder
On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: No, I don't think there's an OOB way to make this happen. It's a recurring theme, make exact matches score higher than stemmed matches. Best Erick
On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have a field in my index called itemDesc which I am applying EnglishMinimalStemFilterFactory to. So if I index a value to this field containing "Edges", the EnglishMinimalStemFilterFactory applies stemming and "Edges" becomes "Edge". Now when I search for "Edges", documents with "Edge" score better than documents with the actual search word - "Edges". Is there a way I can make documents with the actual search word, in this case "Edges", score better than documents with "Edge"? I am using Solr 3.5. My field definition is shown below:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
Thanks.
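If you go with the two-field (text_exact / text_stem) approach from Walter's mail quoted above, the query side can simply weight the unstemmed field higher. A rough SolrJ sketch; the field names come from Walter's example, the boost values are arbitrary placeholders, and edismax is assumed to be available (it is still marked experimental in 3.5):

import org.apache.solr.client.solrj.SolrQuery;

public class ExactVsStemQuery {
    public static SolrQuery build(String userInput) {
        // edismax searches both fields; the unstemmed field carries the bigger weight,
        // so documents containing the literal term ("Edges") outscore stemmed-only matches ("Edge").
        SolrQuery q = new SolrQuery(userInput);
        q.set("defType", "edismax");
        q.set("qf", "text_exact^4 text_stem");
        q.set("debugQuery", "true");   // handy for checking per-field score contributions
        return q;
    }

    public static void main(String[] args) {
        System.out.println(build("Edges"));   // prints the generated request parameters
    }
}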
Re: EmbeddedSolrServer and StreamingUpdateSolrServer
Did I get it right that you have two separate processes (different apps) accessing the same Lucene Directory simultaneously? In that case I suggest reading about the locking mechanism; I'm not really experienced with it. You showed the logs from the StrUpdHandler failure, that's clear. Can you show the logs from the Embedded server commit, which is supposed to be successful? On Fri, Apr 13, 2012 at 9:34 AM, pcrao purn...@gmail.com wrote: Hi Shawn, Thanks for sharing your opinion. Mikhail Khludnev, what do you think of Shawn's opinion? Thanks, PC Rao. -- View this message in context: http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3907223.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev ge...@yandex.ru http://www.griddynamics.com mkhlud...@griddynamics.com
Re: How to read SOLR cache statistics?
http://wiki.apache.org/solr/SolrCaching On Fri, Apr 13, 2012 at 2:30 PM, Kashif Khan uplink2...@gmail.com wrote: Does anyone explain what does the following parameters mean in SOLR cache statistics? *name*: queryResultCache *class*: org.apache.solr.search.LRUCache *version*: 1.0 *description*: LRU Cache(maxSize=512, initialSize=512) *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 *warmupTime *: 0 *cumulative_lookups *: 98 *cumulative_hits *: 59 *cumulative_hitratio *: 0.60 *cumulative_inserts *: 39 *cumulative_evictions *: 0 AND also this *name*: fieldValueCache *class*: org.apache.solr.search.FastLRUCache *version*: 1.0 *description*: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false) *stats*: *lookups *: 8 *hits *: 4 *hitratio *: 0.50 *inserts *: 2 *evictions *: 0 *size *: 2 *warmupTime *: 0 *cumulative_lookups *: 8 *cumulative_hits *: 4 *cumulative_hitratio *: 0.50 *cumulative_inserts *: 2 *cumulative_evictions *: 0 *item_ABC *: {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4} *item_BCD *: {field=BCD,memSize=341248,tindexSize=1952,time=1688,phase1=1688,nTerms=8075,bigTerms=0,termInstances=13510,uses=2} Without understanding these terms i cannot configure server for better cache usage. The point is searches are very slow. These stats were taken when server was down and restarted. I just want to understand what these terms mean actually -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907294.html Sent from the Solr - User mailing list archive at Nabble.com.
AW: Lexical analysis tools for German language data
Von: Tomas Zerolo There can be transformations or inflections, like the "s" in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme) [...] IANAL (I am not a linguist -- pun intended ;) but I've always read that as a genitive. Any pointers? Admittedly, that's what you'd think, and despite linguistics telling me otherwise I'd maintain there's some truth in it. For this case, however, consider: die Weihnacht declines like die Nacht, so:
nom. die Weihnacht
gen. der Weihnacht
dat. der Weihnacht
akk. die Weihnacht
As you can see, there's no "s" to be found anywhere, not even in the genitive. But my gut feeling, like yours, is that this should indicate a genitive, and I would make a point of well-argued gut feeling being at least as relevant as formalist analysis. Michael
Re: two structures in solr
Thank you very much Erick for your reply! So should it go something like the following: http://lucene.472066.n3.nabble.com/file/n3907393/solr_index.png sorry for an ugly drawing ;) In this example, the index will have 13 columns: 6 for project, 6 for contractor and one to define the type. Is that right? -- View this message in context: http://lucene.472066.n3.nabble.com/two-structures-in-solr-tp3905143p3907393.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boost differences in two environments for same query and config
Hi Erick, Thanks for your suggestions. I did an optimize on the remote installation and this time with the same number of documents but still face the same issue as seen from the debug output below: 9.950362E-4 = (MATCH) sum of: 9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35916), product of: 9.950362E-4 = queryWeight(RECORD_TYPE:info), product of: 1.0 = idf(docFreq=58891, maxDocs=8181811) 9.950362E-4 = queryNorm 1.0 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), product of: 1.0 = tf(termFreq(RECORD_TYPE:info)=1) 1.0 = idf(docFreq=58891, maxDocs=8181811) 1.0 = fieldNorm(field=RECORD_TYPE, doc=35916) 0.0 = (MATCH) product of: 1.0945399 = (MATCH) sum of: 0.99503624 = (MATCH) weight(CD:ee123^1000.0 in 35916), product of: 0.99503624 = queryWeight(CD:ee123^1000.0), product of: 1000.0 = boost 1.0 = idf(docFreq=1, maxDocs=8181811) 9.950362E-4 = queryNorm 1.0 = (MATCH) fieldWeight(CD:ee123 in 35916), product of: 1.0 = tf(termFreq(CD:ee123)=1) 1.0 = idf(docFreq=1, maxDocs=8181811) 1.0 = fieldNorm(field=CD, doc=35916) 0.09950362 = (MATCH) ConstantScoreQuery(QueryWrapperFilter(CD:ee123 CD:ee123c CD:ee123c. CD:ee123dc CD:ee123e CD:ee123e. CD:ee123en CD:ee123fx CD:ee123g CD:ee123g.1 CD:ee123g1 CD:ee123ee123 CD:ee123l.1 CD:ee123l1 CD:ee123ll CD:ee123lr CD:ee123m.z CD:ee123mg CD:ee123mz CD:ee123na CD:ee123nx CD:ee123ol CD:ee123op CD:ee123p CD:ee123p.1 CD:ee123p1 CD:ee123pn CD:ee123r.1 CD:ee123r1 CD:ee123s CD:ee123s.z CD:ee123sm CD:ee123sn CD:ee123sp CD:ee123ss CD:ee123sz)), product of: 100.0 = boost 9.950362E-4 = queryNorm 0.0 = coord(2/3) So I got the conf folder from the remote server location and replaced my local conf folder with this one to see if the indexes were formed differently but my local installation continues to work.I would expect to see the same behaviour as on the remote installation but it did not happen. (The only difference on the remote installation is that there are cores while my local installation has no cores). Anything else I could try? Thanks for your help. On 4/11/12, Erick Erickson erickerick...@gmail.com wrote: Well, you're matching a different number of records, so I have to assume your indexes are different on the two machines. Here is one case where doing an optimize might make sense, that'll purge the data associated with any deleted records from the index which should make comparisons better Additionally, you have to insure that your request handler is identical on both, have you made any changes to solrconfig.xml? About the coord (2/3), I'm pretty clueless. But also insure that your parsed query is identical on both, which is an additional check on whether you've changed something on one server and not the other. Best Erick On Wed, Apr 11, 2012 at 8:19 AM, Kerwin kerwin...@gmail.com wrote: Hi All, I am firing the following Solr query against installations on two environments one on my local Windows machine and the other on Unix (Remote). RECORD_TYPE:info AND (NAME:ee123* OR CD:ee123^1000 OR CD:ee123*^100) There are no differences in the DataImportHandler configuration , Schema and Solrconfig for both these installations. The correct expected result is given by the local installation of Solr which also gives scores as expected for the boosts. 
CORRECT/Expected: Debug query output for local installation: 10.822258 = (MATCH) sum of: 0.002170282 = (MATCH) weight(RECORD_TYPE:info in 35916), product of: 3.65739E-4 = queryWeight(RECORD_TYPE:info), product of: 5.933964 = idf(docFreq=58891, maxDocs=8181811) 6.1634855E-5 = queryNorm 5.933964 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), product of: 1.0 = tf(termFreq(RECORD_TYPE:info)=1) 5.933964 = idf(docFreq=58891, maxDocs=8181811) 1.0 = fieldNorm(field=RECORD_TYPE, doc=35916) 10.820087 = (MATCH) product of: 16.230131 = (MATCH) sum of: 16.223969 = (MATCH) weight(CD:ee123^1000.0 in 35916), product of: 0.81 = queryWeight(CD:ee123^1000.0), product of: 1000.0 = boost 16.224277 = idf(docFreq=1, maxDocs=8181811)
Re: Solr Scoring
Thanks a lot. I had already implemented Walter's solution and was wondering if this was the right way to deal with it. This has now given me the confidence to go with the solution. Many thanks. On Fri, Apr 13, 2012 at 1:04 AM, Erick Erickson erickerick...@gmail.comwrote: GAH! I had my head in make this happen in one field when I wrote my response, without being explicit. Of course Walter's solution is pretty much the standard way to deal with this. Best Erick On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wun...@wunderwood.org wrote: It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give the text_exact a bigger weight than text_stem. wunder On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: No, I don't think there's an OOB way to make this happen. It's a recurring theme, make exact matches score higher than stemmed matches. Best Erick On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have a field in my index called itemDesc which i am applying EnglishMinimalStemFilterFactory to. So if i index a value to this field containing Edges, the EnglishMinimalStemFilterFactory applies stemming and Edges becomes Edge. Now when i search for Edges, documents with Edge score better than documents with the actual search word - Edges. Is there a way i can make documents with the actual search word in this case Edges score better than document with Edge? I am using Solr 3.5. My field definition is shown below: fieldType name=text_en class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.EnglishMinimalStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.EnglishMinimalStemFilterFactory/ /analyzer /fieldType Thanks.
Re: Facets involving multiple fields
Hi, Thanks for your answer. Yes it works in this case when I know the facet name (Computer). What if I want to automatically compute all facets? facet.query=keyword:* short_title:* doesn't work, right? Marc. On Thu, Apr 12, 2012 at 2:08 PM, Erick Erickson erickerick...@gmail.com wrote: facet.query=keywords:computer short_title:computer seems like what you're asking for. On Thu, Apr 12, 2012 at 3:19 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, Thanks for your answer. Let's say I have to fields : 'keywords' and 'short_title'. For these fields I'd like to make a faceted search : if 'Computer' is stored in at least one of these fields for a document I'd like to get it added in my results. doc1 = keywords : 'Computer' / short_title : 'Computer' doc2 = keywords : 'Computer' doc3 = short_title : 'Computer' In this case I'd like to have : Computer (3) I don't see how to solve this with facet.query. Thanks, Marc. On Wed, Apr 11, 2012 at 5:13 PM, Erick Erickson erickerick...@gmail.com wrote: Have you considered facet.query? You can specify an arbitrary query to facet on which might do what you want. Otherwise, I'm not sure what you mean by faceted search using two fields. How should these fields be combined into a single facet? What that means practically is not at all obvious from your problem statement. Best Erick On Tue, Apr 10, 2012 at 8:55 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, I'd like to make a faceted search using two fields. I want to have a single result and not a result by field (like when using facet.field=f1,facet.field=f2). I don't want to use a copy field either because I want it to be dynamic at search time. As far as I know this is not possible for Solr 3.x... But I saw a new parameter named group.facet for Solr4. Could that solve my problem? If yes could somebody give me an example? Thanks, Marc.
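For reference, the facet.query approach Erick suggested looks like this from SolrJ; the field names are the ones from Marc's example and the term is passed in explicitly, which is exactly the limitation Marc points out - it only counts terms you already know about:

import org.apache.solr.client.solrj.SolrQuery;

public class CombinedFacetQuery {
    public static SolrQuery build(String term) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        // One count per facet.query: documents having the term in either field.
        q.addFacetQuery("keywords:" + term + " OR short_title:" + term);
        return q;
    }

    public static void main(String[] args) {
        System.out.println(build("computer"));
    }
}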
Re: How to read SOLR cache statistics?
Hi Li Li, I have been through that wiki before, but it does not explain what *evictions*, *inserts*, *cumulative_inserts*, *cumulative_evictions*, *hitratio* and the rest are. These terms are foreign to me. What does the following line mean?
*item_ABC *: {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4}
That is the kind of explanation I want. I have read the wiki and the comments in the solrconfig.xml file about all these things, but they do not say how to read the stats, which is very *important!!!*
-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907633.html Sent from the Solr - User mailing list archive at Nabble.com.
Issues with language based indexing
Hello, I am new to Solr. My search for the string "Acciones y Valores" is returning some documents, but when I go and search for the same words in those documents manually, I cannot find them. Please help me understand on what basis those documents are found by the search. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Issues-with-language-based-indexing-tp3907601p3907601.html Sent from the Solr - User mailing list archive at Nabble.com.
Realtime /get versus SearchHandler
A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the /get handler. In fact, I've seen them turn up in my search component in the search handler that is configured with my custom QT. (I have a 'prepare' method that sets ShardParams.QT to my QT to get my processing involved in the first of the two queries.) Did I overthink this?
Re: Trouble handling Unit symbol
Please review: http://wiki.apache.org/solr/UsingMailingLists Especially the bit about adding debugQuery=on and showing the results. You're asking people to guess at solutions without providing much in the way of context. You might try looking at your index with Luke to see what's actually in your index, or perhaps TermsComponent Best Erick On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski rajinima...@gmail.com wrote: Hi All, I tried to index with UTF-8 encode but the issue is still not fixed. Please see my inputs below. *Indexed XML:* ?xml version=1.0 encoding=UTF-8 ? add doc field name=ID0.100/field field name=BODYµ/field /doc /add *Search Query - * BODY:µ numfound : 0 results obtained. *What can be the reason for this? How do i need to make search query so that the above document is found.* Thanks Regards Regards Rajani 2012/4/2 Rajani Maski rajinima...@gmail.com Thank you for the reply. On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : We have data having such symbols like : ต : Indexed data has - Dose:0 ตL : Now , when it is searched as - Dose:0 ตL ... : Query Q value observed : str name=qS257:0 ยตL/injection/str First off: your when searched as example does not match up to your Query Q observed value (ie: field queries, extra /injection text at the end) suggesting that you maybe cut/paste something you didn't mean to -- so take the rest of this advice with a grain of salt. If i ignore your when it is searched as exampleand focus entirely on what you say you've indexed the data as, and the Q value you are sing (in what looks like the echoParams output) then the first thing that jumps out at me is that it looks like your servlet container (or perhaps your web browser if that's where you tested this) is not dealing with the unicode correctly -- because allthough i see a ต in the first three lines i quoted above (UTF8: 0xC2 0xB5) in your value observed i'm seeing it preceeded by a ย (UTF8: 0xC3 0x82) ... suggesting that perhaps the ต did not get URL encoded properly when the request was made to your servlet container? In particular, you might want to take a look at... https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config The example/exampledocs/test_utf8.sh script included with solr -Hoss
Re: two structures in solr
bq: Is that right? I don't know, does it work G? You'll probably want an additional field for unique id (just named id in the example) that should be disjoint between your types. Best Erick On Fri, Apr 13, 2012 at 3:41 AM, tkoomzaaskz tomasz.du...@gmail.com wrote: Thank you very much Erick for your reply! So should it go something like the following: http://lucene.472066.n3.nabble.com/file/n3907393/solr_index.png sorry for an ugly drawing ;) In this example, the index will have 13 columns: 6 for project, 6 for contractor and one to define the type. Is that right? -- View this message in context: http://lucene.472066.n3.nabble.com/two-structures-in-solr-tp3905143p3907393.html Sent from the Solr - User mailing list archive at Nabble.com.
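A sketch of what indexing the two types into a single core can look like with SolrJ. The field names (id, type, project_name, contractor_name) and the id prefixes are only one way of keeping the ids disjoint, as Erick suggests - nothing here is prescribed by Solr:

import org.apache.solr.common.SolrInputDocument;

public class TwoTypesExample {
    static SolrInputDocument project(String projectId, String name) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "project-" + projectId);      // prefix keeps ids disjoint across types
        doc.addField("type", "project");
        doc.addField("project_name", name);
        return doc;
    }

    static SolrInputDocument contractor(String contractorId, String name) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "contractor-" + contractorId);
        doc.addField("type", "contractor");
        doc.addField("contractor_name", name);
        return doc;
    }

    public static void main(String[] args) {
        // At query time, fq=type:project (or fq=type:contractor) restricts a search to one structure.
        System.out.println(project("42", "New warehouse"));
        System.out.println(contractor("7", "ACME Ltd"));
    }
}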
Re: Boost differences in two environments for same query and config
Well, next thing I'd do is just copy your entire solr home directory to the remote machine and try that. If that gives identical results on both, then try moving just your solr home/data directory to the remote machine. I suspect that you've done something different between the two machines that's leading to this, but haven't a clue what. If you copy your entire Solr installation over and _still_ get this kind of thing, we're into whether the JVM or op system are somehow changing things, which would surprise me a lot. Best Erick On Fri, Apr 13, 2012 at 4:24 AM, Kerwin kerwin...@gmail.com wrote: Hi Erick, Thanks for your suggestions. I did an optimize on the remote installation and this time with the same number of documents but still face the same issue as seen from the debug output below: 9.950362E-4 = (MATCH) sum of: 9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35916), product of: 9.950362E-4 = queryWeight(RECORD_TYPE:info), product of: 1.0 = idf(docFreq=58891, maxDocs=8181811) 9.950362E-4 = queryNorm 1.0 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), product of: 1.0 = tf(termFreq(RECORD_TYPE:info)=1) 1.0 = idf(docFreq=58891, maxDocs=8181811) 1.0 = fieldNorm(field=RECORD_TYPE, doc=35916) 0.0 = (MATCH) product of: 1.0945399 = (MATCH) sum of: 0.99503624 = (MATCH) weight(CD:ee123^1000.0 in 35916), product of: 0.99503624 = queryWeight(CD:ee123^1000.0), product of: 1000.0 = boost 1.0 = idf(docFreq=1, maxDocs=8181811) 9.950362E-4 = queryNorm 1.0 = (MATCH) fieldWeight(CD:ee123 in 35916), product of: 1.0 = tf(termFreq(CD:ee123)=1) 1.0 = idf(docFreq=1, maxDocs=8181811) 1.0 = fieldNorm(field=CD, doc=35916) 0.09950362 = (MATCH) ConstantScoreQuery(QueryWrapperFilter(CD:ee123 CD:ee123c CD:ee123c. CD:ee123dc CD:ee123e CD:ee123e. CD:ee123en CD:ee123fx CD:ee123g CD:ee123g.1 CD:ee123g1 CD:ee123ee123 CD:ee123l.1 CD:ee123l1 CD:ee123ll CD:ee123lr CD:ee123m.z CD:ee123mg CD:ee123mz CD:ee123na CD:ee123nx CD:ee123ol CD:ee123op CD:ee123p CD:ee123p.1 CD:ee123p1 CD:ee123pn CD:ee123r.1 CD:ee123r1 CD:ee123s CD:ee123s.z CD:ee123sm CD:ee123sn CD:ee123sp CD:ee123ss CD:ee123sz)), product of: 100.0 = boost 9.950362E-4 = queryNorm 0.0 = coord(2/3) So I got the conf folder from the remote server location and replaced my local conf folder with this one to see if the indexes were formed differently but my local installation continues to work.I would expect to see the same behaviour as on the remote installation but it did not happen. (The only difference on the remote installation is that there are cores while my local installation has no cores). Anything else I could try? Thanks for your help. On 4/11/12, Erick Erickson erickerick...@gmail.com wrote: Well, you're matching a different number of records, so I have to assume your indexes are different on the two machines. Here is one case where doing an optimize might make sense, that'll purge the data associated with any deleted records from the index which should make comparisons better Additionally, you have to insure that your request handler is identical on both, have you made any changes to solrconfig.xml? About the coord (2/3), I'm pretty clueless. But also insure that your parsed query is identical on both, which is an additional check on whether you've changed something on one server and not the other. Best Erick On Wed, Apr 11, 2012 at 8:19 AM, Kerwin kerwin...@gmail.com wrote: Hi All, I am firing the following Solr query against installations on two environments one on my local Windows machine and the other on Unix (Remote). 
RECORD_TYPE:info AND (NAME:ee123* OR CD:ee123^1000 OR CD:ee123*^100) There are no differences in the DataImportHandler configuration , Schema and Solrconfig for both these installations. The correct expected result is given by the local installation of Solr which also gives scores as expected for the boosts. CORRECT/Expected: Debug query output for local installation: 10.822258 = (MATCH) sum of: 0.002170282 = (MATCH) weight(RECORD_TYPE:info in 35916), product of: 3.65739E-4 = queryWeight(RECORD_TYPE:info), product of: 5.933964 = idf(docFreq=58891, maxDocs=8181811) 6.1634855E-5 = queryNorm 5.933964 = (MATCH) fieldWeight(RECORD_TYPE:info in 35916), product of:
Re: Trouble handling Unit symbol
Fine. Thank you. I will look at it. On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson erickerick...@gmail.comwrote: Please review: http://wiki.apache.org/solr/UsingMailingLists Especially the bit about adding debugQuery=on and showing the results. You're asking people to guess at solutions without providing much in the way of context. You might try looking at your index with Luke to see what's actually in your index, or perhaps TermsComponent Best Erick On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski rajinima...@gmail.com wrote: Hi All, I tried to index with UTF-8 encode but the issue is still not fixed. Please see my inputs below. *Indexed XML:* ?xml version=1.0 encoding=UTF-8 ? add doc field name=ID0.100/field field name=BODYµ/field /doc /add *Search Query - * BODY:µ numfound : 0 results obtained. *What can be the reason for this? How do i need to make search query so that the above document is found.* Thanks Regards Regards Rajani 2012/4/2 Rajani Maski rajinima...@gmail.com Thank you for the reply. On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : We have data having such symbols like : ต : Indexed data has -Dose:0 ตL : Now , when it is searched as - Dose:0 ตL ... : Query Q value observed : str name=qS257:0 ยตL/injection/str First off: your when searched as example does not match up to your Query Q observed value (ie: field queries, extra /injection text at the end) suggesting that you maybe cut/paste something you didn't mean to -- so take the rest of this advice with a grain of salt. If i ignore your when it is searched as exampleand focus entirely on what you say you've indexed the data as, and the Q value you are sing (in what looks like the echoParams output) then the first thing that jumps out at me is that it looks like your servlet container (or perhaps your web browser if that's where you tested this) is not dealing with the unicode correctly -- because allthough i see a ต in the first three lines i quoted above (UTF8: 0xC2 0xB5) in your value observed i'm seeing it preceeded by a ย (UTF8: 0xC3 0x82) ... suggesting that perhaps the ต did not get URL encoded properly when the request was made to your servlet container? In particular, you might want to take a look at... https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config The example/exampledocs/test_utf8.sh script included with solr -Hoss
Re: Facets involving multiple fields
Nope. Information about your higher level use-case would probably be a good thing, this is starting to smell like an XY problem. Best Erick On Fri, Apr 13, 2012 at 5:48 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, Thanks for your answer. Yes it works in this case when I know the facet name (Computer). What if I want to automatically compute all facets? facet.query=keyword:* short_title:* doesn't work, right? Marc. On Thu, Apr 12, 2012 at 2:08 PM, Erick Erickson erickerick...@gmail.com wrote: facet.query=keywords:computer short_title:computer seems like what you're asking for. On Thu, Apr 12, 2012 at 3:19 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, Thanks for your answer. Let's say I have to fields : 'keywords' and 'short_title'. For these fields I'd like to make a faceted search : if 'Computer' is stored in at least one of these fields for a document I'd like to get it added in my results. doc1 = keywords : 'Computer' / short_title : 'Computer' doc2 = keywords : 'Computer' doc3 = short_title : 'Computer' In this case I'd like to have : Computer (3) I don't see how to solve this with facet.query. Thanks, Marc. On Wed, Apr 11, 2012 at 5:13 PM, Erick Erickson erickerick...@gmail.com wrote: Have you considered facet.query? You can specify an arbitrary query to facet on which might do what you want. Otherwise, I'm not sure what you mean by faceted search using two fields. How should these fields be combined into a single facet? What that means practically is not at all obvious from your problem statement. Best Erick On Tue, Apr 10, 2012 at 8:55 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, I'd like to make a faceted search using two fields. I want to have a single result and not a result by field (like when using facet.field=f1,facet.field=f2). I don't want to use a copy field either because I want it to be dynamic at search time. As far as I know this is not possible for Solr 3.x... But I saw a new parameter named group.facet for Solr4. Could that solve my problem? If yes could somebody give me an example? Thanks, Marc.
Solr data export to CSV File
Hi Team, First of all a very big thank-you to the people who have developed such a nice product. I have one query regarding Solr: I have approximately 36 million documents in my Solr index and I want to export all of the data to a CSV file, but I have found nothing on how to do this, so please help me with this topic. Regards Pavnesh
Re: How to read SOLR cache statistics?
Well, the place to start is here: *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 the important bits are hitratio and evictions. Caches only really start to show their stuff when the hit ratio is quite high. That's the percentage of requests that are satisfied by entries already in the cache. You want this number to be as high as possible, +0.90. evictions are the number of entries that have been removed from the cache. The pre-configured number is usually 512, so when the 513th entry is inserted in the cache, some are removed to make room and tallied in the evictions section. Do note that some of the caches (documentCache in particular) will rarely have a huge hit ratio due to its nature, ditto with queryResultCache so you can temporarily ignore those. Best Erick On Fri, Apr 13, 2012 at 6:28 AM, Kashif Khan uplink2...@gmail.com wrote: Hi Li Li, I have been through that WIKI before but that does not explain what is *evictions*, *inserts*, *cumulative_inserts*, *cumulative_evictions*, *hitratio *and all. These terms are foreign to me. What does the following line mean? *item_ABC : {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4} * I want that kind of explanation. I have read the wiki and the comments in the solrconfig.xml file about all these things but does say how to read the stats which is very *important!!!*. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907633.html Sent from the Solr - User mailing list archive at Nabble.com.
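As a quick worked example against the queryResultCache numbers posted at the top of this thread (plain arithmetic, no Solr API involved):

public class CacheStatsExample {
    public static void main(String[] args) {
        // queryResultCache values quoted earlier in this thread
        long lookups = 98, hits = 59, inserts = 41, evictions = 0, size = 41;
        double hitratio = (double) hits / lookups;   // 59 / 98 ≈ 0.60, matching the reported value
        System.out.println("hitratio = " + hitratio);
        // size == inserts and evictions == 0: nothing has been thrown away yet, because fewer
        // than maxSize (512) distinct entries have been inserted since the last restart.
        System.out.println("entries held: " + size + ", evicted so far: " + evictions);
    }
}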
Re: performance impact using string or float when querying ranges
Well, I guess my first question is whether using strings is fast enough, in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie types than with strings. Trie types are all numeric types. Best Erick On Fri, Apr 13, 2012 at 3:49 AM, crive marco.cr...@gmail.com wrote: Hi All, is there a big difference in terms of performance when querying a range like [50.0 TO *] on a string field compared to a float field? At the moment I am using a dynamic field of type string to map some values coming from our database and their type can vary depending on the context (float/integer/string); it is easier to use a dynamic field rather than having to create a bespoke field for each type of value. Marco
Re: Issues with language based indexing
Please review: http://wiki.apache.org/solr/UsingMailingLists there's so little information to go on here that I really can't say anything that isn't a guess. At a minimum we need the raw input, the fieldType definitions from your schema, the results of adding debugQuery=on to your URL Best Erick On Fri, Apr 13, 2012 at 6:04 AM, JGar jyothi.garladi...@citi.com wrote: Hello, I am new to Solr. it is resulting some docs in my search for Acciones y Valores string. When i go and search for the same word in the given doc manually, i could not find those word. Pls help on what basis the doc is found in the search . Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Issues-with-language-based-indexing-tp3907601p3907601.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr data export to CSV File
Does this help? http://wiki.apache.org/solr/CSVResponseWriter Best Erick On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com wrote: Hi Team, A very-very thanks to you guy who had developed such a nice product. I have one query regarding solr that I have app 36 Million data in my solr and I wants to export all the data to a csv file but I have found nothing on the same so please help me on this topic . Regards Pavnesh
RE: Realtime /get versus SearchHandler
Yes
--- Original Message --- On 4/13/2012 06:25 AM Benson Margulies wrote: A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the /get handler. In fact, I've seen them turn up in my search component in the search handler that is configured with my custom QT. (I have a 'prepare' method that sets ShardParams.QT to my QT to get my processing involved in the first of the two queries.) Did I overthink this?
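For anyone following along, the kind of prepare() override Benson describes might look roughly like this. The component name and the "/myHandler" path are invented; only ShardParams.QT and the ResponseBuilder plumbing come from the Solr API, and which SolrInfoMBean methods are abstract varies by Solr version:

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class MyShardQtComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) {
        // Point the shard sub-requests at the same custom handler, so this component
        // also runs during the first of the two distributed query phases.
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        params.set(ShardParams.QT, "/myHandler");
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) {
        // per-shard work would go here
    }

    // SolrInfoMBean boilerplate (abstract in some Solr versions)
    public String getDescription() { return "example component that pins shards.qt"; }
    public String getSource() { return ""; }
    public String getSourceId() { return ""; }
    public String getVersion() { return "1.0"; }
}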
RE: Solr data export to CSV File
A combination of the CSV response writer and SolrJ to page through all of the results, sending each line to something like Apache Commons IO FileUtils: FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true); Would be quite quick to knock up in Java. Thanks Ben -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 13 April 2012 13:28 To: solr-user@lucene.apache.org Subject: Re: Solr data export to CSV File Does this help? http://wiki.apache.org/solr/CSVResponseWriter Best Erick On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com wrote: Hi Team, A very-very thanks to you guy who had developed such a nice product. I have one query regarding solr that I have app 36 Million data in my solr and I wants to export all the data to a csv file but I have found nothing on the same so please help me on this topic . Regards Pavnesh This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
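A rough sketch of what Ben describes, assuming SolrJ and commons-io are on the classpath. The query, page size, field names, core URL, and output path are placeholders, and real code would need proper CSV escaping and a close look at deep-paging cost for 36 million documents:

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class CsvExport {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        File out = new File("output.csv");
        String newline = System.getProperty("line.separator");
        int rows = 1000;          // page size - tune for your setup
        long start = 0;
        SolrDocumentList page;
        do {
            SolrQuery q = new SolrQuery("*:*");
            q.setSortField("id", SolrQuery.ORDER.asc);   // stable order across pages
            q.setStart((int) start);
            q.setRows(rows);
            page = solr.query(q).getResults();
            for (SolrDocument doc : page) {
                // naive CSV line: id plus one other field; real code needs escaping/quoting
                String line = doc.getFieldValue("id") + "," + doc.getFieldValue("name");
                FileUtils.writeStringToFile(out, line + newline, true);   // append
            }
            start += page.size();
        } while (page.size() == rows);
        System.out.println("exported " + start + " documents");
    }
}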
Re: searching across multiple fields using edismax - am i setting this up right?
thank you for the response. it seems to be working well ;)
1) i tried your suggestion about removing the qt parameter - *somecore/partItemNoSearch*?q=dishwasher&debugQuery=on&rows=10 - but this results in a 404 error message - is there some configuration i am missing to support this short-hand syntax for specifying the requestHandler in the url?
2) ok - good suggestion.
3) yes it looks like it IS searching across all three (3) fields. i noticed that for the itemNo field, it reduced the search string from dishwasher to dishwash - is this because of stemming on the field type used for the itemNo field?
<lst name="debug"><str name="rawquerystring">dishwasher</str><str name="querystring">dishwasher</str><str name="parsedquery">+DisjunctionMaxQuery((brand:dishwasher^0.5 | *itemNo:dishwash* | productType:dishwasher^0.8))</str><str name="parsedquery_toString">+(brand:dishwasher^0.5 | itemNo:dishwash | productType:dishwasher^0.8)</str>
-- View this message in context: http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3907875.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: searching across multiple fields using edismax - am i setting this up right?
as to 1) you have to define your request handler with a leading /, as in name="/partItemNoSearch". Don't forget to restart your server.
3) Of course. The input terms MUST be run through the associated analysis chain to have any hope of matching correctly.
Best Erick
On Fri, Apr 13, 2012 at 8:36 AM, geeky2 gee...@hotmail.com wrote: thank you for the response. it seems to be working well ;) 1) i tried your suggestion about removing the qt parameter - *somecore/partItemNoSearch*?q=dishwasher&debugQuery=on&rows=10 - but this results in a 404 error message - is there some configuration i am missing to support this short-hand syntax for specifying the requestHandler in the url? 2) ok - good suggestion. 3) yes it looks like it IS searching across all three (3) fields. i noticed that for the itemNo field, it reduced the search string from dishwasher to dishwash - is this because of stemming on the field type used for the itemNo field? <lst name="debug"><str name="rawquerystring">dishwasher</str><str name="querystring">dishwasher</str><str name="parsedquery">+DisjunctionMaxQuery((brand:dishwasher^0.5 | *itemNo:dishwash* | productType:dishwasher^0.8))</str><str name="parsedquery_toString">+(brand:dishwasher^0.5 | itemNo:dishwash | productType:dishwasher^0.8)</str> -- View this message in context: http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3907875.html Sent from the Solr - User mailing list archive at Nabble.com.
Errors during indexing
Hello, We have just switched to Solr 4 as we needed the ability to return geodist() along with our results. I use a simple multithreaded java app and solr to ingest the data. We keep seeing the following:
13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error handling 'status' action
at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:546)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /usr/solr4/data/index/_2jb.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:219)
at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
at org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:201)
at org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:227)
at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:415)
at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:756)
at org.apache.lucene.index.StandardDirectoryReader$ReaderCommit.<init>(StandardDirectoryReader.java:369)
at org.apache.lucene.index.StandardDirectoryReader.getIndexCommit(StandardDirectoryReader.java:354)
at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:558)
at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:816)
at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:537)
... 16 more
This seems to happen when we're using the new admin tool. I'm checking on the autocommit handler. Has anyone seen anything similar? Thanks Ben
This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
RE: solr 3.5 taking long to index
Hi Shawn, Thanks for the information, let me give this a try, since this is a live box I will try it during the weekend and update you. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 13 April 2012 11:01 To: solr-user@lucene.apache.org Subject: Re: solr 3.5 taking long to index On 4/12/2012 8:42 PM, Rohit wrote: The machine has a total ram of around 46GB. My Biggest concern is Solr index time gradually increasing and then the commit stops because of timeouts, out commit rate is very high, but I am not able to find the root cause of the issue. For good performance, Solr relies on the OS having enough free RAM to keep critical portions of the index in the disk cache. Some numbers that I have collected from your information so far are listed below. Please let me know if I've got any of this wrong: 46GB total RAM 36GB RAM allocated to Solr 300GB total index size This leaves only 10GB of RAM free to cache 300GB of index, assuming that this server is dedicated to Solr. The critical portions of your index are very likely considerably larger than 10GB, which causes constant reading from the disk for queries and updates. With a high commit rate and a relatively low mergeFactor of 10, your index will be doing a lot of merging during updates, and some of those merges are likely to be quite large, further complicating the I/O situation. Another thing that can lead to increasing index update times is cache warming, also greatly affected by high I/O levels. If you visit the /solr/corename/admin/stats.jsp#cache URL, you can see the warmupTime for each cache in milliseconds. Adding more memory to the server would probably help things. You'll want to carefully check all the server and Solr statistics you can to make sure that memory is the root of problem, before you actually spend the money. At the server level, look for things like a high iowait CPU percentage. For Solr, you can turn the logging level up to INFO in the admin interface as well as turn on the infostream in solrconfig.xml for extensive debugging. I hope this is helpful. If not, I can try to come up with more specific things you can look at. Thanks, Shawn
Solr is not extracting the CDATA part of xml
I am trying to use a method that was suggested on the Solr forum to remove the CDATA part of the XML, but it is not working: the result shows the whole XML content instead of the CDATA part.
schema.xml:
<fieldType name="text_ws2" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
  </analyzer>
</fieldType>
mappings.txt = ...
my xml content: <body> ... </body>
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908317.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr is not extracting the CDATA part of xml
not sure why the CDATA part did not get interpreted. this is how the xml content looks; I added quotes just to present the exact content. "<body><![CDATA[ ... ]]></body>"
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance impact using string or float when querying ranges
On Fri, Apr 13, 2012 at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote: Well, I guess my first question is whether using strings is fast enough, in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie types than with strings.
To elaborate on this point a bit... range queries on strings will be the same speed as on a numeric field with precisionStep=0. You need a precisionStep > 0 (so the number will be indexed in multiple parts) to speed up range queries on numeric fields. (See int vs tint in the solr schema.)
-Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Trie types are all numeric types. Best Erick On Fri, Apr 13, 2012 at 3:49 AM, crive marco.cr...@gmail.com wrote: Hi All, is there a big difference in terms of performance when querying a range like [50.0 TO *] on a string field compared to a float field? At the moment I am using a dynamic field of type string to map some values coming from our database and their type can vary depending on the context (float/integer/string); it is easier to use a dynamic field rather than having to create a bespoke field for each type of value. Marco
mergePolicy element format change in 3.6 vs 3.5?
Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig:
- <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
+ <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy" />
I don't see this mentioned in the release notes - is the second format useable with 3.5, 3.4, etc? -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
RE: mergePolicy element format change in 3.6 vs 3.5?
It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael
-----Original Message----- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To: solr-user@lucene.apache.org Subject: mergePolicy element format change in 3.6 vs 3.5? Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig:
- <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
+ <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy" />
I don't see this mentioned in the release notes - is the second format useable with 3.5, 3.4, etc?
Re: mergePolicy element format change in 3.6 vs 3.5?
Ok, thanks for the info. As long as the second one works, we can just use that. I just verified that it works for 3.5 at least. -Peter
On Fri, Apr 13, 2012 at 1:12 PM, Michael Ryan mr...@moreover.com wrote: It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael
-----Original Message----- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To: solr-user@lucene.apache.org Subject: mergePolicy element format change in 3.6 vs 3.5? Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig:
- <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
+ <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy" />
I don't see this mentioned in the release notes - is the second format useable with 3.5, 3.4, etc?
-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
Re: Solr is not extracting the CDATA part of xml
Solr does not index arbitrary XML content. There is an XML form of a Solr document that can be sent to Solr, but it is a specific form of XML. An example of the XML you're trying to index and what you mean by "not working" would be helpful. Best Erick
On Fri, Apr 13, 2012 at 11:50 AM, srini softtec...@gmail.com wrote: not sure why the CDATA part did not get interpreted. this is how the xml content looks; I added quotes just to present the exact content. "<body><![CDATA[ ... ]]></body>" -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr is not extracting the CDATA part of xml
Erick, Thanks for your reply. When you say Solr does not index arbitrary XML documents - below is how my XML document, sitting in Oracle, looks. Could you suggest the best way of indexing it? Which method should I follow? Should I use XPathEntityProcessor?
<?xml version="1.0" encoding="UTF-8"?>
<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="someurl"
         xmlns:csp="someurl.xsd" xsi:schemaLocation="somelocation jar:"
         id="002" message-type="create">
  <content>
    <body>
      <dsp:row>
        <dsp:channel>100</dsp:channel>
        <dsp:role>115</dsp:role>
      </dsp:row>
    </body>
  </content>
</message>
Thanks in Advance
Erick Erickson wrote: Solr does not index arbitrary XML content. There is an XML form of a Solr document that can be sent to Solr, but it is a specific form of XML. An example of the XML you're trying to index and what you mean by "not working" would be helpful. Best Erick On Fri, Apr 13, 2012 at 11:50 AM, srini <softtech88@...> wrote: not sure why the CDATA part did not get interpreted. this is how the xml content looks; I added quotes just to present the exact content. "<body><![CDATA[ ... ]]></body>"
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr is not extracting the CDATA part of xml
Right, that will not work at all for direct transmission to Solr. You could write a Java program that parses this and sends it to Solr via SolrJ. Personally I haven't connected a database to Solr with XPathEntityProcessor in the mix, but I believe I've seen messages go by with this configuration. You might want to search the mail archive... Best Erick On Fri, Apr 13, 2012 at 3:13 PM, srini softtec...@gmail.com wrote: Erick, Thanks for your reply. When you say Solr does not index arbitrary XML documents, then below is the way my xml document looks, which is sitting in Oracle. Could you suggest the best way of indexing it? Which method should I follow? Should I use XPathEntityProcessor? ?xml version=1.0 encoding=UTF-8 ? message xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance; xmlns=someurl xmlns:csp=someurl.xsd xsi:schemaLocation=somelocation jar: id=002 message-type=create content dsp:row dsp:channel100/dsp:channel dsp:role115/dsp:role /dsp:row /body/content/message Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html Sent from the Solr - User mailing list archive at Nabble.com.
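A minimal sketch of the SolrJ route Erick describes, assuming the Solr-3.x-era SolrJ client (CommonsHttpSolrServer). The Solr URL, field names, and XPath expressions are illustrative, and the namespaces from the real document are omitted for brevity; in a real setup the XML string would come from the Oracle column via JDBC and you would loop over rows:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlToSolr {
    public static void main(String[] args) throws Exception {
        // Connect to a local Solr instance (URL is an assumption).
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // In practice this string would be read from the Oracle column via JDBC.
        String xml = "<message id=\"002\"><content><row><channel>100</channel><role>115</role></row></content></message>";

        // Parse the stored XML and pull out the pieces you want searchable.
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String channel = xpath.evaluate("/message/content/row/channel", dom);
        String role = xpath.evaluate("/message/content/row/role", dom);

        // Build a Solr document with plain fields instead of sending the raw XML.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "002");
        doc.addField("channel", channel);
        doc.addField("role", role);

        solr.add(doc);
        solr.commit();
    }
}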
Re: Solr is not extracting the CDATA part of xml
Hi, This is not Solr format. You must re-format your XML into Solr XML. You may find examples on the Solr wiki or in the Solr examples dir. Best Regards Alexander Aristov On 13 April 2012 23:13, srini softtec...@gmail.com wrote: Erick, Thanks for your reply. When you say Solr does not index arbitrary XML documents, then below is the way my xml document looks, which is sitting in Oracle. Could you suggest the best way of indexing it? Which method should I follow? Should I use XPathEntityProcessor? ?xml version=1.0 encoding=UTF-8 ? message xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance; xmlns=someurl xmlns:csp=someurl.xsd xsi:schemaLocation=somelocation jar: id=002 message-type=create content dsp:row dsp:channel100/dsp:channel dsp:role115/dsp:role /dsp:row /body/content/message Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr is not extracting the CDATA part of xml
Thanks again for the quick reply. Little curious about the procedure you suggested. I thought of using the same procedure as you suggested: writing a Java program to fetch the xml record from the db, parse the content and hand it to Solr for indexing. But what if my database content gets changed? Should I re-run my Java program to fetch the xml and add it to Solr for re-indexing? The content of my xml format does not match the Solr example xml formats. Any suggestions here? When I import xml records from Oracle and add them to Solr and search for a word, Solr is displaying the whole xml doc which has that word. What is wrong with this procedure? (I do see my search word in the content of the xml; the only bad part is that it is displaying the whole doc instead of just the CDATA part of it.) Please suggest if there is a better way of doing this task other than SolrJ. Thanks in Advance Srini -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908825.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting StandardQuery scores with a subquery?
: I'm having some trouble wrapping my head around boosting StandardQueries. : It looks like the function: query(subquery, default) : http://wiki.apache.org/solr/FunctionQuery#query is what I want, but the : examples seem to focus on just returning a score (e.g. product of popularity : and the score of the subquery). I assume my difficulty stems from the fact : that I'd like to retrieve highlighting from one query, but impact score and : 'relevance' by a different (sub)query. if your primary concern is just having highlighting on some words, while lots of other words contribute to the score, then you should take a look at the hl.q param introduced in Solr 3.5... http://wiki.apache.org/solr/HighlightingParameters#hl.q That lets you completely separate the two if you'd like. you can even use local param syntax to reduce duplication... q={!v=$qq} qq=content:(roi return on investment return investment~5) hl.q={!v=$qq} fq=extension:(pdf doc) boost=keywords:(financial investment profit loss) title:(financial investment profit loss) url:(investment investor relations phoenix) ...should work i think. -Hoss
Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment
Hi, For a web crawl+search like this you will probably need a lot of additional Big Data crunching, so a Hadoop based solution is wise. In addition to those products mentioned we also now have Amazon's own CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr (not even Lucene based), but gives you the elasticity you request I guess. If you run your Hadoop cluster in EC2 already it would be quite efficient to batch-load the crawled and processed data into a SearchDomain in the same availability zone. However, both cost and features may prohibit this as a realistic choice for you. It would be cool to explore a Hadoop/HDFS + SolrCloud integration. SolrCloud would not build the indexes, but be pulling pre-built indexes from HDFS down to local disk every time it's told to. Or perhaps the SolrCloud nodes could be part of the hadoop cluster, being responsible for the Reduce part building the indexes? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 13. apr. 2012, at 04:23, Otis Gospodnetic wrote: Hello Ali, I'm trying to setup a large scale *Crawl + Index + Search *infrastructure using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*, crawled + indexed every *4 weeks, *with a search latency of less than 0.5 seconds. That's fine. Whether it's doable with any tech will depend on how much hardware you give it, among other things. Needless to mention, the search index needs to scale to 5Billion pages. It is also possible that I might need to store multiple indexes -- one for crawled content, and one for ancillary data that is also very large. Each of these indices would likely require a logically distributed and replicated index. Yup, OK. However, I would like for such a system to be homogenous with the Hadoop infrastructure that is already installed on the cluster (for the crawl). In other words, I would much prefer if the replication and distribution of the Solr/Lucene index be done automagically on top of Hadoop/HDFS, instead of using another scalability framework (such as SolrCloud). In addition, it would be ideal if this environment was flexible enough to be dynamically scaled based on the size requirements of the index and the search traffic at the time (i.e. if it is deployed on an Amazon cluster, it should be easy enough to automatically provision additional processing power into the cluster without requiring server re-starts). There is no such thing just yet. There is no Search+Hadoop/HDFS in a box just yet. There was an attempt to automatically index HBase content, but that was either not completed or not committed into HBase. However, I'm not sure which Solr-based tool in the Hadoop ecosystem would be ideal for this scenario. I've heard mention of Solr-on-HBase, Solandra, Lily, ElasticSearch, IndexTank etc, but I'm really unsure which of these is mature enough and would be the right architectural choice to go along with a Nutch crawler setup, and to also satisfy the dynamic/auto-scaling aspects above. Here is a summary on all of them: * Search on HBase - I assume you are referring to the same thing I mentioned above. Not ready. * Solandra - uses Cassandra+Solr, plus DataStax now has a different (commercial) offering that combines search and Cassandra. Looks good. * Lily - data stored in HBase cluster gets indexed to a separate Solr instance(s) on the side. Not really integrated the way you want it to be. 
* ElasticSearch - solid at this point, the most dynamic solution today, can scale well (we are working on a many-B documents index and hundreds of nodes with ElasticSearch right now), etc. But again, not integrated with Hadoop the way you want it. * IndexTank - has some technical weaknesses, not integrated with Hadoop, not sure about its future considering LinkedIn uses Zoie and Sensei already. * And there is SolrCloud, which is coming soon and will be solid, but is again not integrated. If I were you and I had to pick today - I'd pick ElasticSearch if I were completely open. If I had Solr bias I'd give SolrCloud a try first. Lastly, how much hardware (assuming a medium sized EC2 instance) would you estimate my needing with this setup, for regular web-data (HTML text) at this scale? I don't know off the top of my head, but I'm guessing several hundred for serving search requests. HTH, Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Scalable Performance Monitoring - http://sematext.com/spm/index.html Any architectural guidance would be greatly appreciated. The more details provided, the wider my grin :). Many many thanks in advance. Thanks, Safdar
Re: Post Sorting hook before the doc slicing.
: Basically, I need to find item X in the result set and return say N items : before and N items after. : : - N items -- Item X --- N items ... : So I might be wrong, but it looks like the only way would be to create a : custom SolrIndexSearcher which will find the offset and create the related : docslice. That slicing part doesn't seem to be well factored that I can : see, so it seems to imply copy/pasting a significant chunk off the code. Am : I looking at the wrong place ? trying to do this as a hook into the SolrIndexSearcher would definitely be complicated ... largely because of how matches are collected. the most straightforward way i can think of to get the data you want is to consider what you are sorting on, and use that as a range filter, ie... 1) do your search, and filter on id:X 2) look at the values X has in the fields you are sorting on 3) search again, this time filter on those fields, asking for the first N docs with values greater than whatever id:X has 4) search again, this time reverse your sort, and reverse your filters (docs with values less than whatever id:X has) and get the first N docs. ...even if your sort is score you can use the frange parser to filter (not usually recommended for score, but possible) -Hoss
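To make that recipe concrete, a sketch of the three requests, assuming the results are sorted on a numeric field called price and the anchor document id:X has price=42 (names and values are illustrative; switch to inclusive [..] ranges plus a tie-breaker if your sort values are not unique):

1) q=<your query>&fq=id:X&fl=id,price
   -- fetch the anchor doc and note its sort value (say price=42)
2) q=<your query>&fq=price:{42 TO *}&sort=price asc&rows=N
   -- the N docs that follow item X in the original sort order
3) q=<your query>&fq=price:{* TO 42}&sort=price desc&rows=N
   -- the N docs that precede item X; reverse this list before displaying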
Re: Can I discover what part of a score is attributable to a subquery?
On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I can provide 'why on earth would anyone ...' if someone wants to know. Have you tried debugQuery=true? http://wiki.apache.org/solr/CommonQueryParameters#debugQuery The 'explain' field of the result explains the scoring of each document.
Re: two structures in solr
: I need to store *two big structures* in SOLR: projects and contractors. : Contractors will search for available projects and project owners will : search for contractors who would do it for them. http://wiki.apache.org/solr/MultipleIndexes : that *I want to have two structures*. I guess running two parallel solr : instances is not the idea. I took a look at there's nothing wrong with it, the real question is whether you ever need to do things with both sets of documents at once. if contractors only ever search for projects, and project owners only ever search for contractors, and no one ever searches for a mix of projects and contractors at the same time, then i would just suggest using multiple SolrCores... http://wiki.apache.org/solr/MultipleIndexes#MultiCore http://wiki.apache.org/solr/CoreAdmin -Hoss
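For reference, a minimal sketch of the multi-core setup suggested above, in the solr.xml format of that era (core names are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- each core has its own conf/schema.xml and solrconfig.xml under its instanceDir -->
    <core name="projects" instanceDir="projects" />
    <core name="contractors" instanceDir="contractors" />
  </cores>
</solr>

The two cores are then queried independently, e.g. at /solr/projects/select and /solr/contractors/select.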
Re: term frequency outweighs exact phrase match
Hello Hoss, Here are the explain entries for the two docs:

<str name="a0127d8e70a6d523">
0.021646015 = (MATCH) sum of:
  0.021646015 = (MATCH) sum of:
    0.02141003 = (MATCH) max plus 0.01 times others of:
      2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of:
        0.0029881175 = queryWeight(content:apache^0.5), product of:
          0.5 = boost
          4.3554416 = idf(docFreq=126092, maxDocs=3613605)
          0.0013721307 = queryNorm
        0.09510804 = (MATCH) fieldWeight(content:apache in 3578), product of:
          2.236068 = tf(termFreq(content:apache)=5)
          4.3554416 = idf(docFreq=126092, maxDocs=3613605)
          0.009765625 = fieldNorm(field=content, doc=3578)
      0.021407187 = (MATCH) weight(title:apache^1.2 in 3578), product of:
        0.01371095 = queryWeight(title:apache^1.2), product of:
          1.2 = boost
          8.327043 = idf(docFreq=2375, maxDocs=3613605)
          0.0013721307 = queryNorm
        1.5613205 = (MATCH) fieldWeight(title:apache in 3578), product of:
          1.0 = tf(termFreq(title:apache)=1)
          8.327043 = idf(docFreq=2375, maxDocs=3613605)
          0.1875 = fieldNorm(field=title, doc=3578)
    2.359865E-4 = (MATCH) max plus 0.01 times others of:
      2.359865E-4 = (MATCH) weight(content:solr^0.5 in 3578), product of:
        0.004071705 = queryWeight(content:solr^0.5), product of:
          0.5 = boost
          5.9348645 = idf(docFreq=25986, maxDocs=3613605)
          0.0013721307 = queryNorm
        0.05795766 = (MATCH) fieldWeight(content:solr in 3578), product of:
          1.0 = tf(termFreq(content:solr)=1)
          5.9348645 = idf(docFreq=25986, maxDocs=3613605)
          0.009765625 = fieldNorm(field=content, doc=3578)
</str>

<str name="d89380e313c64aa5">
0.021465056 = (MATCH) sum of:
  1.8154096E-4 = (MATCH) sum of:
    6.354771E-5 = (MATCH) max plus 0.01 times others of:
      6.354771E-5 = (MATCH) weight(content:apache^0.5 in 638040), product of:
        0.0029881175 = queryWeight(content:apache^0.5), product of:
          0.5 = boost
          4.3554416 = idf(docFreq=126092, maxDocs=3613605)
          0.0013721307 = queryNorm
        0.021266805 = (MATCH) fieldWeight(content:apache in 638040), product of:
          1.0 = tf(termFreq(content:apache)=1)
          4.3554416 = idf(docFreq=126092, maxDocs=3613605)
          0.0048828125 = fieldNorm(field=content, doc=638040)
    1.1799325E-4 = (MATCH) max plus 0.01 times others of:
      1.1799325E-4 = (MATCH) weight(content:solr^0.5 in 638040), product of:
        0.004071705 = queryWeight(content:solr^0.5), product of:
          0.5 = boost
          5.9348645 = idf(docFreq=25986, maxDocs=3613605)
          0.0013721307 = queryNorm
        0.02897883 = (MATCH) fieldWeight(content:solr in 638040), product of:
          1.0 = tf(termFreq(content:solr)=1)
          5.9348645 = idf(docFreq=25986, maxDocs=3613605)
          0.0048828125 = fieldNorm(field=content, doc=638040)
  0.021283515 = (MATCH) weight(content:"apache solr"~1^30.0 in 638040), product of:
    0.42358932 = queryWeight(content:"apache solr"~1^30.0), product of:
      30.0 = boost
      10.290306 = idf(content: apache=126092 solr=25986)
      0.0013721307 = queryNorm
    0.050245635 = fieldWeight(content:"apache solr" in 638040), product of:
      1.0 = tf(phraseFreq=1.0)
      10.290306 = idf(content: apache=126092 solr=25986)
      0.0048828125 = fieldNorm(field=content, doc=638040)
</str>

Although the second doc has the exact phrase match, it is placed after the first one, which does not have the exact match.
I use the following request handler:

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2 anchor^1.2</str>
    <str name="pf">content^30</str>
    <str name="fl">url,id, site ,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

and the query is as follows:

http://localhost:8983/solr/select/?q=apache solr&version=2.2&start=0&rows=10&indent=on&qt=search&debugQuery=true

Thanks. Alex. -Original Message- From: Chris Hostetter hossman_luc...@fucit.org To: solr-user solr-user@lucene.apache.org Sent: Thu, Apr 12, 2012 7:43 pm Subject: Re: term frequency outweighs exact phrase match : I use solr 3.5 with edismax. I have the following issue with phrase : search. For example if I have three documents with content like : : 1.apache apache : 2. solr solr :
Re: Can I discover what part of a score is attributable to a subquery?
On Fri, Apr 13, 2012 at 6:43 PM, John Chee johnc...@mylife.com wrote: On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I need this number to be available in a SearchComponent that runs after QueryComponent. I can provide 'why on earth would anyone ...' if someone wants to know. Have you tried debugQuery=true? http://wiki.apache.org/solr/CommonQueryParameters#debugQuery The 'explain' field of the result explains the scoring of each document.
Re: Can I discover what part of a score is attributable to a subquery?
: Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score calculation doesn't keep track of the subscores. you could do this using functions in the fl but since you mentioned wanting this in SearchComponent just pass the subquery to SolrIndexSearcher using a DocSet filter of the current page (ie: make your own DocSet based on the current DocList) -Hoss
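For the "functions in the fl" route mentioned above, a sketch using the query() function as a pseudo-field — this assumes a Solr version new enough to allow functions and aliases in fl, and the parameter name subq and the field in it are illustrative:

q=<main query>
subq=category:widgets
fl=id,score,subscore:query($subq)

Each returned document then carries a subscore value holding the sub-query's score for that document; the function's optional second argument (query($subq, 0)) is the default used when the sub-query does not match.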
Re: Solr is not extracting the CDATA part of xml
This all comes from a database? Here is what you want. The DataImportHandler includes a toolkit for doing full and incremental loading from databases. Read this first: http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DIHQuickStart Then these: http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DataImportHandlerFaq http://lucidworks.lucidimagination.com/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler After you try the procedure in QuickStart and read the other two, if you still have questions please ask. Cheers! On Fri, Apr 13, 2012 at 12:34 PM, srini softtec...@gmail.com wrote: Thanks Again for quick reply. Little curious about the procedure you suggested. I thought of using same procedure as you suggested. Like writing a java program to fetch xml record from db and parse the content hand it to Solr for indexing. but what if my database content get changed? should I re run my java program to fetch xml and add to solr for re indexing? the content of xml format does not match to solr example xml formats. Any suggestions here? when I import xml records from oracle and add it to solr and search for a word, solr is displaying whole xml doc which has that word. what is wrong with this procedure( I do see my search word in the content of xml, only bad part is it is displaying whole doc instead CDATA part of it). Please suggest if there is better of doing this task other than SolrJ Thanks in Advance Srini -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908825.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
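A rough sketch of the kind of DIH config those pages describe for XML stored in a database column: a JDBC entity pulls the rows, and a nested XPathEntityProcessor entity (fed by a FieldReaderDataSource) parses the XML column. The driver, table, column, XPath, and field names below are illustrative guesses, not taken from the thread:

<dataConfig>
  <!-- JDBC connection to the database holding the XML column -->
  <dataSource name="db" type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL" user="user" password="pass"/>
  <!-- lets the nested entity read the XML string from a field of the parent row -->
  <dataSource name="xmlReader" type="FieldReaderDataSource"/>

  <document>
    <entity name="msg" dataSource="db"
            query="select id, xml_body from messages"
            deltaQuery="select id from messages where last_modified &gt; '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <entity name="msgXml" dataSource="xmlReader" dataField="msg.xml_body"
              processor="XPathEntityProcessor" forEach="/message">
        <field column="channel" xpath="/message/content/row/channel"/>
        <field column="role"    xpath="/message/content/row/role"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The deltaQuery is what addresses the "what if my database content gets changed" question: a delta-import picks up only the changed rows (a complete incremental setup usually also defines a deltaImportQuery; the pages above cover the details).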
Re: Can I discover what part of a score is attributable to a subquery?
On Fri, Apr 13, 2012 at 7:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score calculation doesn't keep track the subscores. you could do this using functions in the fl but since you mentioned wanting this in SearchCOmponent just pass the subquery to SolrIndexSeracher using a DocSet filter of the current page (ie: make your own DocSet based on the current DocList) I get it. Some fairly intricate dancing then can ensue with SolrCloud. Thanks. -Hoss
Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment
Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's two recommendations for SolrCloud so far), so I will give that a shot when it is available. However, do you know when SolrCloud IS expected to be available? Thanks again! Warm regards, Safdar On Fri, Apr 13, 2012 at 5:23 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello Ali, I'm trying to setup a large scale *Crawl + Index + Search *infrastructure using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*, crawled + indexed every *4 weeks, *with a search latency of less than 0.5 seconds. That's fine. Whether it's doable with any tech will depend on how much hardware you give it, among other things. Needless to mention, the search index needs to scale to 5Billion pages. It is also possible that I might need to store multiple indexes -- one for crawled content, and one for ancillary data that is also very large. Each of these indices would likely require a logically distributed and replicated index. Yup, OK. However, I would like for such a system to be homogenous with the Hadoop infrastructure that is already installed on the cluster (for the crawl). In other words, I would much prefer if the replication and distribution of the Solr/Lucene index be done automagically on top of Hadoop/HDFS, instead of using another scalability framework (such as SolrCloud). In addition, it would be ideal if this environment was flexible enough to be dynamically scaled based on the size requirements of the index and the search traffic at the time (i.e. if it is deployed on an Amazon cluster, it should be easy enough to automatically provision additional processing power into the cluster without requiring server re-starts). There is no such thing just yet. There is no Search+Hadoop/HDFS in a box just yet. There was an attempt to automatically index HBase content, but that was either not completed or not committed into HBase. However, I'm not sure which Solr-based tool in the Hadoop ecosystem would be ideal for this scenario. I've heard mention of Solr-on-HBase, Solandra, Lily, ElasticSearch, IndexTank etc, but I'm really unsure which of these is mature enough and would be the right architectural choice to go along with a Nutch crawler setup, and to also satisfy the dynamic/auto-scaling aspects above. Here is a summary on all of them: * Search on HBase - I assume you are referring to the same thing I mentioned above. Not ready. * Solandra - uses Cassandra+Solr, plus DataStax now has a different (commercial) offering that combines search and Cassandra. Looks good. * Lily - data stored in HBase cluster gets indexed to a separate Solr instance(s) on the side. Not really integrated the way you want it to be. * ElasticSearch - solid at this point, the most dynamic solution today, can scale well (we are working on a mny-B documents index and hundreds of nodes with ElasticSearch right now), etc. But again, not integrated with Hadoop the way you want it. * IndexTank - has some technical weaknesses, not integrated with Hadoop, not sure about its future considering LinkedIn uses Zoie and Sensei already. * And there is SolrCloud, which is coming soon and will be solid, but is again not integrated. If I were you and I had to pick today - I'd pick ElasticSearch if I were completely open. If I had Solr bias I'd give SolrCloud a try first. 
Lastly, how much hardware (assuming a medium sized EC2 instance) would you estimate my needing with this setup, for regular web-data (HTML text) at this scale? I don't know off the topic of my head, but I'm guessing several hundred for serving search requests. HTH, Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Scalable Performance Monitoring - http://sematext.com/spm/index.html Any architectural guidance would be greatly appreciated. The more details provided, the wider my grin :). Many many thanks in advance. Thanks, Safdar
dynamic analyzer based on condition
Hi, I want to pick different analyzers for the same field for different languages. I can determine the language from a different field. I would have different fieldTypes defined in my schema.xml, such as text_en, text_de, text_fr, etc., where I specify which analyzer and filters to use during indexing and query time:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>

but I would like to define the field dynamically, e.g.:

if lang == en: <field name="description" type="text_en" indexed="true" stored="true"/>
else if lang == de: <field name="description" type="text_de" indexed="true" stored="true"/>
...

Can I achieve this somehow? If this approach cannot be done then I can just create one field for every language. Thanks Srini -- View this message in context: http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3909345.html Sent from the Solr - User mailing list archive at Nabble.com.
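For what it's worth, stock Solr of this era has no per-document analyzer switch on a single field; analyzers are fixed per fieldType. The usual pattern, which is essentially the fallback mentioned above, is one field per language, chosen at index and query time. A sketch with illustrative names:

<!-- schema.xml: one description field per language -->
<field name="description_en" type="text_en" indexed="true" stored="true"/>
<field name="description_de" type="text_de" indexed="true" stored="true"/>
<field name="description_fr" type="text_fr" indexed="true" stored="true"/>

At index time the client (or an update processor) reads the language field and writes the text into the matching description_XX field; at query time the application picks the same field, e.g. qf=description_de when lang=de.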
remoteLink that changes its text
Hi! I have the following gsp code... g:each in=${productInstanceList} status=i var=productInstance !-- display product properties omitted -- g:remoteLink action=addaction id=${i} update=[success:'what-to-put-here',failure:'error'] on404=alert('not found'); Select this product /g:remoteLink /g:each How do I have each remoteLink change its Select this product text to what addaction renders? The problem I'm facing is that I don't know what to put in 'what-to-put-here' in order to achieve that. Of course, I'm new to gsp tags. Any idea? Thanks in advance, Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786
Re: remoteLink that changes its text
Sorry! Wrong list! Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786 On Fri, Apr 13, 2012 at 10:54 PM, Marcelo Carvalho Fernandes mcf2...@gmail.com wrote: Hi! I have the following gsp code... g:each in=${productInstanceList} status=i var=productInstance !-- display product properties ommited -- g:remoteLink action=addaction id=${i} update=[success:'what-to-put-here',failure:'error'] on404=alert('not found'); Select this product /g:remoteLink /g:each How to have each remoteLink to change it's Select this product text to what addaction renders? The problem I'm facing is that I don't know what to put in ' what-to-put-here' in order to achieve that. Of course, I'm new to gsp tags. Any idea? Thanks in advance, Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786