Re: Two instances of solr - the same datadir?
I have spent a lot of time over the past day playing with this setup and finally made it work. Here are a few points of interest:

- Solr 4.0
- Linux, Java 7, local filesystem
- big index, 1 RW instance + 2 RO instances (sharing the same index)

The lock is acquired while Solr is writing data. If you happen to be starting your RO instance at that moment and you are using the 'native' lock, it will fail. However, when the RW instance uses the 'native' lock and the 2 RO instances use the 'single' lock, the RO instances can start, but they will eventually get into trouble too: our index is too big, so when core RELOAD is called while indexing is under way, the RO instances time out.

Core reload, when using the 'native' lock, seems to work fine if you were lucky and all instances managed to start. HOWEVER, the core is unresponsive until fully loaded (makes sense), and this is actually terrible: your search is gone for seconds/minutes.

The best setup is as described in my original post: RO instances MUST NOT commit anything, and must not use reload either (because during reload Solr tries to acquire the lock). Instead, they should just reopen the searcher. I repeat: you should make sure that nothing is ever going to write on the RO instance. And because there is no public API for reopening the searcher, I wrote a simple handler which just calls:

req.getCore().getSearcher(true, false, null, false);

When called, the RO instances continue to handle requests using the old searcher while the new one warms in the background; once ready, the new searcher takes over. [To repeat: I am triggering this refresh from the RW instance, it does 'curl http://foo/solr/myhandler?command=reopenSearcher']

The bad thing: when an RO instance dies (e.g. OOM error) and the RW instance is just in the middle of writing data, you can't restart the RO instance (unless you use the 'single' lock or some other lock).

HTH,

roman

On Tue, Jul 2, 2013 at 5:35 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Wouldn't it be better to do a RELOAD? 
http://wiki.apache.org/solr/CoreAdmin#RELOAD Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com wrote: The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc. Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area. As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed). The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0. In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes? Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention) On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. 
For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the
[Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest
Hi, I need to unload core with deleting instance directory of the core. According to code of Solr4.2 I don't see the support for this parameter in solrj. Is there the fix or open issue for this? Best regards, Lyuba
Re: Joins with SolrCloud
Hi Yonik, Thanks for the reply. It was very helpful. This may be a newb question but will this work on a individual rows of a query or do all the queries' results need to be on the same shard. ex. if the main query would return - user15 (shard 1) - user16 (shard 2) - user17 (shard 3) is it acceptable to have doc1 (shard 1) whatever (shard 2) yeah (shard 3) for a join of - user15, doc1 - user16, whatever - user17, yeah or do all the results of the main query need to reside on the same shard as all the results of join. Hopefully that's an understandable question. Thanks, slevytam -- View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Simple Moving Average of Query Durations
I started some work on https://issues.apache.org/jira/browse/SOLR-4735, which may help here. I have been pulled away onto other things, but I want to get back to it soon.

Alan Woodward
www.flax.co.uk

On 3 Jul 2013, at 23:54, Otis Gospodnetic wrote:

Hi Jan, http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue - SOLR-1792?

Otis
--
Performance Monitoring -- http://sematext.com/spm
Solr ElasticSearch Support -- http://sematext.com/

On Wed, Jul 3, 2013 at 5:59 PM, Jan Morlock jan.morl...@googlemail.com wrote:

Hi, we would like to observe the mean value of the average time per request for the last N (e.g. 20) queries (a.k.a. simple moving average) of our Solr server using Nagios. Does anybody know if such an observable is already implemented? If not, I think the perfect place for it would be the getStatistics() method inside solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java. Would you agree?

Thank you very much.

Best regards
Jan

-- View this message in context: http://lucene.472066.n3.nabble.com/Simple-Moving-Average-of-Query-Durations-tp4075312.html Sent from the Solr - User mailing list archive at Nabble.com.
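For readers following the thread, the simple moving average Jan describes can be sketched as below. This is only an illustration of the calculation over the last N request durations; the class name SimpleMovingAverage and its methods are hypothetical and are not part of Solr or of the SOLR-4735 patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: keeps the last `window` request durations and reports their mean.
class SimpleMovingAverage {
    private final int window;
    private final Deque<Long> samples = new ArrayDeque<>();
    private long sum;

    SimpleMovingAverage(int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
    }

    // Record one request duration; the oldest sample is dropped once the window is full.
    synchronized void add(long elapsedMillis) {
        samples.addLast(elapsedMillis);
        sum += elapsedMillis;
        if (samples.size() > window) {
            sum -= samples.removeFirst();
        }
    }

    // Mean of the samples currently in the window, or 0.0 if nothing was recorded yet.
    synchronized double average() {
        return samples.isEmpty() ? 0.0 : (double) sum / samples.size();
    }
}
```

A handler could call add() with the elapsed time of each request and expose average() alongside its other statistics; where exactly that would hook into RequestHandlerBase is left open here.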
Re: Advice for performance issues with group.facet=true
Many thanks for your response Otis - I had feared as much, but it's good to have it confirmed.

Best wishes,
Daniel

On 03/07/2013 17:05, Otis Gospodnetic wrote:

Hi, I think nobody in the community is focused on field collapsing/grouping, so I suspect there won't be a fix until somebody gets a strong-enough itch or a business requires it so much that it decides it pays to invest in the contribution.

Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Wed, Jul 3, 2013 at 5:54 AM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote:

Hi everyone, I'm seeing very bad performance when grouping (field collapsing) using group.facet=true with a large result set.

- I have an index with 2 million documents, and I query with five facet fields (each with 30+ groups)
- If I set group.facet=false the query can take 2000ms on first run, but no more than 250ms on subsequent execution
- If I set group.facet=true it takes on average 18000ms on the first run, and the same time on all subsequent runs (suggesting to me that a cache is not being used)

I've checked the Solr Jira and several others are experiencing the same issue: https://issues.apache.org/jira/browse/SOLR-4763

Could anyone offer any advice or suggestions please? This is becoming a blocking issue for us, and I'm very curious whether this will be fixed in the near future.

Best wishes,
Daniel

--
*Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
Re: Moving from single Solr instance to Solr Cloud
Which version of Solr are you using?

2013/7/4 Ali, Saqib docbook@gmail.com

We have a single Solr instance with a lot of indexed documents. Now we would like to move to a SolrCloud implementation. Can we move the existing index to SolrCloud? If so, how? Or do we need to reindex our data in SolrCloud?

Thanks,
Saqib
Re: PropagateServer Implementation for Solr
Ok, in the scenario where the calling app uses SolrJ and creates a CloudSolrServer to send all its requests in. In that case, yes I can see the logic that says CloudSolrServer shouldn't load balance that (its not that type of request), it should forward it on to all the servers in the cloud. What will happen to the responses, do you get N (independent) responses back or do you plan to do some kind of aggregation? I confess we don't use SolrJ (our clients are C++), so we just manually send the request to all the servers in the cloud (will integrate with ZK when we work out that interface) so it would be nice if HTTP callers could do the same (maybe something like distrib=true|false on the LukeRequest as a shot in the dark, caller can request details from 1 server, or from the cloud as a whole?) Is there a way to send the Threads (/admin/threads) and stats requests (/admin/mbeans)? We also use them for monitoring (we can't deploy the web-based monitoring tools for various internal reasons which I won't bore you with!), but I can't see a request in SolrJ that would map to them? On 3 July 2013 22:08, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I've written an e-mail at dev list and I want to share same e-mail here. I've opened two issues at Jira and I want to get feedback of community. First issue is: https://issues.apache.org/jira/browse/SOLR-4995 Currently Solr servers are interacting with only one Solr node. I think that there should be an implementation that propagates requests into multiple Solr nodes. For example when Solr is used as SolrCloud sending a LukeRequest should be made to one node at each shard. First patch will be related to implementing a PropagateServer for Solr. Second issue is related to first one: https://issues.apache.org/jira/browse/SOLR-4996 Let's assume that you are using Solr as SolrCloud and you have more than one shard. Let's assume that there are 20 docs at shard_1 and 15 docs at shard_2. 
When using CloudSolrServer if you make a LukeRequest it uses LBHttpSolrServer internally and it sends request to just one Solr Node (via HttpSolrServer) as round robin. So you may get 20 docs as a result at first request and if you send same request you may get 15 docs as a result too. Using a PropagateServer inside CloudSolrServer will fix that bug. I've made initial patchs for them and I will change/add code to them after getting feedback from community (i.e. first patch does not make multi threaded requests at PropagateServer, I just want to get feedbacks of community after that I will add other features) Thanks; Furkan KAMACI
Surprising score?
Hi Solr people!

Querying for series:RCWP returns the response below. Why does "RCWP Moisture Resistant" score worse than "D/CRCW-P e3" with the field definition below? OK, we are ignoring dashes and spaces, but I would have expected that matches towards the beginning score better. Can I change this behavior (in Solr 4)?

--
<result>
  <doc>
    <str name="series">RCWP</str>
    <float name="score">3.2698402</float>
  </doc>
  <doc>
    <str name="series">D/CRCW-P e3</str>
    <float name="score">1.3624334</float>
  </doc>
  <doc>
    <str name="series">RCWP Moisture Resistant</str>
    <float name="score">0.5449734</float>
  </doc>
</result>
--
<fieldType name="series" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander
Re: Surprising score?
Hi Alexander,

This is because you have length normalization enabled for that field. http://ir.dcs.gla.ac.uk/wiki/Length_Normalisation

If you want it disabled, set the following:

<fieldType name="series" class="solr.TextField" positionIncrementGap="100" omitNorms="true">

Jeroen

On 4-7-2013 11:10, Lochschmied, Alexander wrote:

Hi Solr people!

Querying for series:RCWP returns the response below. Why does "RCWP Moisture Resistant" score worse than "D/CRCW-P e3" with the field definition below? OK, we are ignoring dashes and spaces, but I would have expected that matches towards the beginning score better. Can I change this behavior (in Solr 4)?

--
<result>
  <doc>
    <str name="series">RCWP</str>
    <float name="score">3.2698402</float>
  </doc>
  <doc>
    <str name="series">D/CRCW-P e3</str>
    <float name="score">1.3624334</float>
  </doc>
  <doc>
    <str name="series">RCWP Moisture Resistant</str>
    <float name="score">0.5449734</float>
  </doc>
</result>
--
<fieldType name="series" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander
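To make the length-normalization effect concrete, here is a rough back-of-the-envelope sketch. It assumes the classic 1/sqrt(numTerms) length norm of Lucene's default TF-IDF similarity, that every n-gram counts as one term, and that the stop filter removes nothing; the class LengthNormDemo and its methods are illustrative names, not Solr code. Lucene also stores norms with reduced precision, so real scores will only roughly follow these values.

```java
// Illustrative only: estimates how many n-gram terms each field value produces
// at index time and the resulting (unencoded) length norm.
class LengthNormDemo {

    // Number of n-grams of sizes minGram..maxGram cut from one token of the given length.
    static int gramCount(int length, int minGram, int maxGram) {
        int count = 0;
        for (int n = minGram; n <= Math.min(maxGram, length); n++) {
            count += length - n + 1;
        }
        return count;
    }

    // Mirrors the index analyzer: strip dashes and whitespace, keep one keyword token, n-gram it (2..50).
    static int indexedTerms(String value) {
        String normalized = value.replaceAll("[\\-\\s]+", "");
        return gramCount(normalized.length(), 2, 50);
    }

    // Assumed default length norm: the more terms in the field, the smaller the factor.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        String[] values = {"RCWP", "D/CRCW-P e3", "RCWP Moisture Resistant"};
        for (String value : values) {
            int terms = indexedTerms(value);
            System.out.printf("%-24s terms=%3d norm=%.3f%n", value, terms, lengthNorm(terms));
        }
    }
}
```

Under these assumptions the three values yield 6, 36 and 210 terms, so the longest value gets by far the smallest norm, which matches the ordering of the scores in the response above.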
Re: PropagateServer Implementation for Solr
Here is an example of how I use PropagateServer inside CloudSolrServer:

public static List<CloudStatistics> customListStatistics(CloudSolrServer solrServer) {
    NamedList<Object> namedList = new SimpleOrderedMap<Object>();
    try {
        namedList = solrServer.request(new LukeRequest());
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    List<NamedList<Object>> all = (List<NamedList<Object>>) namedList.get("all");
    List<CloudStatistics> cloudStatisticsList = new ArrayList<CloudStatistics>();
    for (NamedList<Object> namedListSlice : all) {
        cloudStatisticsList.add(new CloudStatistics((NamedList<Object>) namedListSlice.get("index")));
    }
    return cloudStatisticsList;
}

PS: CloudStatistics is a class implemented by me and holds statistics metrics.

2013/7/4 Daniel Collins danwcoll...@gmail.com

Ok, in the scenario where the calling app uses SolrJ and creates a CloudSolrServer to send all its requests in. In that case, yes I can see the logic that says CloudSolrServer shouldn't load balance that (it's not that type of request), it should forward it on to all the servers in the cloud. What will happen to the responses, do you get N (independent) responses back or do you plan to do some kind of aggregation?

I confess we don't use SolrJ (our clients are C++), so we just manually send the request to all the servers in the cloud (will integrate with ZK when we work out that interface) so it would be nice if HTTP callers could do the same (maybe something like distrib=true|false on the LukeRequest as a shot in the dark, caller can request details from 1 server, or from the cloud as a whole?)

Is there a way to send the Threads (/admin/threads) and stats requests (/admin/mbeans)? We also use them for monitoring (we can't deploy the web-based monitoring tools for various internal reasons which I won't bore you with!), but I can't see a request in SolrJ that would map to them?
On 3 July 2013 22:08, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I've written an e-mail at dev list and I want to share same e-mail here. I've opened two issues at Jira and I want to get feedback of community. First issue is: https://issues.apache.org/jira/browse/SOLR-4995 Currently Solr servers are interacting with only one Solr node. I think that there should be an implementation that propagates requests into multiple Solr nodes. For example when Solr is used as SolrCloud sending a LukeRequest should be made to one node at each shard. First patch will be related to implementing a PropagateServer for Solr. Second issue is related to first one: https://issues.apache.org/jira/browse/SOLR-4996 Let's assume that you are using Solr as SolrCloud and you have more than one shard. Let's assume that there are 20 docs at shard_1 and 15 docs at shard_2. When using CloudSolrServer if you make a LukeRequest it uses LBHttpSolrServer internally and it sends request to just one Solr Node (via HttpSolrServer) as round robin. So you may get 20 docs as a result at first request and if you send same request you may get 15 docs as a result too. Using a PropagateServer inside CloudSolrServer will fix that bug. I've made initial patchs for them and I will change/add code to them after getting feedback from community (i.e. first patch does not make multi threaded requests at PropagateServer, I just want to get feedbacks of community after that I will add other features) Thanks; Furkan KAMACI
SOLR 4.0 frequent admin problem
Hi, About once a week the admin system comes up with SolrCore Initialization Failures. There's nothing in the logs and SOLR continues to work in the application it's supporting and in the 'direct access' mode (i.e. http://123.465.789.100:8080/solr/collection1/select?q=bingo:*). The cure is to restart Jetty (8.1.7) and then we can use the admin system again via pc's. However, a colleague can get into admin on an iPad with no trouble when no browser on a pc can! Anyone any ideas? It's really frustrating! Best regards, DQ
ClassNotFoundException regarding SolrInfoMBean under Tomcat 7
Hi everyone, I'm trying to get the CMS TYPO3 connected with Solr 3.6.2. By now I followed the installation at http://wiki.apache.org/solr/SolrTomcat except that I didn't copy the .war-file into the $SOLR_HOME but referencing to it at a different location via Tomcat Context fragment file. Until then the Solr-Server works – I can reach the GUI via URL. To get Solr connected with the CMS I then created a new core-folder (btw. can anybody give me kind of a live example, when to use different cores? Until now I still don't really understand the concept of cores ..) by duplicating the example-folder in which I overwrote some files (especially solrconfig.xml) with files offered by the TYPO3-community. I also moved the file solr.xml one level up and edited it (added core-fragment and especially adjusted instanceDir) to get a correct multicore-setup like in the example multicore-setup within the downloaded solr-tgz-package. But now I get the Java-exception java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at java.lang.ClassLoader.defineClass1(Native Method) In the Tomcat-log file it is said additionally: Caused by: java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean. My guess is, that within the new solrconfig.xml there are calls to classes which aren't included correctly. There are some libs, which are included at the top of this file but the paths of the references should be ok as I checked them via Bash: At http://wiki.apache.org/solr/SolrConfigXml it is said that the lib dir= directory is relative to the instanceDir, so this is what I've checked. I also inserted absolute paths but this wasn't successful either. Can anybody give me a hint how to solve this problem? Would be great :) Cheers, Michael
Re: Surprising score?
And be sure to re-index your content. Upayavira On Thu, Jul 4, 2013, at 11:28 AM, Jeroen Steggink wrote: Hi Alexander, This is because you have length normalization enabled for that field. http://ir.dcs.gla.ac.uk/wiki/Length_Normalisation If you want it disabled set the following: fieldType name=series class=solr.TextField positionIncrementGap=100 omitNorms=true Jeroen On 4-7-2013 11:10, Lochschmied, Alexander wrote: Hi Solr people! querying for series:RCWP returns me the response below. Why does RCWP Moisture Resistant score worse than D/CRCW-P e3 with the field definition below? OK, we are ignoring dashes and spaces, but I would have expected that matches towards the beginning score better. Can I change this behavior (in Solr 4)? -- result doc str name=seriesRCWP/str float name=score3.2698402/float /doc doc str name=seriesD/CRCW-P e3/str float name=score1.3624334/float /doc doc str name=seriesRCWP Moisture Resistant/str float name=score0.5449734/float /doc /result -- fieldType name=series class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.PatternReplaceCharFilterFactory pattern=[\-\s]+ replacement=/ tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.NGramFilterFactory minGramSize=2 maxGramSize=50/ /analyzer analyzer type=query charFilter class=solr.PatternReplaceCharFilterFactory pattern=[\-\s]+ replacement=/ tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType Thanks, Alexander
Find related words
How might one find the top related words for a given word in a Solr index? For instance, given the following single-field documents: 1: I love chocolate 2: I love Solr 3: I eat chocolate cake 4: You will eat chocolate candy Thus, given the word Chocolate Solr might find these top words: I (3 times matched) eat (2 times matched) love, cake, you, will, candy (1 time each) Thanks! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Moving from single Solr instance to Solr Cloud
Hello Furkan,

We are using Solr 4.3

Thanks

On Thu, Jul 4, 2013 at 1:43 AM, Furkan KAMACI furkankam...@gmail.com wrote:

Which version of Solr you are using?

2013/7/4 Ali, Saqib docbook@gmail.com

We have single Solr instance with lot of indexed document. Now we would like to move to SolrCloud implementation. Can we move the existing index to SolrCloud? If so, how? Or do we need to reindex our data in SolrCloud?

Thanks,
Saqib
Re: Joins with SolrCloud
Yes, joins support distributed search fine, provided that the individual documents that are joined reside on the same shard. For example, if you are modeling blogs and posts (one blog object has many posts):

shard1 -- joe!blog_info joe!post1
shard2 -- mary!blog_info mary!post1

So now you can search for post bodies and join to the main blog via

{!join from=blog_pointer to=blog_id}post_body:hello

If both mary and joe have a post with hello, they will both be found and joined to their main blog info docs with a single distributed search across the collection.

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 3:37 AM, slevytam developm...@the10thfloor.com wrote:

Hi Yonik, Thanks for the reply. It was very helpful. This may be a newb question but will this work on individual rows of a query or do all the queries' results need to be on the same shard?

ex. if the main query would return
- user15 (shard 1)
- user16 (shard 2)
- user17 (shard 3)

is it acceptable to have
doc1 (shard 1)
whatever (shard 2)
yeah (shard 3)

for a join of
- user15, doc1
- user16, whatever
- user17, yeah

or do all the results of the main query need to reside on the same shard as all the results of the join? Hopefully that's an understandable question.

Thanks,
slevytam

-- View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075408.html Sent from the Solr - User mailing list archive at Nabble.com.
Total Term Frequency per ResultSet in Solr 4.3 ?
Hi,

I have lots of crawled data indexed in my Solr (4.3.0). Let's say a user creates a search criteria 'X1' and wants to know the occurrence of a specific term in the result set of that 'X1' search criteria. Then the user creates another search criteria 'X2' and wants to know the occurrence of that same term in the result set of that 'X2' search criteria.

At the moment, if I use termfreq(field,term) it gives me the term frequency per document, and if I use totaltermfreq(field,term) it gives me the total term frequency in the entire index, not in the result set of my search criteria. So what I need is your help to find out how to get the total occurrence of a term in a query's result set.

If this is my result set

<doc>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The Hunger Games</str>
</doc>
<doc>
  <str name="type">Books</str>
  <str name="format">paperback</str>
  <str name="product">The Hunger Book</str>
</doc>

and I am looking for the term 'hunger' in the product field, then I want to get value = '2', and if I am searching for the term 'games' in the product field, I want to get value = '1'.

Thanks,
Tony
Solr Phonetic Search returning documents but not Highlight Information
We have a pretty simple Solr schema:

<fields>
  <field name="DocId" type="long" indexed="true" stored="true" required="true" />
  <field name="DocTitle" type="string" indexed="true" stored="true" required="true" />
  <field name="Content" type="text_general" indexed="false" stored="true" required="true" />
  <field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

<uniqueKey>DocId</uniqueKey>

<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="false"/>
  </analyzer>
</fieldType>

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

We are indexing documents in Solr using SolrNet and have a requirement to support phonetic search based on the Soundex algorithm. Once we have indexed documents, we can search in the Solr Admin Panel using a phonetic query and the relevant document is returned in the search results, but the highlight collection is blank.

E.g. use case:
--
We index a text document which contains the word electromagnetic (Soundex code: E423).
We execute a search in the Solr Admin Panel using the following query: ContentSearchPhonetic:electing (Soundex code: E423).
The search shows one document returned, but the highlight collection is blank.

Solr is definitely using the phonetic Soundex algorithm to locate the document, as the word electing is not present in the document. But somehow it is not able to return the highlight data. The same schema and config can successfully return documents along with highlight data for other approximate searches like synonym, fuzzy or stemming. Only for phonetic search are we not getting the highlight data.

The screenshot from the Solr Admin Panel is shown below: http://lucene.472066.n3.nabble.com/file/n4075492/HighlightIssue.png

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Phonetic-Search-returning-documents-but-not-Highlight-Information-tp4075492.html Sent from the Solr - User mailing list archive at Nabble.com.
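As a side note on why the two words in the use case match at all, the following is a small hand-written sketch of the classic American Soundex rules. It is not the encoder Solr's PhoneticFilterFactory uses internally, and the class name SoundexDemo is hypothetical; it only illustrates that "electromagnetic" and "electing" collapse to the same four-character code.

```java
import java.util.Locale;

// Hand-written illustration of classic Soundex; not the implementation used by Solr.
class SoundexDemo {
    // Digit for each letter A..Z; '0' marks letters that produce no digit (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            if (code != '0' && code != last) {
                out.append(code);
                if (out.length() == 4) {
                    break;
                }
            }
            last = code; // vowels reset `last`, so a repeated code after a vowel is kept
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Both words start with E, L, C, T once vowels are skipped, so everything after the fourth character is discarded and the indexed token no longer resembles the original text.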
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Sorry, but there is no such feature in Solr at this time - you would have to do it manually, either by retrieving all of the results or by writing a custom value source (function) that does the desired calculation within Solr. Feel free to file a Jira for suggesting such a new feature/improvement. -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Thursday, July 04, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Total Term Frequency per ResultSet in Solr 4.3 ? Hi , I have lots of crawled data, indexed in my Solr (4.3.0) and lets say user creates a search criteria 'X1' and he/she wants to know the occurrence of a specific term in the result set of that 'X1' search criteria. And then again he/she creates another search criteria 'X2' and he/she wants to know the occurrence of that same term in the result set of that 'X2' search criteria. At the moment if I give termfreq(field,term) then it gives me the term frequency per document and if I use totaltermfreq(field,term), it gives me the total term frequency in entire index not in the result set of my search criteria. So what I need is your help to find how to how to get total occurrence of a term in query's result set. If this is my result set doc str name=typeMovies/str str name=formatdvd/str str name=productThe Hunger Games/str/doc doc str name=typeBooks/str str name=formatpaperback/str str name=productThe Hunger Book/str/doc And I am looking for term 'hunger' in product field then I want to get value = '2' , and if I am searching for term 'games' in product field I want to get value = '1' . Thanks, Tony
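The manual route mentioned above (retrieve the results, then count on the client) could look roughly like this. It is a sketch only: the class ResultSetTermCount is hypothetical, the field values are assumed to have been fetched already, and the tokenization is a naive lowercase split that will not match whatever analyzer the product field really uses.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical client-side helper: counts occurrences of a term across field values
// taken from an already retrieved result set.
class ResultSetTermCount {
    static long countTerm(List<String> fieldValues, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        long total = 0;
        for (String value : fieldValues) {
            // Naive tokenization: lowercase and split on non-word characters.
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(wanted)) {
                    total++;
                }
            }
        }
        return total;
    }
}
```

For the two product values in the question, counting 'hunger' gives 2 and 'games' gives 1. For large result sets this means paging through every hit, which is why a custom value source inside Solr may be the better option.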
Re: SOLR 4.0 frequent admin problem
Yes :-) see SOLR-118, seems an old issue... On 4 Jul 2013 06:43, David Quarterman da...@corexe.com wrote: Hi, About once a week the admin system comes up with SolrCore Initialization Failures. There's nothing in the logs and SOLR continues to work in the application it's supporting and in the 'direct access' mode (i.e. http://123.465.789.100:8080/solr/collection1/select?q=bingo:*). The cure is to restart Jetty (8.1.7) and then we can use the admin system again via pc's. However, a colleague can get into admin on an iPad with no trouble when no browser on a pc can! Anyone any ideas? It's really frustrating! Best regards, DQ
Re: Find related words
You can take a look at the MoreLikeThis/Find Similar feature. That gives you an approximation, but using documents rather than discrete terms. You would have to write a custom component of your own based on logic from MLT. -- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, July 04, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Find related words How might one find the top related words for a given word in a Solr index? For instance, given the following single-field documents: 1: I love chocolate 2: I love Solr 3: I eat chocolate cake 4: You will eat chocolate candy Thus, given the word Chocolate Solr might find these top words: I (3 times matched) eat (2 times matched) love, cake, you, will, candy (1 time each) Thanks! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
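To pin down what "related words" could mean before writing such a component, here is a plain-Java sketch of one possible reading: for every document that contains the given word, count each other word once. The class RelatedWords is hypothetical, works on in-memory strings rather than on a Solr index, and uses a simple whitespace split.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: per-document co-occurrence counts for a given word.
class RelatedWords {
    static Map<String, Integer> relatedWords(List<String> docs, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Distinct lowercase tokens of this document.
            Set<String> tokens = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).trim().split("\\s+")));
            if (!tokens.contains(target)) {
                continue; // only documents containing the given word contribute
            }
            for (String token : tokens) {
                if (!token.equals(target)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

With the four sample documents and the word "chocolate", this reading counts "eat" twice and "love", "cake", "you", "will" and "candy" once each. It counts "I" twice rather than three times, because document 2 does not contain "chocolate"; a different definition of relatedness would give different numbers.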
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
If you just want to retrieve those counts, this seems like simple faceting.

q=something
facet=true
facet.query=product:hunger
facet.query=product:games

-Yonik http://lucidworks.com
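The faceting parameters above could be assembled into a request URL roughly as follows. This is a sketch only: the base URL (host, port, core name) is a placeholder and not taken from the thread, and `facet_query_url` is a hypothetical helper. Keep in mind that facet.query returns the number of matching documents, not the number of term occurrences.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name for your setup.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def facet_query_url(query, facet_queries, base_url=BASE_URL):
    """Build a select URL with one facet.query parameter per entry."""
    params = [("q", query), ("facet", "true")]
    # A list of pairs lets the facet.query key repeat.
    params += [("facet.query", fq) for fq in facet_queries]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_query_url("something", ["product:hunger", "product:games"]))
```

Using a list of pairs rather than a dict is deliberate, since a dict could hold only one facet.query value.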
Re: Find related words
You may want collocations of a given word? I implemented LUCENE-474 for Solr a while ago and found it worked pretty well.

https://issues.apache.org/jira/browse/LUCENE-474

Hope this helps.

koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Auto Soft commit not working !!!
My Solr config has:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a 'soft' commit which only ensures that changes are visible but does not ensure that data is synced to disk. This is faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

The machine is Ubuntu 13 / 4 cores / 16GB RAM, with 6GB given to Solr running over Tomcat.

Still, when I add documents to Solr and then search, it returns 0 hits. It takes a long time before the documents actually start showing up.

Can somebody help?

Thanks
Re: Find related words
Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code for the latest Solr? -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Hi Yonik,

With facet it didn't work. Please see the result set doc below:

http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20

<doc>
  <str name="id">27</str>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The amazing spider man is amazing spider the spider</str>
  <int name="popularity">1</int>
  <long name="_version_">1439641369145507840</long>
  <int name="amazing_freq">2</int>
  <int name="spider_freq">3</int>
</doc>
</result>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="product:spider">1</int>
    <int name="product:amazing">1</int>
  </lst>

As you can see, the facet just returns the number of docs found for those keywords, not the actual frequency. The actual frequency is returned by the fields 'amazing_freq' and 'spider_freq'!

So is there any workaround to get the total term frequency in the result set without any modification to the Solr source code?

Thanks, Tony
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Ah, sorry - I thought you were after docfreq, not termfreq.

-Yonik http://lucidworks.com
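The docfreq/termfreq distinction in this exchange can be shown with a small plain-Python illustration. It is a sketch under simplifying assumptions (whitespace tokenization, lowercasing, documents already fetched into dicts) and does not call Solr; the helper names are made up for the example.

```python
def term_freq(text, term):
    """Number of times `term` occurs in `text` (naive whitespace tokens)."""
    return text.lower().split().count(term.lower())


def result_set_stats(docs, field, term):
    """Return document frequency and summed term frequency over `docs`."""
    freqs = [term_freq(doc.get(field, ""), term) for doc in docs]
    return {
        "doc_freq": sum(1 for f in freqs if f > 0),  # what facet.query counts
        "total_term_freq": sum(freqs),               # what Tony is asking for
    }


# The single document from Tony's example response.
RESULT_SET = [
    {"id": "27", "product": "The amazing spider man is amazing spider the spider"},
]

if __name__ == "__main__":
    print(result_set_stats(RESULT_SET, "product", "spider"))
    print(result_set_stats(RESULT_SET, "product", "amazing"))
```

On that one document, 'spider' has a document frequency of 1 but a term frequency of 3, which matches the facet count and the spider_freq value in the response Tony posted.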
RE: SOLR 4.0 frequent admin problem
Cheers, Roman! It was a default Jetty setup, so I have now added a 'work' directory and that's in use.
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
So what is the workaround for this problem? Can it be done without changing any source code?

Thanks, Tony
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
These statistics are used for determining document relevance, or score, for the query itself. As such, they are one of two things: 1) (per field) per document, or 2) for the universe of documents in the collection. That's it, one of the two.

You keep referring to ResultSet, but there is no such concept in relevancy or scoring, at least in the Lucene model for relevancy and scoring. If you want more details on Lucene/Solr scoring, see: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Feel free to propose an alternative model for relevancy and scoring, but don't expect an implementation of such a model in the near term.

You might also be able to implement your alternative model for relevance and scoring using a custom Similarity (scoring) plug-in, coupled with custom Value Sources to expose whatever alternative metrics you wish. But before you embark on such a venture, be aware that the performance of such an alternative relevance model might not be as appealing as you might want. You'll have to do a proof of concept to see how well things actually work out.

-- Jack Krupansky
Re: Auto Soft commit not working !!!
You should see the commit messages in the Solr logs. Do they come up at the expected frequency?
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
Hi Tony,

Have you seen the TermVectorComponent (http://wiki.apache.org/solr/TermVectorComponent)? It will return the TermVectors for the documents in your result set (note that the rows parameter matters if you want results for the whole set; the default is 10). TermVectors must also be stored for each field that you want term frequency returned for.

Suppose you have the query http://localhost:8983/solr/collection1/tvrh?q=cable&fl=includes&tv.tf=true on the example that comes packaged with Solr. Then part of the response is:

<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="IW-02">
    <str name="uniqueKey">IW-02</str>
  </lst>
  <lst name="9885A004">
    <str name="uniqueKey">9885A004</str>
    <lst name="includes">
      <lst name="32mb"><int name="tf">1</int></lst>
      <lst name="av"><int name="tf">1</int></lst>
      <lst name="battery"><int name="tf">1</int></lst>
      <lst name="cable"><int name="tf">2</int></lst>
      <lst name="card"><int name="tf">1</int></lst>
      <lst name="sd"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <str name="uniqueKey">MA147LL/A</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="earbud"><int name="tf">1</int></lst>
      <lst name="headphones"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
</lst>

Then you can use an XPath query like sum(//lst[@name='cable']/int[@name='tf']), where 'cable' is the term, to calculate the term frequency in the 'includes' field for the whole result set. You could extend this to get the term frequency across all fields for your result set with some alterations to the query and the schema.xml configuration. Alternatively, you could get the response as JSON (wt=json) and use JavaScript to sum.

I know this is not terribly efficient but, if I'm understanding your request correctly, it's possible.

Cheers, Tricia
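The XPath summation Tricia describes could be done client-side along these lines. This is a minimal sketch, assuming the termVectors section of the XML response has already been fetched as a string; the sample below is a trimmed copy of the response quoted in her message, and `total_term_frequency` is a hypothetical helper, not part of Solr or SolrJ.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the termVectors section from the example response.
SAMPLE = """
<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="9885A004">
    <lst name="includes">
      <lst name="cable"><int name="tf">2</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
</lst>
"""


def total_term_frequency(term_vectors_xml, term):
    """Sum the tf values for `term` across all documents in the response."""
    root = ET.fromstring(term_vectors_xml)
    # ElementTree supports this XPath subset; terms containing quote
    # characters would need escaping, which this sketch does not handle.
    path = f".//lst[@name='{term}']/int[@name='tf']"
    return sum(int(node.text) for node in root.findall(path))


if __name__ == "__main__":
    print(total_term_frequency(SAMPLE, "cable"))
```

As noted in the message, only the documents actually returned (the rows parameter) contribute to the sum, so rows has to cover the whole result set.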
Re: Find related words
Hi Dotan,

(13/07/04 23:51), Dotan Cohen wrote:

Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code for the latest Solr?

Yes. As the Lucene APIs for accessing the index have changed, I had to modify the code.

koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Auto Soft commit not working !!!
I checked the Tomcat logs. Although the config says to commit every 15000 ms:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

strangely, there are no commit messages in the logs. Did I miss anything?

To recap: I am having issues with soft auto commit (near real time). I am using Solr 4.0 on Tomcat. The index size is 10.95 GB. With this configuration it takes more than 60 seconds for an indexed document to be returned. When adding documents to Solr and searching after the soft commit time, it returns 0 hits. It takes a long time before the document actually starts showing up, even longer than the autoCommit interval.

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

The machine is Ubuntu 13 / 4 cores / 16GB RAM, with 6GB given to Solr running over Tomcat.
Re: Auto Soft commit not working !!!
1. Do you have an update processor chain that doesn't have RunUpdate in it?
2. Is the updateLog solrconfig directive missing?
3. Is _version_ missing from your schema?

-- Jack Krupansky
Re: Moving from single Solr instance to Solr Cloud
Hello,

In SolrCloud, Collections (logical indices) have shards and replicas, so you would probably want to create a new Collection with some number of shards and replicas and reindex into it. That would be the cleanest.

Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Wed, Jul 3, 2013 at 9:10 PM, Ali, Saqib docbook@gmail.com wrote:

We have a single Solr instance with a lot of indexed documents. Now we would like to move to a SolrCloud implementation. Can we move the existing index to SolrCloud? If so, how? Or do we need to reindex our data in SolrCloud?

Thanks, Saqib
Re: Auto Soft commit not working !!!
1. Do you have an update processor chain that doesn't have RunUpdate in it? - No.
2. Is the updateLog solrconfig directive missing? - Bang on. It was still commented out!
3. Is _version_ missing from your schema? - Checked it, and it's present.

I will test again and update soon.

Thanks
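A quick way to check the second point of the checklist (whether updateLog is actually enabled rather than commented out) is to parse solrconfig.xml and look for the element. The following is only a sketch: the sample config is a hypothetical fragment modelled on the settings quoted in this thread, `summarize_update_handler` is a made-up helper, and the check says nothing about whether a missing updateLog is the cause of the delay.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment with updateLog commented out.
SAMPLE_CONFIG = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!--
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    -->
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>"""


def summarize_update_handler(config_xml):
    """Report updateLog presence and the commit intervals as raw text."""
    root = ET.fromstring(config_xml)
    handler = root.find("updateHandler")
    if handler is None:
        return {"update_log_enabled": False,
                "auto_commit_ms": None,
                "auto_soft_commit_ms": None}
    return {
        # The default parser drops XML comments, so a commented-out
        # updateLog element is not found here.
        "update_log_enabled": handler.find("updateLog") is not None,
        "auto_commit_ms": handler.findtext("autoCommit/maxTime"),
        "auto_soft_commit_ms": handler.findtext("autoSoftCommit/maxTime"),
    }


if __name__ == "__main__":
    print(summarize_update_handler(SAMPLE_CONFIG))
```

The intervals are returned as text rather than integers because real configs often use property placeholders for these values.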
Early Access Release #2 for Solr 4.x Deep Dive book is now available for download on Lulu.com
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #2 is now available for purchase and download as an e-book for $9.99 on Lulu.com at: http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html (That link says “1”, but it apparently correctly redirects to EAR #2.)

My recent blog posts over the past two weeks detailed the changes from EAR#1. A lot of them were formatting and indexing, but there are also a couple more scripting update processor examples and a new “Solr Hot Spots” preface section to point the reader to interesting sections worth checking out, such as the grammars for the various query parsers, a complete list of functions, and complete lists of char filters, tokenizers, token filters, and update processors. See: http://basetechnology.blogspot.com/

The next EAR will be in approximately two weeks, contents TBD. If you have purchased EAR#1, there is no need to rush out and pick up EAR#2. The technical content changes were only modest, and EAR#3 will be out in another two weeks anyway. That said, EAR#2 is a significant improvement over EAR#1.

-- Jack Krupansky
Re: Concurrent Modification Exception
Can you repeat the test with, for example, Jetty, in case JBoss (?) has some issue here? What type of query was this?

On 2 Jul 2013 19:27, adityab aditya_ba...@yahoo.com wrote:

Anyone, any suggestion or pointers for this issue?

-- View this message in context: http://lucene.472066.n3.nabble.com/Concurrent-Modification-Exception-tp4074371p4074829.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Total Term Frequency per ResultSet in Solr 4.3 ?
OK. Thanks Tricia , Jack Yonik for your suggestions and time. Regards, Tony. On Fri, Jul 5, 2013 at 1:20 AM, P Williams williams.tricia.l...@gmail.comwrote: Hi Tony, Have you seen the TermVectorComponenthttp://wiki.apache.org/solr/TermVectorComponent? It will return the TermVectors for the documents in your result set (note that the rows parameter matters if you want results for the whole set, the default is 10). TermVectors also must be stored for each field that you want term frequency returned for. Suppose you have the query http://localhost:8983/solr/collection1/tvrh?q=cablefl=includestv.tf=trueon the example that comes packaged with Solr. Then part of the response is: lst name=termVectors str name=uniqueKeyFieldNameid/str lst name=IW-02 str name=uniqueKeyIW-02/str /lst lst name=9885A004 str name=uniqueKey9885A004/str lst name=includes lst name=32mb int name=tf1/int /lst lst name=av int name=tf1/int /lst lst name=battery int name=tf1/int /lst lst name=cable int name=tf2/int /lst lst name=card int name=tf1/int /lst lst name=sd int name=tf1/int /lst lst name=usb int name=tf1/int /lst /lst /lst lst name=3007WFP str name=uniqueKey3007WFP/str lst name=includes lst name=cable int name=tf1/int /lst lst name=usb int name=tf1/int /lst /lst /lst lst name=MA147LL/A str name=uniqueKeyMA147LL/A/str lst name=includes lst name=cable int name=tf1/int /lst lst name=earbud int name=tf1/int /lst lst name=headphones int name=tf1/int /lst lst name=usb int name=tf1/int /lst /lst /lst /lst Then you can use an XPath query like sum(//lst[@name='cable']/int[@name='tf']) where 'cable' was the term, to calculate the term frequency in the 'includes' field for the whole result set. You could extend this to get the term frequency across all fields for your result set with some alterations to the query and schema.xml configuration. Alternately you could get the response as json (wt=json) and use javascript to sum. 
I know this is not terribly efficient but, if I'm understanding your request correctly, it's possible.

Cheers, Tricia

On Thu, Jul 4, 2013 at 10:24 AM, Tony Mullins tonymullins...@gmail.com wrote:

So what is the workaround for this problem? Can it be done without changing any source code?

Thanks, Tony

On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley yo...@lucidworks.com wrote:

Ah, sorry - I thought you were after docfreq, not termfreq.

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com wrote:

Hi Yonik,

With facet it didn't work. Please see the result set doc below.

http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20

<doc>
  <str name="id">27</str>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The amazing spider man is amazing spider the spider</str>
  <int name="popularity">1</int>
  <long name="_version_">1439641369145507840</long>
  <int name="amazing_freq">2</int>
  <int name="spider_freq">3</int>
</doc>
</result>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="product:spider">1</int>
    <int name="product:amazing">1</int>
  </lst>

As you can see, facet is just returning the number of docs found for those keywords, not the actual frequency. The actual frequency is returned by the fields 'amazing_freq' and 'spider_freq'! So is there any workaround to get the total term frequency in the result set without any modification to the Solr source code?

Thanks, Tony

On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com wrote:

If you just want to retrieve those counts, this seems like simple faceting.
q=something
facet=true
facet.query=product:hunger
facet.query=product:games

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com wrote:

Hi,

I have lots of crawled data indexed in my Solr (4.3.0). Let's say a user creates a search criteria 'X1' and he/she wants to know the occurrence of a specific term in the result set of that 'X1' search criteria. And then again he/she creates another search criteria 'X2' and he/she wants to know the occurrence of that same term in the result set of that 'X2' search criteria. At the moment, if I use termfreq(field,term) it gives me the term frequency per document, and if I use totaltermfreq(field,term) it gives me the total term frequency in the entire index, not in the result set of my search criteria. So what I need is your help to