Re: Solr - Make Exact Search on Field with Fuzzy Query
Hi Erickson, Thanks for your valuable reply. We actually tried storing just one field and highlighting on that field all the time, whether we search on it or not. It sometimes causes an issue: if I search for the term 'hospitality' and use the stemmed field for highlighting, it returns highlights on both 'hospital' and 'hospitality', whereas it should highlight only 'hospitality', since I am doing an exact term search. Can you suggest anything for this, i.e. whether we can eliminate this issue while highlighting on the original field (which has stemming applied)? The other solutions sound really good, but as you said they are hard to implement, and at this point we want to use built-in solutions if possible. Please suggest whether we can eliminate the highlighting issue explained above. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about MoreLikeThis query with solrj
Hi, Are you using a correct stopword file for the French language? It is very important for the MLT component to work well. You should also take a look at this document: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ MLT support in SolrJ is an old story; maybe this can help also: https://issues.apache.org/jira/browse/SOLR-1085 Regards -- Dominique www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com On 02/10/12 18:14, G.Long wrote: Hi :) I'm using Solr 3.6.1 and I'm trying to use the similarity features of Lucene/Solr to compare texts. The content of my documents is in French, so I defined a field like: <field name="content_mlt" type="text_fr" termVectors="true" indexed="true" stored="true"/> (it uses the default text_fr fieldType provided with the default schema.xml file). I'm using the following method to query my index:

SolrQuery sQuery = new SolrQuery();
sQuery.setQueryType("/" + MoreLikeThisParams.MLT);
sQuery.set(MoreLikeThisParams.MATCH_INCLUDE, false);
sQuery.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
sQuery.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
sQuery.set(MoreLikeThisParams.MAX_QUERY_TERMS, 50);
sQuery.set(MoreLikeThisParams.SIMILARITY_FIELDS, field);
sQuery.set("fl", "*,id,score");
sQuery.setRows(5);
sQuery.setQuery("content_mlt:" + theContentToFind);
QueryResponse rsp = server.query(sQuery);
return rsp.getResults();

The problem is that the returned results and the associated scores look strange to me. I indexed the three following texts: sample 1: Le 1° de l'article 81 du CGI exige que les allocations pour frais soient utilisées conformément à leur objet pour être affranchies de l'impôt.
Lorsque la réalité du versement des allocations est établie, le bénéficiaire doit cependant être en mesure de justifier de leur utilisation; sample 2: Le premier alinéa du 1° de l'article 81 du CGI prévoit que les rémunérations des journalistes, rédacteurs, photographes, directeurs de journaux et critiques dramatiques et musicaux perçues ès qualités constituent des allocations pour frais d'emploi affranchies d'impôt à concurrence de 7 650 EUR.; sample 3: Par ailleurs, lorsque leur montant est fixé par voie législative, les allocations pour frais prévues au 1° de l'article 81 du CGI sont toujours réputées utilisées conformément à leur objet et ne peuvent donner lieu à aucune vérification de la part de l'administration. Il s'agit d'une présomption irréfragable, qui ne peut donc pas être renversée par la preuve contraire qui serait apportée par l'administration d'une utilisation non conforme à son objet de l'allocation concernée. Pour que le deuxième alinéa du 1° de l'article 81 du CGI s'applique, deux conditions doivent être réunies simultanément : - la nature d'allocation spéciale inhérente à la fonction ou à l'emploi résulte directement de la loi ; - son montant est fixé par la loi; I tried to query the index by passing the first sample as the content to query and the result is the following : MLT result: id: dc3 - score: 0.114195324 (correspond to the sample 3) MLT result: id: dc2 - score: 0.035233106 (correspond to the sample 2) The results don't even contain the first sample, although it is exactly the same text as the one put into the query :/ Any idea of why I get these results? Maybe the query parameters are incorrect or there is something to change in the solr config? Thanks :) Gary
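For readers following the thread: the way MoreLikeThis selects its query terms (the behavior described in the article Dominique links, controlled by the mlt.mintf, mlt.mindf and mlt.maxqt knobs Gary is setting) can be sketched roughly like this. The scoring below is a simplified tf-idf, not Lucene's exact formula, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Rough sketch of MoreLikeThis term selection: keep terms that pass the
// minTermFreq/minDocFreq thresholds, rank by a tf-idf style score, and
// keep at most maxQueryTerms of them. Assumes minDocFreq >= 1 so every
// kept term is present in the df map.
public class MltTermSelection {
    static List<String> selectTerms(Map<String, Integer> tf,   // term freq in source doc
                                    Map<String, Integer> df,   // doc freq across index
                                    int numDocs, int minTermFreq,
                                    int minDocFreq, int maxQueryTerms) {
        List<String> kept = new ArrayList<>();
        for (var e : tf.entrySet()) {
            String term = e.getKey();
            if (e.getValue() >= minTermFreq && df.getOrDefault(term, 0) >= minDocFreq) {
                kept.add(term);
            }
        }
        // Rank by tf * idf, highest first (negated for ascending sort).
        kept.sort(Comparator.comparingDouble(t ->
                -(tf.get(t) * Math.log((double) numDocs / df.get(t)))));
        return kept.subList(0, Math.min(maxQueryTerms, kept.size()));
    }

    public static void main(String[] args) {
        // Very frequent terms ("le") get a low idf and drop out of the top terms.
        Map<String, Integer> tf = Map.of("allocations", 3, "le", 5, "cgi", 2);
        Map<String, Integer> df = Map.of("allocations", 2, "le", 3, "cgi", 1);
        System.out.println(selectTerms(tf, df, 3, 1, 1, 2));
    }
}
```

This also hints at why a good French stopword list matters for MLT: without it, high-frequency function words compete for the maxQueryTerms slots.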
Re: Unique terms without faceting
On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote: I know that you can use a facet query to get the unique terms for a field, taking account of any q or fq parameters, but for our use case the counts are not needed. So is there a more efficient way of finding just the unique terms for a field? Short answer: not at this moment. If the number of unique terms is large (millions), a fair amount of temporary memory could be spared by just keeping track of matched terms with a boolean vs. the full int used for standard faceting. Reduced memory requirements mean less garbage collection and faster processing due to better cache utilization. So yes, there is a more efficient way. Guessing from your other posts, you are building a social network and need to query on surnames and similar large fields. The question is of course how large the payoff will be and whether it is worth the investment in development hours. I would suggest hacking the current faceting code to use OpenBitSet instead of int[] and doing performance tests on that. PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts seem to be the right places to look in Solr 4. Regards, Toke Eskildsen, State and University Library, Denmark
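Toke's suggestion — tracking term presence with one bit per term instead of a full int counter — can be sketched like this. Plain java.util.BitSet stands in for Lucene's OpenBitSet; this illustrates the idea only and is not actual Solr code:

```java
import java.util.BitSet;

// Sketch of the memory trade-off described above: standard faceting keeps
// an int (32 bits) per term ordinal to count hits; when only the set of
// unique terms is wanted, a BitSet (1 bit per term) suffices.
public class TermPresence {
    // Standard faceting: count hits per term ordinal.
    static int[] countTerms(int[] termOrdsOfMatchedDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int ord : termOrdsOfMatchedDocs) counts[ord]++;
        return counts;
    }

    // Proposed variant: only record which terms occurred at all.
    static BitSet markTerms(int[] termOrdsOfMatchedDocs, int numTerms) {
        BitSet seen = new BitSet(numTerms);
        for (int ord : termOrdsOfMatchedDocs) seen.set(ord);
        return seen;
    }

    public static void main(String[] args) {
        int[] ords = {0, 2, 2, 5};              // term ordinal per matched value
        BitSet seen = markTerms(ords, 8);
        System.out.println(seen.cardinality() + " unique terms");
    }
}
```

With millions of unique terms, the bitset is 32x smaller than the counter array, which is where the reduced garbage-collection pressure and better cache utilization come from.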
Re: segment number during optimize of index
Hi Lance, my earlier point may be misleading. In point 1 ("Segments are independent sub-indexes in separate files; while indexing it's better to create a new segment as it doesn't have to modify an existing file, whereas while searching, *the smaller the segment* the better, since you open x (not exactly x, but a value proportional to x) physical files to search if you have got x segments in the index"), the "smaller" was referring to the segment count rather than the segment size. When you said 'Large Pages', does that mean segment size should be below a threshold for better performance from the OS point of view? My main concern here is: what would be the main disadvantage (for indexing or searching) if I merge my entire 150 GB index (right now 100 segments) into a single segment? On 11 October 2012 07:28, Lance Norskog goks...@gmail.com wrote: Study index merging. This is awesome. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Jame- opening lots of segments is not a problem. A major performance problem you will find is 'Large Pages'. This is an operating-system strategy for managing servers with 10s of gigabytes of memory. Without it, all large programs run much more slowly than they could. It is not a Solr or JVM problem. - Original Message - | From: jun Wang wangjun...@gmail.com | To: solr-user@lucene.apache.org | Sent: Wednesday, October 10, 2012 6:36:09 PM | Subject: Re: segment number during optimize of index | | I have an other question, does the number of segment affect speed for | update index? | | 2012/10/10 jame vaalet jamevaa...@gmail.com | | Guys, | thanks for all the inputs, I was continuing my research to know | more about | segments in Lucene. Below are my conclusion, please correct me if | am wrong. | | 1. Segments are independent sub-indexes in seperate file, while | indexing | its better to create new segment as it doesnt have to modify an | existing | file.
where as while searching, smaller the segment the better | it is | since | you open x (not exactly x but xn a value proportional to x) | physical | files | to search if you have got x segments in the index. | 2. since lucene has memory map concept, for each file/segment in | index a | new m-map file is created and mapped to the physcial file in | disk. Can | someone explain or correct this in detail, i am sure there are | lot many | people wondering how m-map works while you merge or optimze | index | segments. | | | | On 6 October 2012 07:41, Otis Gospodnetic | otis.gospodne...@gmail.com | wrote: | | If I were you and not knowing all your details... | | I would optimize indices that are static (not being modified) and | would optimize down to 1 segment. | I would do it when search traffic is low. | | Otis | -- | Search Analytics - | http://sematext.com/search-analytics/index.html | Performance Monitoring - http://sematext.com/spm/index.html | | | On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet | jamevaa...@gmail.com | wrote: |Hi Eric, |I am in a major dilemma with my index now. I have got 8 cores |each | around |300 GB in size and half of them are deleted documents in it and |above | that |each has got around 100 segments as well. Do i issue a |expungeDelete | and |allow the merge policy to take care of the segments or optimize |them | into |single segment. Search performance is not at par compared to |usual solr |speed. |If i have to optimize what segment number should i choose? my |RAM size |around 120 GB and JVM heap is around 45 GB (oldGen being 30 |GB). Pleas |advice ! | |thanks. | | |On 6 October 2012 00:00, Erick Erickson |erickerick...@gmail.com | wrote: | |because eventually you'd run out of file handles. Imagine a |long-running server with 100,000 segments. Totally |unmanageable. | |I think shawn was emphasizing that RAM requirements don't |depend on the number of segments. There are other |resources that file consume however. 
| |Best |Erick | |On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet |jamevaa...@gmail.com | wrote: | hi Shawn, | thanks for the detailed explanation. | I have got one doubt, you said it doesn matter how many | segments | index |have | but then why does solr has this merge policy which merges | segments | frequently? why can it leave the segments as it is rather | than | merging | smaller one's into bigger one? | | thanks | . | | On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org | wrote: | | On 10/4/2012 3:22 PM, jame vaalet wrote: | | so
Re: Auto Correction?
So other than commercial solutions, it seems like I need to have a plugin, right? I couldn't find any open source solutions yet... Yes, you need to implement a custom SearchComponent (plugin): http://wiki.apache.org/solr/SearchComponent Alternatively, you can re-run the search with the suggestions on the client side.
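The "client side" alternative mentioned above can be sketched like this: take the suggestions Solr's spellchecker returns and re-issue the search with the closest one. The edit-distance implementation and class name are illustrative, not a Solr API:

```java
// Client-side auto-correction sketch: given the spellcheck suggestions
// returned by Solr, pick the one closest to the user's query by edit
// distance and re-issue the search with it. The Levenshtein code below
// is illustrative; any string-distance library would do.
public class AutoCorrect {
    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Falls back to the original query if no suggestions were returned.
    static String bestSuggestion(String query, java.util.List<String> suggestions) {
        String best = query;
        int bestDist = Integer.MAX_VALUE;
        for (String s : suggestions) {
            int d = levenshtein(query, s);
            if (d < bestDist) { bestDist = d; best = s; }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestSuggestion("hospitallity",
                java.util.List.of("hospital", "hospitality", "hostility")));
    }
}
```

The same loop can apply a threshold (e.g. only correct when the best distance is 1 or 2) so that legitimately unusual queries are left alone.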
SLOR And OpenNlp integration
Hello, I am a new user of Apache Solr and I have to integrate OpenNLP with Solr. The problem is that I can't find a tutorial for this integration, so I am asking if there is someone who can help me with it? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SLOR And OpenNlp integration
Hi - the wiki page will get you up and running quickly: http://wiki.apache.org/solr/OpenNLP -Original message- From: ahmed ahmed.missaoui...@gmail.com Sent: Thu 11-Oct-2012 13:32 To: solr-user@lucene.apache.org Subject: SLOR And OpenNlp integration [...]
RE: SLOR And OpenNlp integration
Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I get a 'class not found' problem, even though the classes are there. I don't know what the problem is. -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013101.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr - Make Exact Search on Field with Fuzzy Query
Right, and going the other way (storing and highlighting on the non-stemmed field) would be unsatisfactory because you'd get a hit on 'hospital' in the stemmed field, but wouldn't highlight it if you searched on 'hospitality'. I really don't see a good solution here. Highlighting seems to be one of those things that's easy in concept but has a zillion ways to go wrong. I guess I'd really just go with the copyField approach unless you can prove that it's really a problem. Perhaps lost in my first e-mail is that storing the field twice doesn't really affect search speed or _search_ requirements at all. Take a look here: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#file-names Note that the *.fdt and *.fdx files are where the original raw copy goes (i.e. where data gets written when you specify stored="true"), and they are completely independent of the files that contain the searchable data. So unless you're disk-space constrained, the additional storage really doesn't cost you much. Best Erick On Thu, Oct 11, 2012 at 2:31 AM, meghana meghana.rav...@amultek.com wrote: [...]
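For reference, the copyField approach discussed in this thread might look like the following schema.xml sketch (field and type names here are illustrative, not from the original messages):

```xml
<!-- Sketch of the copyField approach: index the text twice, store it once.
     The stemmed field is stored and used for display/highlighting; the
     exact-match field is index-only, so it adds nothing to the stored-data
     (*.fdt/*.fdx) files mentioned above. -->
<field name="content"       type="text_stemmed" indexed="true" stored="true"/>
<field name="content_exact" type="text_exact"   indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>
```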
Re: anyone have any clues about this exception
Well, you'll actually be able to optimize, it's just called forceMerge. But the point is that optimize seems like something that _of course_ you want to do, when in reality it's not something you usually should do at all. Optimize does two things: 1) merges all the segments into one (usually), and 2) removes all of the info associated with deleted documents. Of the two, point 2 is the one that really counts, and that's done whenever segment merging is done anyway. So unless you have a very large number of deletes (or updates of the same document), optimize buys you very little. You can tell this by the difference between numDocs and maxDoc in the admin page. So what happens if you just don't bother to optimize? Take a look at the merge policy to help control how merging happens, perhaps as an alternative. Best Erick On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert rober...@buy.com wrote: You could be right. Going back in the logs, I noticed it used to happen less frequently and always towards the end of an optimize operation. It is probably my indexer timing out waiting for updates to occur during optimizes. The errors grew recently due to my upping the indexer thread count to 22 threads, so there's a lot more timeouts occurring now. Also our index has grown to double the old size, so the optimize operation has started taking a lot longer, also contributing to what I'm seeing. I have just changed my optimize frequency from three times a day to one time a day after reading the following. Here they are talking about completely deprecating the optimize command in the next version of Solr: https://issues.apache.org/jira/browse/SOLR-3141 -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Wednesday, October 10, 2012 11:10 AM To: solr-user@lucene.apache.org Subject: Re: anyone have any clues about this exception Something timed out, the other end closed the connection.
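Erick's numDocs-vs-maxDoc tip can be turned into a simple check of whether a forceMerge is likely worthwhile. The 20% threshold below is an arbitrary illustration, not a Solr recommendation, and the class name is made up:

```java
// Sketch of the check suggested above: maxDoc counts all documents
// including deleted ones, numDocs only the live ones, so their gap is
// the number of deleted documents still occupying the index.
public class DeletedDocsCheck {
    static double deletedRatio(long numDocs, long maxDoc) {
        return maxDoc == 0 ? 0.0 : (double) (maxDoc - numDocs) / maxDoc;
    }

    // forceMerge only pays off when a large share of the index is deletes.
    static boolean worthMerging(long numDocs, long maxDoc, double threshold) {
        return deletedRatio(numDocs, maxDoc) >= threshold;
    }

    public static void main(String[] args) {
        // Illustrative values as read off the admin statistics page.
        System.out.println(worthMerging(12782762, 12788156, 0.20));
    }
}
```

An indexer could run this against the stats the admin page exposes and skip the nightly optimize entirely when the ratio is negligible.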
This end tried to write to closed pipe and died, something tried to catch that exception and write its own and died even worse? Just making it up really, but sounds good (plus a 3-year Java tech-support hunch). If it happens often enough, see if you can run WireShark on that machine's network interface and catch the whole network conversation in action. Often, there is enough clues there by looking at tcp packets and/or stuff transmitted. WireShark is a power-tool, so takes a little while the first time, but the learning will pay for itself over and over again. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert rober...@buy.com wrote: Tomcat localhost log (not the catalina log) for my solr 3.6.1 (master) instance contains lots of these exceptions but solr itself seems to be doing fine... any ideas? I'm not seeing these exceptions being logged on my slave servers btw, just the master where we do our indexing only. 
Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
        at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Unknown Source)
Re: unsuscribe
Please follow the instructions here: https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists On Wed, Oct 10, 2012 at 6:03 PM, zMk Bnc zig...@hotmail.com wrote: unsuscribe
Re: SLOR And OpenNlp integration
(12/10/11 20:40), ahmed wrote: Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I get a 'class not found' problem, even though the classes are there. I don't know what the problem is. I think attaching the error you got would help us understand your problem. Also, before that: what do you want to do with the Solr and OpenNLP integration? koji -- http://soleami.com/blog/starting-lab-work.html
Re: Unique terms without faceting
Hi, Are you looking for http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html ? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: [...]
Re: SLOR And OpenNlp integration
In fact I downloaded the Solr source using an SVN client, applied the OpenNLP patch, and then ran 'ant compile -lib /usr/share/ivy'. I got this error:

[javac] public synchronized Span[] splitSentences(String line) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:36: cannot find symbol
[javac] symbol : class Tokenizer
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] private final Tokenizer tokenizer;
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:38: cannot find symbol
[javac] symbol : class TokenizerModel
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] public NLPTokenizerOp(TokenizerModel model) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:46: cannot find symbol
[javac] symbol : class Span
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] public synchronized Span[] getTerms(String sentence) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/OpenNLPTokenizerFactory.java:26: package opennlp.tools.util does not exist
[javac] import opennlp.tools.util.InvalidFormatException;
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/OpenNLPOpsFactory.java:9: package opennlp.tools.chunker does not exist
[javac] import opennlp.tools.chunker.ChunkerModel;
[javac] ^
[javac] 100 errors

BUILD FAILED
/home/pfe/Téléchargements/dev/trunk/build.xml:112: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:419: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:410: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:418: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:1482: Compile failed; see the compiler error output for details.

I want to apply semantic analysis to the documents that will be indexed using Solr. So Solr will index and then analyze content using OpenNLP instead of Tika. -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: displaying search results in map
On 11 October 2012 23:16, Harish Rawat harish.s.ra...@gmail.com wrote: Hi, I am working on a project to display search results on a map. The idea is to divide the map into N*N grids, show counts for each grid, and allow users to view the top result in each grid. Any suggestions on how best to accomplish this with Solr? Your description is not very clear. What search results are you seeking to display on what kind of a map? Are you talking about a geographical map, or something like a 3D histogram (which is what your N x N grid seems to refer to)? Please clarify. In either case, it is quite unlikely that Solr will handle the presentation for you. Solr is a search engine that will return your desired search results; what to do with them is an issue for a presentation layer. Regards, Gora
Re: displaying search results in map
On 11 October 2012 23:55, Harish Rawat harish.s.ra...@gmail.com wrote: Sorry for not being clear. Here are more details: 1) The results are displayed on a geographical map. 2) Each document has latitude and longitude fields, plus other fields that can be searched on. 3) The search will be done for all documents within a lat/long range. 4) The lat/lon range is divided into N*N grids (let's say 64), and for each grid we want the following: a) the number of documents in that grid, b) the top K documents in that grid, c) the average latitude and longitude of all results in that grid. In Lucene I can implement my own custom collector and do all the calculations listed in #4. I wanted to understand the best way to implement (or reuse, if one exists :)) this logic in Solr. [...] Hmm, I am not that familiar with Lucene, so maybe someone else will chip in with advice. However, what you describe in point 4 seems to be a clustering strategy for geographical points. Typically, we use pre-defined strategies from OpenLayers ( http://openlayers.org ), or custom strategies. Regards, Gora
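The per-grid aggregation in point 4 (counts and average lat/lon per cell) can be sketched like this; the top-K part would additionally keep a bounded heap of scored documents per cell, and all names here are illustrative:

```java
// Sketch of the custom-collector logic from point 4: bucket each matching
// document's lat/lon into one of n*n cells over a fixed bounding box,
// keeping a count and a running coordinate sum per cell.
public class GridAggregator {
    final int n;
    final double minLat, maxLat, minLon, maxLon;
    final long[] counts;
    final double[] latSum, lonSum;

    GridAggregator(int n, double minLat, double maxLat, double minLon, double maxLon) {
        this.n = n;
        this.minLat = minLat; this.maxLat = maxLat;
        this.minLon = minLon; this.maxLon = maxLon;
        counts = new long[n * n];
        latSum = new double[n * n];
        lonSum = new double[n * n];
    }

    // Cell index for a point; points on the max edge fold into the last cell.
    int cell(double lat, double lon) {
        int row = Math.min(n - 1, (int) ((lat - minLat) / (maxLat - minLat) * n));
        int col = Math.min(n - 1, (int) ((lon - minLon) / (maxLon - minLon) * n));
        return row * n + col;
    }

    // Called once per matching document.
    void collect(double lat, double lon) {
        int c = cell(lat, lon);
        counts[c]++;
        latSum[c] += lat;
        lonSum[c] += lon;
    }

    // Average lat/lon of the documents in a cell, or null if it is empty.
    double[] centroid(int c) {
        return counts[c] == 0 ? null
             : new double[]{latSum[c] / counts[c], lonSum[c] / counts[c]};
    }

    public static void main(String[] args) {
        GridAggregator g = new GridAggregator(8, 0, 10, 0, 10);
        g.collect(1.0, 1.0);
        g.collect(1.1, 1.2);
        int c = g.cell(1.0, 1.0);
        System.out.println(g.counts[c] + " docs, centroid lat " + g.centroid(c)[0]);
    }
}
```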
Re: displaying search results in map
Did you look at http://stackoverflow.com/questions/11319465/geoclusters-in-solr ? This sounds similar to what you're asking for, based on geohashes of the points of interest. On Thu, Oct 11, 2012 at 2:25 PM, Harish Rawat harish.s.ra...@gmail.com wrote: [...]
SolrJ, optimize, maxSegments
Currently my indexing code calls optimize. Once a night, one of my six large shards is optimized, so each one only gets optimized once every six days. Here is the SolrJ call; server is an instance of HttpSolrServer: UpdateResponse ur = server.optimize(); I only do this because I want deleted documents regularly removed from the index. Whatever speed gains I might see from getting down to one segment are just an added bonus. After watching all the discussion on the -dev list regarding what to do in Solr due to the Lucene forceMerge rename, I am considering changing this to something like the following: UpdateResponse ur = server.optimize(true, true, 20); What happens with this if I am already below 20 segments? Will it still expunge all of my (typically several thousand) deleted documents? I am hoping that what it will do is rebuild any segment that contains deleted documents and leave the other segments alone. Possibly irrelevant info: I'm using the following MP config:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

Thanks, Shawn
Issue using SpatialRecursivePrefixTreeFieldType
Hi David, I'm defining my field as such: <fieldType name="rectangle" class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" distErrPct="0" maxDetailDist="1" worldBounds="0 0 10916173 20"/> When I create a large rectangle, say 10 10 500 11, Solr seems to freeze for quite some time. I haven't looked at your code, but I can imagine the algorithm basically fills in some sort of indexing matrix, and that's what's taking so long for large rectangles? Is there a limit to how big the worldBounds should be? Thanks! Eric.
Open Source Social (London) - 23rd Oct
Hi all, The next Open Source Search Social is on the 23rd Oct at The Plough, in Bloomsbury. We usually get a good mix of regulars and newcomers, and a good mix of backgrounds and experience levels, so please come along if you can. As usual the format is completely open so we'll be talking about whatever is most interesting at any one particular moment... ooo, a shiny thing... Details and RSVP options on the Meetup page: http://www.meetup.com/london-search-social/events/86580442/ Hope to see you there, Richard @richmarr
Re: Custom html headers/footers to solr admin console
Uhhmmm, why do you want to do this? The admin screen is pretty much purely intended for developers/in-house use. Mostly I just want to be sure you aren't thinking about letting users, say, see this page. Consider /update?stream.body=<delete><query>*:*</query></delete> Best Erick On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman newman...@gmail.com wrote: Hello all, I was just poking around in my solr distribution and I noticed some files: admin-extra.html admin-extra.menu-top.html admin-extra.menu-bottom.html I was really hoping that that was html inserted into the solr admin page, and that I could modify the admin-extra.menu-top.html and admin-extra.menu-bottom.html files to make a header/footer. I un-commented admin-extra.html and can now see that html in the admin extras section for my core, so not exactly what I was looking for. Are the top/bottom html files used, and are they really inserted at the top and bottom of the page? Any way to get some headers in the static admin page? I would usually just modify the html, but in this case there might already be something I can use. Thanks, Billy
NewSearcher old cache
Hello everyone, I was configuring a Solr installation and had a few questions about newSearcher. As I understand it, a newSearcher event will be triggered if there is an already existing registered searcher. Q1) As soon as a new searcher is opened, the caches begin populating from the older caches. What happens if the newSearcher event has queries defined in it? Do these queries ignore the old cache altogether and load only the results of the queries defined in the listener event? Or do they get added after the new caches have been warmed from the old caches? Q2) I am running edismax queries on the Solr server. Can I specify these queries in newSearcher and firstSearcher also? Or are the queries supposed to be simple queries? Thanks. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NewSearcher old cache
> Q1) As soon as a new searcher is opened, the caches begin populating from the old caches. What happens if the newSearcher event has queries defined in it? Do these queries ignore the old cache altogether and load only the results of the queries defined in the listener? Or do they get added after the new caches have been warmed from the old caches?

Those queries are executed after the cache auto-warm and before the searcher is registered.

> Q2) I am running edismax queries on the Solr server. Can I specify these queries in newSearcher and firstSearcher too, or are the queries supposed to be simple queries?

You can use any parameters you want here; you can even use your custom request handler configuration if you want. With these queries you should try to warm the things that are not warmed by the cache auto-warm process. For example, a good idea here is to facet on all the fields your real users will be faceting on, and the same goes for sorting.

Be careful with warming time in relation to your commit frequency (or really, your open-searcher frequency). If you are going to use NRT, you may not want to warm caches at all. Also, the whole idea of warming caches is to avoid making your users pay the penalty of searching with empty caches, which results in slow queries, so make sure the resources you spend warming are not causing worse query times.

Tomás
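For reference, the warming queries discussed above are defined as event listeners in solrconfig.xml. A minimal sketch follows; the field names (`category`, `price`) and query parameters are placeholders, not from this thread — substitute the facet and sort fields your real users actually hit:

```xml
<!-- solrconfig.xml: run warming queries when a new searcher is opened.
     These execute after cache auto-warming, before the searcher is registered.
     Field names below are illustrative placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Full request parameters are allowed, including edismax -->
    <lst>
      <str name="q">some common query</str>
      <str name="defType">edismax</str>
      <str name="qf">title^2 body</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>

<!-- firstSearcher fires at startup, when there are no old caches to warm from -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```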
Re: SolrJ, optimize, maxSegments
On 10/11/2012 2:02 PM, Shawn Heisey wrote:
> UpdateResponse ur = server.optimize(true, true, 20);
> What happens with this if I am already below 20 segments? Will it still expunge all of my (typically several thousand) deleted documents? I am hoping that what it will do is rebuild any segment that contains deleted documents and leave the other segments alone.

I have just tried this on a test system with 11 segments via curl, not SolrJ. I don't expect that it would be any different with SolrJ, though.

curl 'http://localhost:8981/solr/s0live/update?optimize=true&maxSegments=20&expungeDeletes=true&waitFlush=true'

It didn't work. When I changed maxSegments to 10, it did reduce the index from 11 segments to 10, but there are still deleted documents in the index -- maxDoc > numDocs on the statistics screen.

numDocs: 12782762
maxDoc:  12788156

I don't think expungeDeletes is actually a valid parameter for optimize, but I included it anyway. I also tried doing a commit with expungeDeletes=true, and that didn't work either. Is this a bug? The server is 3.5.0. Because I haven't finished getting my configuration worked out, I don't have the ability right now to try this on 4.0.0.

Thanks,
Shawn
Re: Custom html headers/footers to solr admin console
I take that answer as a no ;) And no, it will be an admin-only page. But you can query from that page, and the data returned could be sensitive. As such, our company requires us to flag in a header/footer that the contents of the page could be sensitive. So even though it will just be for admin access, I still need those headers.

Sounds like I am going to have to dive into the HTML and make custom changes.

Thanks for the quick response.
Billy

Sent from my iPhone

On Oct 11, 2012, at 3:26 PM, Erick Erickson erickerick...@gmail.com wrote:

> Uhhmmm, why do you want to do this? The admin screen is pretty much purely intended for developers/in-house use. Mostly I just want to be sure you aren't thinking about letting users, say, see this page. Consider:
>
> /update?stream.body=<delete><query>*:*</query></delete>
>
> Best,
> Erick
>
> On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman newman...@gmail.com wrote:
>
> Hello all,
>
> I was just poking around in my Solr distribution and I noticed some files: admin-extra.html, admin-extra.menu-top.html, admin-extra.menu-bottom.html. I was really hoping that this was HTML inserted into the Solr admin page, and that I could modify the admin-extra.menu-top.html and admin-extra.menu-bottom.html files to make a header/footer. I un-commented admin-extra.html and can now see that HTML in the "admin extra" section for my core, so not exactly what I was looking for. Are the top/bottom HTML files used, and are they really inserted at the top and bottom of the page? Is there any way to get some headers into the static admin page? I would usually just modify the HTML, but in this case there might already be something I can use.
>
> Thanks,
> Billy
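For what it's worth, the warning banner Billy describes could live in admin-extra.html -- the one file he confirmed does render, in the "Admin Extra" panel of the core page. A minimal sketch, assuming the file is injected into the admin UI as-is (the wording and inline styling here are placeholders):

```html
<!-- admin-extra.html: rendered in the "Admin Extra" section of the
     core's admin page. Content and styling below are illustrative. -->
<div style="background:#c00; color:#fff; padding:8px; text-align:center; font-weight:bold;">
  WARNING: Query results shown on this page may contain sensitive data.
  For authorized administrators only.
</div>
```

This does not put headers at the literal top and bottom of the page, but it avoids patching the admin UI's own HTML, which would be overwritten on upgrade.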
Any filter to map multiple tokens into one?
I am looking for a way to fold a particular sequence of tokens into one token. Concretely, I'd like to detect a three-token sequence of "*", ":" and "*", and replace it with a single token with the text "*:*". I tried SynonymFilter, but it seems it can only deal with a single input token: the rule "* : * => *:*" seems to be interpreted as one input token of 5 characters ("*", space, ":", space, "*"). I'm using Solr 3.5.

Background: My tokenizer separates the three-character sequence "*:*" into 3 tokens of one character each. The edismax parser, when given the query *:* (i.e. find every doc), seems to pass the entire string "*:*" to the query analyzer (I suspect a bug) and feed the tokenized result to a DisjunctionMaxQuery object, according to this debug output:

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)</str>
  <str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01</str>
</lst>

Notice that there is a space between * and : in DisjunctionMaxQuery((body:"* : *" ...). Probably because of this, the hit score is as low as 0.109, while it is 1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to stitch "*", ":", "*" back together into "*:*" to make DisjunctionMaxQuery happy.

Thanks.
T. Kuro Kurosaka
Re: Any filter to map multiple tokens into one?
The ":" which normally separates a field name from a term (or quoted string, or parenthesized sub-query) is parsed by the query parser before analysis gets called, and *:* is recognized before analysis as well. So any attempt to recreate *:* in analysis will be too late to affect query parsing and other pre-analysis processing.

But what is it you are really trying to do? What's the real problem? (This sounds like the proverbial XY Problem.)

-- Jack Krupansky

-----Original Message-----
From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map multiple tokens into one?

I am looking for a way to fold a particular sequence of tokens into one token. Concretely, I'd like to detect a three-token sequence of "*", ":" and "*", and replace it with a single token with the text "*:*". I tried SynonymFilter, but it seems it can only deal with a single input token: the rule "* : * => *:*" seems to be interpreted as one input token of 5 characters ("*", space, ":", space, "*"). I'm using Solr 3.5.

Background: My tokenizer separates the three-character sequence "*:*" into 3 tokens of one character each. The edismax parser, when given the query *:* (i.e. find every doc), seems to pass the entire string "*:*" to the query analyzer (I suspect a bug) and feed the tokenized result to a DisjunctionMaxQuery object, according to this debug output:

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)</str>
  <str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01</str>
</lst>

Notice that there is a space between * and : in DisjunctionMaxQuery((body:"* : *" ...). Probably because of this, the hit score is as low as 0.109, while it is 1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to stitch "*", ":", "*" back together into "*:*" to make DisjunctionMaxQuery happy.

Thanks.
T. Kuro Kurosaka
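Setting aside Jack's point that query parsing happens before analysis (which may make any analysis-time fix moot), the folding Kuro describes -- detect a fixed run of tokens and emit one replacement token -- is the kind of logic a custom Lucene TokenFilter would implement in incrementToken() with a small lookahead buffer. A language-agnostic sketch of just that buffering logic, in Python for brevity (this is illustrative, not Lucene API code):

```python
def fold_tokens(tokens, pattern=("*", ":", "*"), replacement="*:*"):
    """Collapse any run of tokens matching `pattern` into one token.

    Mirrors what a custom TokenFilter would do: look ahead by
    len(pattern) tokens; if the window matches, emit the replacement
    and skip the matched tokens, otherwise pass tokens through.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == pattern:
            out.append(replacement)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


# Example: the three-token sequence is stitched back together.
print(fold_tokens(["body", "*", ":", "*"]))  # ['body', '*:*']
```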
Re: Does Zookeeper notify slave to replication about record update in master
Hi,

I could be mistaken, but there is no pull replication in Solr 4 unless one is trying to catch up using traditional Java replication that pulls from one node to another. I believe replication is push style and immediate, and replicas don't talk to ZK for that. Master and slaves are also a thing of the past; now we have leaders and replicas. See http://wiki.apache.org/solr/SolrCloud

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Thu, Oct 11, 2012 at 11:10 PM, Zeng Lames lezhi.z...@gmail.com wrote:

Dear All,

We are doing a POC of Solr 4.0 with ZooKeeper, and we want to know whether ZooKeeper will notify a slave to pull when the master gets a record update. If not, does that mean there is a time gap where data is out of sync between the master and slave nodes?

Thanks a lot! Best Wishes!