Re: solr highlight problem
What do you mean by missing vicky? Did you mean a second fragment? Sent from my mobile device, so please excuse typos and brevity. Maurizio Cucchiara On 12/Oct/2012 08:42, "rayvicky" wrote: > sb.append("title:test vicky"); > SolrQuery query = new SolrQuery(); > query.setQuery(sb.toString()); > > query.setHighlight(true); > query.addHighlightField("title"); > query.setHighlightSimplePre(""); > query.setHighlightSimplePost(""); > query.setHighlightSnippets(2); > query.setHighlightFragsize(500); > > title: my name test is vicky > result:my name test is vicky > > why missing vicky? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/solr-highlight-problem-tp4013273.html > Sent from the Solr - User mailing list archive at Nabble.com. >
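For reference — and as a sketch only — the SolrJ settings in the quoted code map onto these plain HTTP highlighting parameters. The host, port, and the `<em>`/`</em>` delimiters are assumptions (the original pre/post strings were stripped by the mail archive), and this needs a running Solr instance:

```shell
# Hypothetical equivalent of the quoted SolrJ calls as a raw request.
# hl.snippets=2 allows up to two fragments per field; hl.fragsize=500
# caps each fragment at roughly 500 characters.
curl 'http://localhost:8983/solr/select?q=title:test+vicky&hl=true&hl.fl=title&hl.snippets=2&hl.fragsize=500&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E'
```

Comparing the raw response against the SolrJ result can show whether the second occurrence of "vicky" is being dropped by the client code or by the highlighter itself.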
solr highlight problem
sb.append("title:test vicky"); SolrQuery query = new SolrQuery(); query.setQuery(sb.toString()); query.setHighlight(true); query.addHighlightField("title"); query.setHighlightSimplePre(""); query.setHighlightSimplePost(""); query.setHighlightSnippets(2); query.setHighlightFragsize(500); title: my name test is vicky result:my name test is vicky why missing vicky? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-highlight-problem-tp4013273.html Sent from the Solr - User mailing list archive at Nabble.com.
Search in specific website
Hi, I use Nutch to crawl my websites and index into Solr. However, how can I search for a piece of content in a specific website? I crawl multiple URLs. Regards,
Is there any way to specify config name for core in solr.xml?
Hi all, I have two collections and two machines, so my deployment looks like this:

machine a: core a1, core b1
machine b: core a2, core b2

core a1 is collection 1 shard1, core a2 is collection 1 shard2; the config for collection 1 is config 1. core b1 is collection 2 shard1, core b2 is collection 2 shard2; the config for collection 2 is config 2. Is there any way to specify the core config in solr.xml so that both shards start up on every machine with the correct config name? -- from Jun Wang
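In the legacy (pre-SolrCloud-managed) solr.xml format, each core element accepts per-core config and schema attributes, so a sketch of machine a's solr.xml could look like this (all instanceDir values and file names here are hypothetical):

```xml
<!-- solr.xml on machine a (sketch; names/paths are assumptions) -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- collection 1, shard 1: uses config 1 -->
    <core name="core_a1" instanceDir="core_a1"
          collection="collection1" shard="shard1"
          config="solrconfig-1.xml" schema="schema-1.xml"/>
    <!-- collection 2, shard 1: uses config 2 -->
    <core name="core_b1" instanceDir="core_b1"
          collection="collection2" shard="shard1"
          config="solrconfig-2.xml" schema="schema-2.xml"/>
  </cores>
</solr>
```

Machine b would mirror this with core_a2/core_b2 on shard2. Note that in full SolrCloud mode the config is normally bound per collection via a ZooKeeper config set (collection.configName) rather than per core.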
Re: Does Zookeeper notify slave to replicate about record update in master
Hi, I could be mistaken, but there is no pull-replication in Solr 4 unless one is trying to catch up using traditional Java replication that pulls from one node to the other. I believe replication is push style, immediate, and replicas don't talk to ZK for that. Master and slaves are also a thing of the past and now we have leaders and replicas. See http://wiki.apache.org/solr/SolrCloud Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 11, 2012 at 11:10 PM, Zeng Lames wrote: > Dear All, > > We are doing a POC of Solr 4.0 with ZooKeeper, and want to know whether > ZooKeeper will notify slaves to pull when the master gets a record update. If not, > does it mean there is a time gap where data is out of sync between the master and > slave nodes? > > thanks a lot! > > Best Wishes!
Re: Any filter to map multiple tokens into one ?
The ":" which normally separates a field name from a term (or quoted string or parenthesized sub-query) is "parsed" by the query parser before analysis gets called, and "*:*" is recognized before analysis as well. So, any attempt to recreate "*:*" in analysis will be too late to affect query parsing and other pre-analysis processing. But, what is it you are really trying to do? What's the real problem? (This sounds like a proverbial "XY Problem".) -- Jack Krupansky -Original Message- From: T. Kuro Kurosaka Sent: Thursday, October 11, 2012 7:35 PM To: solr-user@lucene.apache.org Subject: Any filter to map multiple tokens into one ? I am looking for a way to fold a particular sequence of tokens into one token. Concretely, I'd like to detect a three-token sequence of "*", ":" and "*", and replace it with a token of the text "*:*". I tried SynonymFilter but it seems it can only deal with a single input token. "* : * => *:*" seems to be interpreted as one input token of 5 characters "*", space, ":", space and "*". I'm using Solr 3.5. Background: My tokenizer separates the three-character sequence "*:*" into 3 tokens of one character each. The edismax parser, when given the query "*:*", i.e. find every doc, seems to pass the entire string "*:*" to the query analyzer (I suspect a bug), and feed the tokenized result to a DisjunctionMaxQuery object, according to this debug output: *:* *:* +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01) +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01 Notice that there is a space between * and : in DisjunctionMaxQuery((body:"* : *" ) Probably because of this, the hit score is as low as 0.109, while it is 1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to stitch together "*", ":", "*" into "*:*" again to make DisjunctionMaxQuery happy. Thanks. T. "Kuro" Kurosaka
Any filter to map multiple tokens into one ?
I am looking for a way to fold a particular sequence of tokens into one token. Concretely, I'd like to detect a three-token sequence of "*", ":" and "*", and replace it with a token of the text "*:*". I tried SynonymFilter but it seems it can only deal with a single input token. "* : * => *:*" seems to be interpreted as one input token of 5 characters "*", space, ":", space and "*". I'm using Solr 3.5. Background: My tokenizer separates the three-character sequence "*:*" into 3 tokens of one character each. The edismax parser, when given the query "*:*", i.e. find every doc, seems to pass the entire string "*:*" to the query analyzer (I suspect a bug), and feed the tokenized result to a DisjunctionMaxQuery object, according to this debug output: *:* *:* +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01) +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01 Notice that there is a space between * and : in DisjunctionMaxQuery((body:"* : *" ) Probably because of this, the hit score is as low as 0.109, while it is 1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to stitch together "*", ":", "*" into "*:*" again to make DisjunctionMaxQuery happy. Thanks. T. "Kuro" Kurosaka
Re: Custom html headers/footers to solr admin console
I take that answer as a no ;) And no admin-only page. But you can query from that page, and the data returned could be sensitive. As such our company requires us to flag in a header/footer that the contents of the page could be sensitive. So even though it will just be for admin access I still need those headers. Sounds like I am gonna have to dive into the HTML and make custom changes. Thanks for the quick response. Billy Sent from my iPhone On Oct 11, 2012, at 3:26 PM, Erick Erickson wrote: > Uhhmmm, why do you want to do this? The admin screen is pretty > much purely intended for developers/in-house use. Mostly I just > want to be sure you aren't thinking about letting users, say, see > this page. Consider > /update?stream.body=*:* > > Best > Erick > > On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman wrote: >> Hello all, >> >> >> I was just poking around in my solr distribution and I noticed some files: >> admin-extra.html >> admin-extra.menu-top.html >> admin-extra.menu-bottom.html >> >> >> I was really hoping that that was html inserted into the solr admin >> page and I could modify the: >> admin-extra.menu-top.html >> admin-extra.menu-bottom.html >> >> files to make a header/footer. >> >> I un-commented out admin-extra.html and can now see that html in the >> admin extras section for my core so not exactly what I was looking >> for. >> >> Are the top/bottom html files used and are they really inserted at the >> top and bottom of the page? >> >> Any way to get some headers in the static admin page? I would usually >> just modify the html, but in this case there might already be >> something I can use. >> >> Thanks, >> Billy
Re: SolrJ, optimize, maxSegments
On 10/11/2012 2:02 PM, Shawn Heisey wrote: UpdateResponse ur = server.optimize(true, true, 20); What happens with this if I am already below 20 segments? Will it still expunge all of my (typically several thousand) deleted documents? I am hoping that what it will do is rebuild any segment that contains deleted documents and leave the other segments alone. I have just tried this on a test system with 11 segments via curl, not SolrJ. I don't expect that it would be any different with SolrJ, though. curl 'http://localhost:8981/solr/s0live/update?optimize=true&maxSegments=20&expungeDeletes=true&waitFlush=true' It didn't work. When I changed maxSegments to 10, it did reduce the index from 11 segments to 10, but there are still deleted documents in the index -- maxDoc > numDocs on the statistics screen. numDocs : 12782762 maxDoc : 12788156 I don't think expungeDeletes is actually a valid parameter for optimize, but I included it anyway. I also tried doing a commit with expungeDeletes=true and that didn't work either. Is this a bug? The server is 3.5.0. Because I haven't finished getting my configuration worked out, I don't have the ability right now to try this on 4.0.0. Thanks, Shawn
Re: NewSearcher old cache
> > Q1) > As soon as a new searcher is opened, the caches begin populating from the > older caches. What happens if the NewSearcher event has queries defined in > it? Do these queries ignore the old cache altogether and load only > results of the queries defined in the listener event? Or do these get added > after the new caches have been warmed by old caches? > Those queries are going to be executed after the cache auto-warm and before the searcher is registered. > > Q2) > I am running edismax queries on the Solr Server. Can I specify these > queries > in NewSearcher and FirstSearcher also? Or are the queries supposed to be > simple queries? > You can use all the parameters you want here. You can use your custom request handler configuration if you want. With these queries you should try to warm those things that are not warmed by the caches' "autowarm" process; for example, a good idea here is to facet on all the fields where your real users will be faceting. The same thing with sorting. Be careful with warming time in relation to your commit frequency (or open-searcher frequency, really). If you are going to use NRT, you may not want to warm caches. Also, the whole idea of warming caches is to avoid making your users pay the penalty of searching with empty caches resulting in slow queries, so make sure the resources you spend warming are not causing worse query times. Tomás > Thanks. > > --Shreejay > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html > Sent from the Solr - User mailing list archive at Nabble.com. >
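To make the above concrete, a newSearcher listener carrying an edismax warming query with faceting and sorting could look like this in solrconfig.xml (a sketch — the field and parameter values are hypothetical, not from the original thread):

```xml
<!-- solrconfig.xml sketch: warming queries may use any request parameters,
     including defType=edismax, facets, and sorts -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">laptop</str>
      <str name="defType">edismax</str>
      <str name="qf">title^2 body</str>
      <!-- warm the facet and sort structures users will actually hit -->
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```

A firstSearcher listener takes the same shape; since there are no old caches to autowarm from at startup, it usually carries more queries than the newSearcher one.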
NewSearcher old cache
Hello Everyone, I was configuring a Solr installation and had a few queries about NewSearcher. As I understand it, a NewSearcher event will be triggered if there is an already existing registered searcher. Q1) As soon as a new searcher is opened, the caches begin populating from the older caches. What happens if the NewSearcher event has queries defined in it? Do these queries ignore the old cache altogether and load only results of the queries defined in the listener event? Or do these get added after the new caches have been warmed by old caches? Q2) I am running edismax queries on the Solr Server. Can I specify these queries in NewSearcher and FirstSearcher also? Or are the queries supposed to be simple queries? Thanks. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom html headers/footers to solr admin console
Uhhmmm, why do you want to do this? The admin screen is pretty much purely intended for developers/in-house use. Mostly I just want to be sure you aren't thinking about letting users, say, see this page. Consider /update?stream.body=*:* Best Erick On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman wrote: > Hello all, > > > I was just poking around in my solr distribution and I noticed some files: > admin-extra.html > admin-extra.menu-top.html > admin-extra.menu-bottom.html > > > I was really hoping that that was html inserted into the solr admin > page and I could modify the: > admin-extra.menu-top.html > admin-extra.menu-bottom.html > > files to make a header/footer. > > I un-commented out admin-extra.html and can now see that html in the > admin extras section for my core so not exactly what I was looking > for. > > Are the top/bottom html files used and are they really inserted at the > top and bottom of the page? > > Any way to get some headers in the static admin page? I would usually > just modify the html, but in this case there might already be > something I can use. > > Thanks, > Billy
Open Source Social (London) - 23rd Oct
Hi all, The next Open Source Search Social is on the 23rd Oct at The Plough, in Bloomsbury. We usually get a good mix of regulars and newcomers, and a good mix of backgrounds and experience levels, so please come along if you can. As usual the format is completely open so we'll be talking about whatever is most interesting at any one particular moment... ooo, a shiny thing... Details and RSVP options on the Meetup page: http://www.meetup.com/london-search-social/events/86580442/ Hope to see you there, Richard @richmarr
Issue using SpatialRecursivePrefixTreeFieldType
Hi David, I'm defining my field as such: When I create a large rectangle, say "10 10 500 11", Solr seems to freeze for quite some time. I haven't looked at your code, but I can imagine the algorithm basically fills in some sort of indexing matrix, and that's what's taking so long for large rectangles? Is there a limit to how big the worldBounds should be? Thanks! Eric.
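The field definition above was apparently eaten by the mail archive. For anyone following along, a typical Solr 4 definition of this field type looks roughly like the sketch below — every attribute value here is illustrative, not Eric's actual configuration:

```xml
<!-- schema.xml sketch (illustrative values only; the original definition
     was stripped from the message) -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="-1000 -1000 1000 1000"
           distErrPct="0.025" maxDistErr="0.5" units="degrees"/>
<field name="geom" type="location_rpt" indexed="true" stored="true"/>
```

With a non-geo quad tree, the indexing cost of a shape grows with its area relative to worldBounds at a given precision, which is consistent with large rectangles being slow; loosening distErrPct/maxDistErr trades precision for speed.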
SolrJ, optimize, maxSegments
Currently my indexing code calls optimize. Once a night, one of my six large shards is optimized, so each one only gets optimized once every six days. Here is the SolrJ call, server is an instance of HttpSolrServer: UpdateResponse ur = server.optimize(); I only do this because I want deleted documents regularly removed from the index. Whatever speed gains I might see from getting down to one segment are just an added bonus. After watching all the discussion on the -dev list regarding what to do in Solr due to the Lucene forceMerge rename, I am considering changing this to something like the following: UpdateResponse ur = server.optimize(true, true, 20); What happens with this if I am already below 20 segments? Will it still expunge all of my (typically several thousand) deleted documents? I am hoping that what it will do is rebuild any segment that contains deleted documents and leave the other segments alone. Possibly irrelevant info: I'm using the following MP config: 35 35 105 Thanks, Shawn
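The mail archive stripped the XML tags from the MP (merge policy) config above. Assuming TieredMergePolicy (the 3.x default) and reading the three surviving numbers in order, it was probably something like this solrconfig.xml fragment — a reconstruction, not a verbatim quote:

```xml
<!-- likely shape of the stripped merge-policy config (reconstructed) -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>
```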
Re: displaying search results in map
Did you look at http://stackoverflow.com/questions/11319465/geoclusters-in-solr? This sounds similar to what you're asking for based on geohashes of the points of interest. On Thu, Oct 11, 2012 at 2:25 PM, Harish Rawat wrote: > Sorry for not being clear. Here are more details > > 1.) The results are displayed in geographical map > 2.) Each document has latitude, longitude field and other fields that can > be searched on > 3.) The search will be done for all documents within a lat/long range. > 4.) The lat/lon range is divided into N*N (lets say 64 grids) and for each > grid we want following > a) no. of documents in that grid > b.) top K documents in that grid > c.) avg of latitude and longitude value for all results in that grid > > In lucene I can implement my own custom collector and do all the > calculations listed in #4. I wanted to understand the best way to implement > (or use existing if any :) this logic in solr > > Regards > Harish > > > > On Thu, Oct 11, 2012 at 11:08 AM, Gora Mohanty wrote: > >> On 11 October 2012 23:16, Harish Rawat wrote: >> >> > Hi >> > >> > I am working on a project to display the search results on the map. The >> > idea is to divide the map into N*N grids and show counts for each grid >> and >> > allow users to view top result on each grid >> > >> > any suggestions on how best to accomplish this with solr? >> > >> >> Your description is not very clear. What search results >> are you seeking to display on what kind of a map? Are you >> talking about a geographical map, or something like a 3D >> histogram (which is what you N x N grid seems to refer to)? >> Please clarify. >> >> In either case, it is quite unlikely that Solr will handle the >> presentation for you. Solr is a search engine that will return >> you desired search results. What to do with the search results >> is an issue for a presentation layer. >> >> Regards, >> Gora >>
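For what it's worth, the geohash idea in that answer reduces point 4a to a single facet request: index a geohash prefix per document and facet on it, so each facet bucket is one grid cell. A hypothetical sketch — the geohash_3 field (a 3-character geohash prefix), the latlon field, and the host are all assumptions:

```shell
# Facet on a stored geohash prefix: each bucket is a grid cell and the
# count is the number of documents in that cell (point 4a). Top-K docs
# per cell (4b) would be follow-up queries filtered on the prefix.
curl 'http://localhost:8983/solr/select?q=*:*&fq=latlon:%5B10,10%20TO%2011,11%5D&facet=true&facet.field=geohash_3&facet.limit=64&rows=0'
```

Per-cell lat/lon averages (4c) are not covered by faceting, so that part would still need a custom component or client-side aggregation.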
Re: displaying search results in map
On 11 October 2012 23:55, Harish Rawat wrote: > Sorry for not being clear. Here are more details > > 1.) The results are displayed in geographical map > 2.) Each document has latitude, longitude field and other fields that can > be searched on > 3.) The search will be done for all documents within a lat/long range. > 4.) The lat/lon range is divided into N*N (lets say 64 grids) and for each > grid we want following > a) no. of documents in that grid > b.) top K documents in that grid > c.) avg of latitude and longitude value for all results in that grid > > In lucene I can implement my own custom collector and do all the > calculations listed in #4. I wanted to understand the best way to implement > (or use existing if any :) this logic in solr > [...] Hmm, I am not that familiar with Lucene, so maybe someone else will chip in with advice. However, what you describe in point 4 seems to be a clustering strategy for geographical points. Typically, we use pre-defined strategies from OpenLayers ( http://openlayers.org ), or custom strategies. Regards, Gora
Re: displaying search results in map
On 11 October 2012 23:16, Harish Rawat wrote: > Hi > > I am working on a project to display the search results on the map. The > idea is to divide the map into N*N grids and show counts for each grid and > allow users to view top result on each grid > > any suggestions on how best to accomplish this with solr? > Your description is not very clear. What search results are you seeking to display on what kind of a map? Are you talking about a geographical map, or something like a 3D histogram (which is what you N x N grid seems to refer to)? Please clarify. In either case, it is quite unlikely that Solr will handle the presentation for you. Solr is a search engine that will return you desired search results. What to do with the search results is an issue for a presentation layer. Regards, Gora
Re: SLOR And OpenNlp integration
In fact I downloaded the Solr source using an svn client, then I applied the OpenNLP patch, then I ran 'ant compile -lib /usr/share/ivy' and got this error:

[javac] public synchronized Span[] splitSentences(String line) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:36: cannot find symbol
[javac] symbol : class Tokenizer
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] private final Tokenizer tokenizer;
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:38: cannot find symbol
[javac] symbol : class TokenizerModel
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] public NLPTokenizerOp(TokenizerModel model) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:46: cannot find symbol
[javac] symbol : class Span
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac] public synchronized Span[] getTerms(String sentence) {
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/OpenNLPTokenizerFactory.java:26: package opennlp.tools.util does not exist
[javac] import opennlp.tools.util.InvalidFormatException;
[javac] ^
[javac] /home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/OpenNLPOpsFactory.java:9: package opennlp.tools.chunker does not exist
[javac] import opennlp.tools.chunker.ChunkerModel;
[javac] ^
[javac] 100 errors

BUILD FAILED
/home/pfe/Téléchargements/dev/trunk/build.xml:112: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:419: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:410: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:418: The following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:1482: Compile failed; see the compiler error output for details.

I want to apply a semantic analysis to the documents that will be indexed by Solr. So Solr will index and then analyze content using OpenNLP instead of Tika. -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unique terms without faceting
Hi, Are you looking for http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html ? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen wrote: > On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote: >> I know that you can use a facet query to get the unique terms for a >> field taking account of any q or fq parameters but for our use case the >> counts are not needed. So is there a more efficient way of finding >> just unique terms for a field? > > Short answer: Not at this moment. > > > If the amount of unique terms is large (millions), a fair amount of > temporary memory could be spared by just keeping track of matched terms > with a boolean vs. the full int for standard faceting. Reduced memory > requirements means less garbage collection and faster processing due to > better cache utilization. So yes, there is a more efficient way. > > Guessing from your other posts, you are building a social network and > need to query on surnames and similar large fields. Question is of > course how large the payoff will be and if it is worth the investment in > development hours. I would suggest hacking the current faceting code to > use OpenBitSet instead of int[] and doing performance tests on that. > PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts > seem to be the right places to look in Solr 4. > > Regards, > Toke Eskildsen, State and University Library, Denmark >
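If the goal is just to enumerate terms without regard to q/fq (the same limitation HighFreqTerms has), the TermsComponent is another cheap option — a sketch only, with a hypothetical core URL and field name:

```shell
# Enumerate indexed terms for the "surname" field straight from the term
# dictionary. This does not honor q or fq, so it only covers the
# unrestricted "all documents" case.
curl 'http://localhost:8983/solr/terms?terms.fl=surname&terms.limit=100'
```

Honoring q/fq while skipping counts would still need the faceting-code hack Toke describes.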
Re: SLOR And OpenNlp integration
(12/10/11 20:40), ahmed wrote: Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I have a problem that classes are not found, despite the classes being there. I don't know what the problem is. I think attaching the error you got would help us understand your problem. Also, before that, what do you want to do with the Solr and OpenNLP integration? koji -- http://soleami.com/blog/starting-lab-work.html
Re: unsuscribe
Please follow the instructions here: https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists On Wed, Oct 10, 2012 at 6:03 PM, zMk Bnc wrote: > > unsuscribe
Re: anyone have any clues about this exception
Well, you'll actually be able to optimize, it's just called forceMerge. But the point is that optimize seems like something that _of course_ you want to do, when in reality it's not something you usually should do at all. Optimize does two things: 1> merges all the segments into one (usually) 2> removes all of the info associated with deleted documents. Of the two, point <2> is the one that really counts and that's done whenever segment merging is done anyway. So unless you have a very large number of deletes (or updates of the same document), optimize buys you very little. You can tell this by the difference between numDocs and maxDoc in the admin page. So what happens if you just don't bother to optimize? As an alternative, take a look at merge policy settings to help control how merging happens. Best Erick On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert wrote: > You could be right. Going back in the logs, I noticed it used to happen less > frequently and always towards the end of an optimize operation. It is > probably my indexer timing out waiting for updates to occur during optimizes. > The errors grew recently due to my upping the indexer threadcount to 22 > threads, so there's a lot more timeouts occurring now. Also our index has > grown to double the old size so the optimize operation has started taking a > lot longer, also contributing to what I'm seeing. I have just changed my > optimize frequency from three times a day to one time a day after reading the > following: > > Here they are talking about completely deprecating the optimize command in > the next version of Solr… > https://issues.apache.org/jira/browse/SOLR-3141 > > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Wednesday, October 10, 2012 11:10 AM > To: solr-user@lucene.apache.org > Subject: Re: anyone have any clues about this exception > > Something timed out, the other end closed the connection. 
This end tried to > write to closed pipe and died, something tried to catch that exception and > write its own and died even worse? Just making it up really, but sounds good > (plus a 3-year Java tech-support hunch). > > If it happens often enough, see if you can run WireShark on that machine's > network interface and catch the whole network conversation in action. Often, > there is enough clues there by looking at tcp packets and/or stuff > transmitted. WireShark is a power-tool, so takes a little while the first > time, but the learning will pay for itself over and over again. > > Regards, >Alex. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at once. > Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert wrote: >> Tomcat localhost log (not the catalina log) for my solr 3.6.1 (master) >> instance contains lots of these exceptions but solr itself seems to be doing >> fine... any ideas? I'm not seeing these exceptions being logged on my slave >> servers btw, just the master where we do our indexing only. 
>> >> >> >> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve >> invoke >> SEVERE: Servlet.service() for servlet default threw exception >> java.lang.IllegalStateException >> at >> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407) >> at >> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >> at >> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) >> at >> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) >> at >> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >> at >> org.apache.tomcat.util.net.JIoEndpoin
Re: Solr - Make Exact Search on Field with Fuzzy Query
Right, and going the other way (storing and highlighting on the non-stemmed field) would be unsatisfactory because you'd get a hit on "hospital" in the stemmed field, but wouldn't highlight it if you searched on "hospitality". I really don't see a good solution here. Highlighting seems to be one of those things that's easy in concept but has a zillion ways to go wrong. I guess I'd really just go with the copyField approach unless you can prove that it's really a problem. Perhaps lost in my first e-mail is that storing the field twice doesn't really affect search speed or _search_ requirements at all. Take a look here: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#file-names note that the *.fdt and *.fdx files are where the original raw copy goes (i.e. where data gets written when you specify stored="true") and they are completely independent of the files that contain the searchable data. So unless you're disk-space constrained, the additional storage really doesn't cost you much. Best Erick On Thu, Oct 11, 2012 at 2:31 AM, meghana wrote: > Hi Erickson, > > Thanks for your valuable reply. > > Actually we had tried just storing one field and highlighting on that > field all the time, whether we search on it or not. > > It sometimes causes an issue: if I search with the term 'hospitality' > and use the field with stemming applied for highlighting, it returns > highlights on both 'hospital' and 'hospitality', whereas it should return > highlighting only on 'hospitality' since I am doing an exact term search. Can you > suggest anything on this? Can we eliminate this issue while highlighting > on the original field (which has stemming applied)? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html > Sent from the Solr - User mailing list archive at Nabble.com.
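For reference, one variant of the copyField approach above is only a few lines of schema.xml — a sketch, where the field and type names are hypothetical and "text_exact"/"text_stemmed" are assumed to be analyzer chains differing only in the stemming filter:

```xml
<!-- schema.xml sketch: one stored, unstemmed field for exact search and
     highlighting, copied into a stemmed, index-only twin for recall -->
<field name="body"      type="text_exact"   indexed="true" stored="true"/>
<field name="body_stem" type="text_stemmed" indexed="true" stored="false"/>
<copyField source="body" dest="body_stem"/>
```

Exact queries then target body (and highlight on it), while broad queries target body_stem, so each use case gets the matching analysis.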
> > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html > Sent from the Solr - User mailing list archive at Nabble.com.
RE: SLOR And OpenNlp integration
Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I have a problem that classes are not found, despite the classes being there. I don't know what the problem is. -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013101.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SLOR And OpenNlp integration
Hi - the wiki page will get you up and running quickly: http://wiki.apache.org/solr/OpenNLP -Original message- > From:ahmed > Sent: Thu 11-Oct-2012 13:32 > To: solr-user@lucene.apache.org > Subject: SLOR And OpenNlp integration > > Hello, > I am a new user of apache solr and i have to integrate opennlp avec solr > .The problem is that i dont find a tutorial to do this integration .so i am > asking if there is someone who can help me to do this integration ? > thanks, > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html > Sent from the Solr - User mailing list archive at Nabble.com. >
SLOR And OpenNlp integration
Hello, I am a new user of Apache Solr and I have to integrate OpenNLP with Solr. The problem is that I can't find a tutorial for this integration, so I am asking if there is someone who can help me with it? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto Correction?
> so other than commercial solutions, > it seems like I need to have a plugin, > right? I couldn't find any open-source solutions yet... Yes, you need to implement a custom SearchComponent (plugin): http://wiki.apache.org/solr/SearchComponent Alternatively, you can re-run the search with the spellcheck suggestions on the client side.
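The client-side alternative can be sketched in plain Java as below. This is only the decision logic around Solr's spellcheck response; the helper names (`shouldRetry`, `buildRetryQuery`) are hypothetical, not part of any Solr API:

```java
// Sketch: client-side "auto correction" by re-searching with a suggestion.
// All names here are hypothetical helpers, not Solr/SolrJ API.
public class SuggestionFallback {

    /** Retry only when the original query found nothing and a suggestion exists. */
    public static boolean shouldRetry(long numFound, String suggestion) {
        return numFound == 0 && suggestion != null && !suggestion.isEmpty();
    }

    /** Rebuild the query string with the misspelled term replaced by the suggestion. */
    public static String buildRetryQuery(String original, String misspelled, String suggestion) {
        return original.replace(misspelled, suggestion);
    }
}
```

In practice the client would issue the query with spellcheck enabled, and when `shouldRetry` holds, transparently re-issue the corrected query and show the user "showing results for ..." instead.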
Re: segment number during optimize of index
Hi Lance, My earlier point may be misleading: "1. Segments are independent sub-indexes in separate files; while indexing it's better to create a new segment, as it doesn't have to modify an existing file, whereas while searching, the *smaller the segment count* the better, since you open x (not exactly x, but a value proportional to x) physical files to search if you have got x segments in the index." The "smaller" was referring to the segment count rather than the segment size. When you said "Large Pages", did you mean segment size should stay below a threshold for better performance from the OS point of view? My main concern here is: what would be the main disadvantage (indexing or searching) if I merge my entire 150 GB index (right now 100 segments) into a single segment? On 11 October 2012 07:28, Lance Norskog wrote: > Study index merging. This is awesome. > > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html > > Jame - opening lots of segments is not a problem. A major performance > problem you will find is 'Large Pages'. This is an operating-system > strategy for managing servers with tens of gigabytes of memory. Without it, > all large programs run much more slowly than they could. It is not a Solr > or JVM problem. > > > - Original Message - > | From: "jun Wang" > | To: solr-user@lucene.apache.org > | Sent: Wednesday, October 10, 2012 6:36:09 PM > | Subject: Re: segment number during optimize of index > | > | I have another question: does the number of segments affect the speed of > | index updates? > | > | 2012/10/10 jame vaalet > | > | > Guys, > | > thanks for all the inputs. I was continuing my research to learn > | > more about > | > segments in Lucene. Below are my conclusions; please correct me if > | > I am wrong. > | > > | > 1. Segments are independent sub-indexes in separate files; while > | > indexing > | > it's better to create a new segment, as it doesn't have to modify an > | > existing > | > file.
> | > Whereas while searching, the smaller the segment count the better, > | > since > | > you open x (not exactly x, but a value proportional to x) physical > | > files > | > to search if you have got x segments in the index. > | > 2. Since Lucene has a memory-map concept, for each file/segment in the > | > index a > | > new m-mapped file is created and mapped to the physical file on > | > disk. Can > | > someone explain or correct this in detail? I am sure there are > | > lots of > | > people wondering how m-map works while you merge or optimize index > | > segments. > | > > | > > | > On 6 October 2012 07:41, Otis Gospodnetic > | > | > >wrote: > | > > | > > If I were you, and not knowing all your details... > | > > > | > > I would optimize indices that are static (not being modified) and > | > > would optimize down to 1 segment. > | > > I would do it when search traffic is low. > | > > > | > > Otis > | > > -- > | > > Search Analytics - > | > > http://sematext.com/search-analytics/index.html > | > > Performance Monitoring - http://sematext.com/spm/index.html > | > > > | > > > | > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet > | > > > | > wrote: > | > > > Hi Eric, > | > > > I am in a major dilemma with my index now. I have got 8 cores, each > | > > around > | > > > 300 GB in size, and half of them are deleted documents; on top of > | > > that > | > > > each has got around 100 segments as well. Do I issue an > | > > > expungeDeletes > | > and > | > > > allow the merge policy to take care of the segments, or optimize them > | > into > | > > > a single segment? Search performance is not on par with the usual Solr > | > > > speed. > | > > > If I have to optimize, what segment number should I choose? My RAM is > | > > > around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB). > | > > > Please advise! > | > > > > | > > > Thanks.
> | > > > > | > > > > | > > > On 6 October 2012 00:00, Erick Erickson > | > > > > | > wrote: > | > > > > | > > >> Because eventually you'd run out of file handles. Imagine a > | > > >> long-running server with 100,000 segments. Totally > | > > >> unmanageable. > | > > >> > | > > >> I think Shawn was emphasizing that RAM requirements don't > | > > >> depend on the number of segments. There are other > | > > >> resources that files consume, however. > | > > >> > | > > >> Best > | > > >> Erick > | > > >> > | > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet > | > > >> > | > > wrote: > | > > >> > Hi Shawn, > | > > >> > thanks for the detailed explanation. > | > > >> > I have got one doubt: you said it doesn't matter how many > | > > >> > segments > | > the index > | > > >> has, > | > > >> > but then why does Solr have this merge policy which merges > | > > >> > segments > | > > >> > frequently? Why can't it leave the segments as they are rather > | > > >> > than > |
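For reference, the merge behaviour debated in this thread is governed by the merge policy configured in solrconfig.xml. A Solr 4-era sketch is below; the numbers are illustrative defaults, not a recommendation for the 300 GB cores mentioned above:

```xml
<indexConfig>
  <!-- TieredMergePolicy merges roughly equal-sized segments in tiers. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>               <!-- segments merged in one operation -->
    <double name="segmentsPerTier">10.0</double>      <!-- allowed segments per size tier -->
    <double name="maxMergedSegmentMB">5120.0</double> <!-- cap on a normally merged segment -->
  </mergePolicy>
</indexConfig>
```

Lowering `segmentsPerTier` keeps the segment count down at the cost of more merge I/O during indexing, which is the indexing-vs-searching trade-off the thread is discussing.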
Re: Unique terms without faceting
On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote: > I know that you can use a facet query to get the unique terms for a > field taking account of any q or fq parameters but for our use case the > counts are not needed. So is there a more efficient way of finding > just unique terms for a field? Short answer: Not at this moment. If the amount of unique terms is large (millions), a fair amount of temporary memory could be spared by just keeping track of matched terms with a boolean vs. the full int for standard faceting. Reduced memory requirements means less garbage collection and faster processing due to better cache utilization. So yes, there is a more efficient way. Guessing from your other posts, you are building a social network and need to query on surnames and similar large fields. Question is of course how large the payoff will be and if it is worth the investment in development hours. I would suggest hacking the current faceting code to use OpenBitSet instead of int[] and doing performance tests on that. PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts seem to be the right places to look in Solr 4. Regards, Toke Eskildsen, State and University Library, Denmark
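The boolean-per-term idea above can be illustrated in miniature with plain `java.util.BitSet` standing in for Lucene's OpenBitSet. This is a toy model of the memory trade-off, not Solr's actual faceting code; the term-ordinal input is a simplification of what faceting collects per matched document:

```java
import java.util.BitSet;

public class UniqueTermCounter {

    // Standard faceting keeps one int counter per term ordinal (4 bytes/term).
    public static int uniqueTermsWithCounts(int[] matchedTermOrds, int numTerms) {
        int[] counts = new int[numTerms];
        for (int ord : matchedTermOrds) counts[ord]++;
        int unique = 0;
        for (int c : counts) if (c > 0) unique++;
        return unique;
    }

    // If only presence matters, one bit per term suffices (~32x less memory),
    // which is the OpenBitSet substitution suggested in the reply above.
    public static int uniqueTermsWithBitSet(int[] matchedTermOrds, int numTerms) {
        BitSet seen = new BitSet(numTerms);
        for (int ord : matchedTermOrds) seen.set(ord);
        return seen.cardinality();
    }
}
```

Both paths return the same answer; the bitset version trades the per-term counts (which the poster does not need) for a much smaller working set and better cache behaviour.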