Re: scheduling imports and heartbeats
On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri

Tri,

If you use the DataImportHandler (DIH), you can set up a dataimport.properties file that can be configured to import on intervals.
http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example

As for heartbeat, you can use the ping handler (default is /admin/ping) to check the status of the servlet.

- Ken
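For reference, the dataimport.properties-based scheduler described on that wiki page was a contributed patch rather than something shipped with Solr 1.4, so the usual fallback is an external scheduler hitting the DIH and ping handlers over HTTP. A hypothetical crontab sketch (host, port and paths are placeholders):

```
# run a delta-import every hour, and a heartbeat check every 5 minutes
0 * * * *   curl -s "http://localhost:8983/solr/dataimport?command=delta-import"
*/5 * * * * curl -s "http://localhost:8983/solr/admin/ping"
```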
Output Search Result in ADD-XML-Format
Dear all, my use case is: creating an index using DIH where the sub-entity queries another Solr index for more fields. As there is a very convenient attribute useSolrAddSchema that would spare me listing all the fields I want to add from the other index, I'm looking for a way to get the search results in the ADD format directly.

Before starting on the XSLT file that would transform the regular Solr result into a Solr update XML, I just wanted to ask whether there already exists a solution for this. Maybe I missed some request handler that already returns the result in update format?

Thanks! Chantal
To cache or to not cache
Hi List,

in one of our application's use-case scenarios we create a response from different data sources. In clear words: we combine different responses from different data sources (SQL, another webservice and Solr) into one response. We would cache this information per request for a couple of minutes or hours outside of Solr, since the data to cache does not come only from Solr itself.

However, I am not sure whether it would make sense to disable Solr's internal cache mechanisms, or at least which cache mechanisms I can disable, because I am not sure what the impacts of each cache are in the long run.

A query is usually of type dismax and uses some function queries. We do not sort, but we may use some filter queries. Furthermore we retrieve just one of up to 10 (stored) fields from our index. Most of the time it will be the same field (95-98% of the requests).

I think using the filterCache makes sense, but what about documentCache and the others? Since I retrieve in 95-98% of all cases the same field from our stored documents, how can I boost retrieving that information?

Thank you!

--
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1875289.html
Sent from the Solr - User mailing list archive at Nabble.com.
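For what it's worth, the caches in question are configured in solrconfig.xml, so experimenting is cheap. A hedged sketch (the sizes are illustrative placeholders, not recommendations); note also that requesting only the one needed stored field via fl=yourfield keeps responses small regardless of caching:

```xml
<!-- solrconfig.xml sketch; size values are illustrative only -->
<query>
  <!-- fq clauses land here; likely worth keeping for dismax + filter queries -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <!-- stored-field retrieval hits this cache, even when fl requests one field -->
  <documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
  <!-- can be kept small if whole responses are cached outside Solr -->
  <queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
</query>
```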
Re: How to Facet on a price range
On 11/9/2010 7:32 PM, Geert-Jan Brits wrote: when you drag the sliders, an update of how many results would match is immediately shown. I really like this. How did you do this? Is this out-of-the-box available with the suggested Facet_by_range patch?

Hi,

With the range facets you get the facet counts for every discrete step of the slider. These values are requested in the AJAX request whenever search criteria change, and then when someone uses the sliders we simply check the range that is selected and add the discrete values of that range to get the expected amount of results. So yes, it is available, but as Solr is just the search backend, the frontend stuff you'll have to write yourself.

Regards, gwk
Re: How to Facet on a price range
Ah I see: like you said, it's part of the facet range implementation. Frontend is already working, just need the 'update-on-slide' behavior.

Thanks, Geert-Jan

2010/11/10 gwk g...@eyefi.nl wrote: [quoting the explanation above: the facet counts for every discrete slider step are fetched in the AJAX request, and the selected range's discrete values are summed client-side]
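The arithmetic gwk describes is simple enough to sketch: fetch one facet count per discrete slider step up front, then sum the steps inside the selected range client-side, with no extra Solr request per slider move. The counts below are made-up example data:

```python
# Client-side sketch of summing discrete range-facet counts for a price slider.

def matches_in_range(step_counts, low, high):
    """step_counts maps a price step to its facet count; low/high inclusive."""
    return sum(count for step, count in step_counts.items() if low <= step <= high)

counts = {0: 5, 10: 12, 20: 7, 30: 3, 40: 1}  # price step -> number of results
print(matches_in_range(counts, 10, 30))       # 12 + 7 + 3
```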
Re: scheduling imports and heartbeats
i'm looking for another solution other than cron job. can i configure solr to schedule imports?

From: Ranveer Kumar ranveer.s...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:13:03 PM
Subject: Re: scheduling imports and heartbeats

You should use cron for that..

On 10 Nov 2010 08:47, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri
Re: Using Multiple Cores for Multiple Users
Hi,

If your index is supposed to handle only public information, i.e. public RSS feeds, then I don't see a need for multiple cores. I would probably try to handle this on the query side only. Imagine this scenario:

User A registers RSS-X and RSS-Y (the application starts pulling and indexing these feeds)
User B registers RSS-Z (the application starts pulling feed Z)
User C registers RSS-X and RSS-Z (the application does nothing, as these are already being indexed)

When searching, add a filter to each user's queries. Solr will handle MANY terms in such a filter, and it is not likely that a human user subscribes to more than, say, a few hundred feeds. So for user C, the query would look like:

.../solr/select?q=foo bar&fq=feedID:(RSS-X OR RSS-Z)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 10. nov. 2010, at 03.00, Adam Estrada wrote: Thanks a lot for all the tips, guys! I think that we may explore both options just to see what happens. I'm sure that scalability will be a huge mess with the core-per-user scenario. I like the idea of creating a user ID field and agree that it's probably the best approach. We'll see... I will be sure to let the list know what I find! Please don't stop posting your comments everyone ;-) My inquiring mind wants to know... Adam

On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote: If storing in a single index (possibly sharded if you need it), you can simply include a solr field that specifies the user ID of the saved thing. On the client side, in your application, simply ensure that there is an fq parameter limiting to the current user, if you want to limit to the current user's stuff. Relevancy ranking should work just as if you had 'separate cores'; there is no relevancy issue. It IS true that when your index gets very large, commits will start taking longer, which can be a problem.
I don't mean commits will take longer just because there is more stuff to commit -- the larger the index, the longer an update to a single document will take to commit.

In general, I suspect that having dozens or hundreds (or thousands!) of cores is not going to scale well; it is not going to make good use of your cpu/ram/hd resources. Not really the intended use case of multiple cores. However, you are probably going to run into some issues with the single index approach too. In general, how to deal with multi-tenancy in Solr is an oft-asked question for which there doesn't seem to be any "just works and does everything for you without needing to think about it" solution in Solr, judging from past threads. I am not a Solr developer or expert.

From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Tuesday, November 09, 2010 6:57 PM
To: solr-user@lucene.apache.org
Cc: Adam Estrada
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

[Adam wrote:] All, I have a web application that requires the user to register and then login to gain access to the site. Pretty standard stuff... Now I would like to know what the best approach would be to implement a customized search experience for each user. Would this mean creating a separate core per user? I think that this is not possible without restarting Solr after each core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. Sometimes you must reindex after a change; the same is true for reloading cores. Check the wiki on this one [1].

[Adam wrote:] My use case is this... User A would like to index 5 RSS feeds and User B would like to index 5 completely different RSS feeds, and he is not interested at all in what User A is interested in. This means that they would have to be separate index cores, right?

If you view documents within an RSS feed as separate documents, you can assign a user ID to those documents, creating a multi-user index with RSS documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users. It takes up additional memory and disk space, doesn't share caches etc. There is also more maintenance, and you need some support scripts to dynamically create new cores - Solr currently doesn't create a new core directory structure. But reindexing a very large index takes up a lot more time and resources, and relevancy might be an issue depending on the RSS feeds' contents.

[Adam wrote:] What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a single server with your specifications. Unless the demands are too specific.

[Adam wrote:] Thanks in advance, Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers
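The query-side filtering suggested in this thread can be sketched client-side. A hedged example: the feedID field name and the subscription map are assumptions for illustration, not anything from a real schema:

```python
# Build a per-user Solr query URL whose fq restricts results to the
# feeds that user subscribed to (single shared index, no per-user core).
from urllib.parse import urlencode

subscriptions = {"userC": ["RSS-X", "RSS-Z"]}  # hypothetical registry

def user_query(user, q):
    fq = "feedID:(%s)" % " OR ".join(subscriptions[user])
    return "/solr/select?" + urlencode({"q": q, "fq": fq})

print(user_query("userC", "foo bar"))
```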
Re: facetting when using field collapsing
On 07.11.2010, at 20:13, Lukas Kahwe Smith wrote:

Hi,

I am pondering making use of field collapsing. I am currently indexing clauses (sections) inside UN documents:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=clause

Now since right now my data set is still fairly small, I am doing field collapsing in userland:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=document

However, while this works alright (not ideal, since I am fetching essentially the entire result set and not paged as for clauses), I still have no idea how to get the facet filters to display the right counts. So I am wondering if field collapsing in its current form supports faceting, since it's not mentioned on the wiki page:
http://wiki.apache.org/solr/FieldCollapsing

The above wiki page seems to be out of date. Reading the comments in https://issues.apache.org/jira/browse/SOLR-236 it seems like group should be replaced with collapse.

regards, Lukas Kahwe Smith m...@pooteeweet.org
RE: Output Search Result in ADD-XML-Format
I'm not sure, but SOLR-1499 might have what you want. https://issues.apache.org/jira/browse/SOLR-1499

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de]
Sent: Wednesday, November 10, 2010 5:59 AM
To: solr-user@lucene.apache.org
Subject: Output Search Result in ADD-XML-Format

[original question quoted, as above]
Re: Next Word - Any Suggestions?
Hi Christopher,

I am working my way through trying to implement SpanQueries in Solr (svn trunk). From my lack of progress, I am skeptical that I can help much, but I would be happy to try. I imagine you have already found (either before your message, or after posting it) Grant's lucene, spanquery, and WindowTermVectorMapper overview:
http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/

I'd be interested in hearing about your progress. Good luck

Sean

On 10/26/2010 08:26 AM, Christopher Ball wrote: Am about to implement a custom query that is sort of a mash-up of Facets, Highlighting, and SpanQuery - but thought I'd see if anyone has done anything similar. In simple words, I need to facet on the next word given a target word. For example, if my index only had the following 5 documents (comprised of a sentence each):

Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.

The query should give the frequency of the distinct terms after the word fox:

skipped - 2
crashed - 2
jumped - 1

Long-term, do the opposite - frequency of the distinct terms before the word fox:

brown - 2
sly - 1
fat - 1
red - 1

My guess is that either the FastVectorHighlighter or SpanQuery would be a reasonable starting point. I was hoping to take advantage of Vectors as I am storing termVectors, termPositions, and termOffsets for the field in question. Grateful for any thoughts . . . reference implementations . . . words of encouragement . . . free beer - whatever you can offer.

Gracias, Christopher
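As a sanity check of the expected output, the "next word" facet over the five example documents can be simulated in plain Python; naive tokenization stands in for term vectors / SpanQuery here, so this is only a model of the desired result, not an implementation approach:

```python
# Count the distinct terms that directly follow a target word.
import re
from collections import Counter

docs = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]

def next_word_counts(docs, target):
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for a, b in zip(tokens, tokens[1:]):
            if a == target:
                counts[b] += 1
    return counts

print(next_word_counts(docs, "fox"))  # skipped: 2, crashed: 2, jumped: 1
```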
SpanQuery basics in Solr QueryComponent(?)
Hi all,

I seem to be lost in the new flex indexing api. In the older api I was able to extend QueryComponent with my custom component, parse a restricted-syntax user query into a SpanQuery, and then grab an IndexReader. From there I worked with the spanquery's spans. For a bit of reference, my old QueryComponent code looks something like:

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        SolrQueryRequest req = rb.req;
        SolrQueryResponse rsp = rb.rsp;
        SDRQParser qparser = (SDRQParser) rb.getQparser();
        SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
        // custom parser returns SpanQuery
        IndexReader reader = req.getSearcher().getReader();
        Spans spans = stq.getSpans(reader);
        // work with spans here...
    }

With the new (1.5?) api, I got the warning about wrapping IndexReader with SlowMultiReaderWrapper, so I changed my approach above to something like:

    SolrIndexReader fullReader = req.getSearcher().getReader();
    IndexReader reader = SlowMultiReaderWrapper.wrap(fullReader); // need help avoiding this...?

I then got an NPE on what seems to be EmptyTerms.toString(). For kicks, I noticed that EmptyTerms did not override its parent (TermSpans) toString() method, which seemed to be the cause of the problem. Overriding that fixed the NPE, and now I get results (so I will look at filing a bug report unless someone mentions otherwise).

Any hints on how I can/should 'properly' work with spans in solr? Also, are there any introductory documents to the MultiFields and sub-indexes stuff? Particularly how to implement MultiFields as a better approach to SlowMultiReaderWrapper (thanks for the warnings about performance). I cannot seem to find the relevant beginner material to avoid using the SMRW. The material I do find seems to require that you pass in a 'found' document, or perhaps walk through all subReaders?

And finally: should I be looking at some existing Solr code to guide me? I am having trouble finding the highlighter code which I believe uses spans (WeightedSpanTerm??). Is there already code to convert user queries to span queries?

Thanks, Sean
Re: Using Multiple Cores for Multiple Users
On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: Thanks a lot for all the tips, guys! I think that we may explore both options just to see what happens. I'm sure that scalability will be a huge mess with the core-per-user scenario. I like the idea of creating a user ID field and agree that it's probably the best approach. We'll see... I will be sure to let the list know what I find! Please don't stop posting your comments everyone ;-) My inquiring mind wants to know...

I think it is customary for me to mention the techniques mentioned in LotsOfCores for these kinds of questions. The patches are mostly useless at this point, but if you are looking for a per-user solution, you will need most of the tricks mentioned on the wiki page.

http://wiki.apache.org/solr/LotsOfCores

--
Regards, Shalin Shekhar Mangar.
Re: To cache or to not cache
Thank you Shalin.

Yes, both Solr and some other applications could possibly run on the same box. I hoped that not storing redundantly in Solr and somewhere else in RAM would not hurt Solr's performance very much.

Just to understand Solr's caching mechanism: my first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

Kind regards
RE: Output Search Result in ADD-XML-Format
Thank you, James. I was looking for something like that (and I remember having stumbled over it in the past, now that you mention it).

I've created an XSLT file that transforms the regular result to an update XML document. Seeing that the SolrEntityProcessor is still in development, I will stick to the XSLT solution while we are still using 1.4, but I will add a note that with the new release we should try this SolrEntityProcessor.

(Reading through the JIRA issue I'm not sure whether I can simply get all fields from the other index and dump them into the index which is being built. With the XSLT + useSolrAddSchema solution this works just fine without the need to list all the fields. I should try that before the next Solr release to be able to give some feedback.)

Thanks! Chantal

On Wed, 2010-11-10 at 15:13 +0100, Dyer, James wrote: I'm not sure, but SOLR-1499 might have what you want. https://issues.apache.org/jira/browse/SOLR-1499 [remainder of quoted thread, as above]
Re: To cache or to not cache
Em wrote: My first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

I don't believe it does, no.

I understand your question -- if you're caching things externally anyway, do you need caches in Solr, or is that just redundant? The answer is kind of complicated though -- maybe, maybe not. In some cases having too-small Solr caches will make your Solr performance really bad -- if you want to page through Solr results, for instance, the document cache is going to be important. In fact, if Solr can't hold enough for the _current page_ in the cache, that's going to mess up Solr even more; even returning a single request, Solr functions that want to look at the documents are (in some cases) going to keep retrieving them over and over again, instead of getting them from the cache -- even within a single Solr request-response.

I could be wrong about some of those details; this is me kind of hand-waving because I'm not an expert at this stuff. I know just enough to try not to be dangerous (ha), meaning that I am pretty sure that you can't issue a blanket "yeah, get rid of Solr caches" in your circumstance. There are probably some caches you can make (much) smaller, but it requires kind of complicated Solr-fu to understand which those are. You could certainly keep your caches fairly small, and see what happens, do some benchmarking.

Jonathan
Re: To cache or to not cache
On Wed, Nov 10, 2010 at 7:51 AM, Em mailformailingli...@yahoo.de wrote: Thank you Shalin. Yes, both Solr and some other applications could possibly run on the same box. I hoped that not storing redundantly in Solr and somewhere else in RAM would not hurt Solr's performance very much. Just to understand Solr's caching mechanism: my first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

Well, we can assume that some documents will be common, so the documentCache will be hit. If you are using a sort on fields or function queries, the fieldCache built by Lucene (not configurable) will be used. If there are any common fq clauses, those will hit the filterCache. Apart from that, it is difficult to say unless we know the field types and the parsed query.

--
Regards, Shalin Shekhar Mangar.
Re: Highlighter - multiple instances of term being combined
Ahh, this reconfirms. The analyzers are properly pulling things apart. There are two instances of the query keyword with words between them. But from your last comment, it sounds like the system's not trying to do any sort of phrase highlighting, but is just hitting a weird edge case? I'm seeing this behavior somewhat commonly, so I thought for sure there must be some option that says "if two highlighted words are sufficiently close together, highlight them as a single phrase".

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog goks...@gmail.com wrote: Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link off the main solr admin page. It will show you how text is broken up for both the indexing and query processes. You might get some insight about how these words are torn apart and assigned positions. Trying the different Analyzers and options might get you there. But to be frank - highlighting is a tough problem and has always had a lot of edge cases.

On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri sas...@gmail.com wrote: I'm finding that if a keyword appears in a field multiple times very close together, it will get highlighted as a phrase even though there are other terms between the two instances. So this search:

http://localhost:8983/solr/select/?hl=true&hl.snippets=1&q=residue&hl.fragsize=0&mergeContiguous=false&indent=on&hl.usePhraseHighlighter=false&debugQuery=on&hl.fragmenter=gap&hl.highlightMultiTerm=false

Highlights as:

What does low-<em>residue mean? Like low-residue</em> diet?

Trying to get it to highlight as:

What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?

I've tried playing with various combinations of mergeContiguous, highlightMultiTerm, and usePhraseHighlighter, but they all yield the same output. For reference, the field type uses a StandardTokenizerFactory and SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and SnowballFilterFactory. I've confirmed that the intermediate words don't appear in either the synonym or the stop words list. I can post the full definition if helpful. Any pointers as to how to debug this would be greatly appreciated!

sasank

--
Lance Norskog goks...@gmail.com
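As far as I know there is no Solr 1.4 option that splits a merged highlight back into per-term highlights, so one workaround is to post-process the snippet client-side. A hedged sketch (naive string handling, assumes a single-term query; function name is made up):

```python
# Re-emit <em> highlights around just the query term when Solr returns
# one <em> span covering several words.
import re

def split_merged_highlight(snippet, term):
    def rehighlight(match):
        inner = match.group(1)
        # wrap each occurrence of the term inside the merged span
        return re.sub(r"(?i)(%s)" % re.escape(term), r"<em>\1</em>", inner)
    return re.sub(r"<em>(.*?)</em>", rehighlight, snippet, flags=re.DOTALL)

s = "What does low-<em>residue mean? Like low-residue</em> diet?"
print(split_merged_highlight(s, "residue"))
# What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
```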
Is there a way to create multiple doc using DIH and access the data pertaining to a particular doc name ?
Hi,

I have a peculiar situation where we are trying to use Solr for indexing multiple tables (there is no relation between these tables). We are trying to use the Solr index instead of the source tables, and hence we are trying to create the Solr index to mirror the source tables.

There are 3 tables which need to be indexed: table 1, table 2 and table 3. I am trying to index each table in a separate doc tag with a different doc tag name, and each table has some of the common field names. For example:

<document name="DataStoreElement">
  <entity name="DataStoreElement" query="...">
    <field column="DATA_STOR" name="DATA_STO"/>
  </entity>
</document>
<document name="DataStore">
  <entity name="DataStore" query="...">
    <field column="DATA_STOR" name="DATA_STO"/>
  </entity>
</document>

After indexing is complete, I am interested in searching the DATA_STO present under a particular document (not from both documents), something like:

/search?q=test AND documentname:DataStoreElement

Is it possible to do this using DIH in Solr? My current approach is to manipulate the source field names to unique names during indexing. One more approach would be to have a multi-core setup and index each table separately in a different core. Please let me know if there is any other suggestion for this issue.

Thanks, Barani
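As an alternative to renaming fields, a single DIH document can tag each table's rows with a discriminator field that queries then filter on. A hedged sketch (the tablename field, table names and SQL are placeholders, not a confirmed solution):

```xml
<!-- One DIH document, one entity per table; a literal column in each
     SQL query tags the rows with their source table. -->
<document>
  <entity name="DataStoreElement"
          query="SELECT 'DataStoreElement' AS tablename, DATA_STOR FROM TABLE1">
    <field column="DATA_STOR" name="DATA_STO"/>
    <field column="tablename" name="tablename"/>
  </entity>
  <entity name="DataStore"
          query="SELECT 'DataStore' AS tablename, DATA_STOR FROM TABLE2">
    <field column="DATA_STOR" name="DATA_STO"/>
    <field column="tablename" name="tablename"/>
  </entity>
</document>
```

A search limited to one table would then look like /solr/select?q=test&fq=tablename:DataStoreElement.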
Re: spell check vs terms component
Shalin / Ken,

Thanks a lot for your suggestions. I haven't tried the NGrams filter; I will try that too.

Thanks, Barani
Re: To cache or to not cache
Jonathan,

sounds like it makes sense. In this case I think it is more important to size the external cache very well, instead of Solr's. Even when 1/5th of the requests are redundant, an external cache could not answer the other 4/5ths, and so decreasing Solr's cache would slow down the whole application.

Since this is only a conceptual question, I really do not have any benchmark data. But if I get some, I will ask whether it is possible to publish them.

Regards
Re: AW: Some issues concerning SOLR 1.4.1
I know this post was a while ago, but it's the only one I've found which exactly matches what we're seeing with our solr application. We recently upgraded to 1.4.1 and all of the issues you have listed are happening to us. Did you find a solution? Thanks.
Solr optimize operation slows my MySQL serveur
Hello,

I've a Solr index with a few million entries, so an optimize usually takes several minutes (or at least tens of seconds). During the optimize process there is free RAM, and the load average stays under 1. But... the machine also hosts a MySQL server, and during the optimize process there are a lot of slow MySQL queries (4 to 10 seconds), without apparent reason.

I don't understand why Solr interferes with MySQL in my case. Do you have an idea?

Thank you! Godefroy
Re: To cache or to not cache
You know, on further reflection, I'd suggest you think (and ideally measure) hard about whether you even need this application-level solr-data-cache. Solr is a caching machine; it's kind of what Solr does, one of the main focuses of Solr. A query to Solr that hits the right caches comes back amazingly fast. With properly tuned Solr caches for your use, and sufficient RAM to hold them (possibly less than you think, Solr is pretty efficient), I'm not sure you're going to get any benefit at all from trying to write your own extra cache on top of Solr.

Em wrote: [quoted message above]
Re: To cache or to not cache
PS: There's also, I think, a way to turn on HTTP-level caching for Solr, which I believe is caching of entire responses that match an exact Solr query, filled without actually touching Solr at all. But I'm not sure about this, because I'm always trying to make sure this HTTP-level cache is turned off because it messes me up, rather than looking into the details of it.

In general, I doubt you are going to come up with any external caches that work better for Solr content than the caches in Solr itself, the product of hundreds of developer hours of work focused on Solr specifically.

Jonathan Rochkind wrote: [previous message quoted above]
Re: Solr optimize operation slows my MySQL serveur
On 2010-11-10 18:08, Skreo wrote: Hello, I've a Solr index with a few million entries, so an optimize usually takes several minutes ... I don't understand why Solr interferes with MySQL in my case. Do you have an idea? Thank you!

Memory and disk bandwidth spike when Solr optimizes an index.
Re: Solr optimize operation slows my MySQL serveur
I just made a test.

Before the optimize:

sk...@gedeon:~$ w
 18:55:43 up 22 days, 22:27, 4 users, load average: 0,07, 0,02, 0,00
USER  TTY  FROM  LOGIN@  IDLE  JCPU  PCPU  WHAT

# iostat 10
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,82   0,00     0,35     0,04    0,00  98,79

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      15,10        4,80      249,60        48      2496
sda1      3,80        0,00       44,00         0       440
sda2      8,70        4,80      205,60        48      2056
sda3      0,00        0,00        0,00         0         0
sdc      14,90        3,20      249,60        32      2496
sdc1      3,80        0,00       44,00         0       440
sdc2      8,50        3,20      205,60        32      2056
sdc3      0,00        0,00        0,00         0         0
sdb       0,00        0,00        0,00         0         0
sdb1      0,00        0,00        0,00         0         0
sdd       0,00        0,00        0,00         0         0
sdd1      0,00        0,00        0,00         0         0
md0       0,00        0,00        0,00         0         0
md2      19,70        8,00      189,60        80      1896
md1       5,10        0,00       40,80         0       408
dm-0     17,10        8,00      189,60        80      1896
dm-1      0,00        0,00        0,00         0         0

During the optimize:

# w
 18:57:07 up 22 days, 22:29, 4 users, load average: 1,10, 0,25, 0,08
USER  TTY  FROM  LOGIN@  IDLE  JCPU  PCPU  WHAT

# iostat 10
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          12,73   0,00     0,57     0,04    0,00  86,66

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      13,50        4,00      318,40        40      3184
sda1      0,90        0,00       15,20         0       152
sda2      9,60        4,00      303,20        40      3032
sda3      0,00        0,00        0,00         0         0
sdc      13,40        3,20      318,40        32      3184
sdc1      0,90        0,00       15,20         0       152
sdc2      9,50        3,20      303,20        32      3032
sdc3      0,00        0,00        0,00         0         0
sdb       0,00        0,00        0,00         0         0
sdb1      0,00        0,00        0,00         0         0
sdd       0,00        0,00        0,00         0         0
sdd1      0,00        0,00        0,00         0         0
md0       0,00        0,00        0,00         0         0
md2      23,10        7,20      285,60        72      2856
md1       1,50        0,00       12,00         0       120
dm-0     19,50        7,20      280,80        72      2808
dm-1      0,60        0,00        4,80         0        48

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           6,89   0,00     2,05    10,14    0,00  80,92

Device:       tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda         65,00        0,00    58689,60         0    586896
sda1         1,00        0,00       16,00         0       160
sda2        62,60        0,00    58673,60         0    586736
sda3         0,00        0,00        0,00         0         0
sdc         59,60        0,80    53568,00         8    535680
sdc1         0,80        0,00       14,40         0       144
sdc2        57,40        0,80    53553,60         8    535536
sdc3         0,00        0,00        0,00         0         0
sdb          0,00        0,00        0,00         0         0
sdb1         0,00        0,00        0,00         0         0
sdd          0,00        0,00        0,00         0         0
sdd1         0,00        0,00        0,00         0         0
md0          0,00        0,00        0,00         0         0
md2      13822,90        0,80   110571,20         8   1105712
md1          1,70        0,00       13,60         0       136
dm-0         9,30        0,80       73,60         8       736
dm-1     13812,20        0,00   110497,60         0   1104976

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           2,23   0,00     1,40    28,11    0,00  68,26

Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read
sda       114,10        4,80   106738,40        48
Best practice for emailing this list?
How do people email this list without getting spam filter problems?
Search with accent
Hi all, Does somebody know how I can configure my Solr to make searches with and without accents? For example: pereque and perequê. When I do it I need the same result, but it's not working. tks --
Re: Solr optimize operation slows my MySQL serveur
On 11/10/2010 11:00 AM, Skreo wrote: I just made a test : [snip] avg-cpu: %user %nice %system %iowait %steal %idle 2,23 0,00 1,40 28,11 0,00 68,26 Your iowait percentage for that 10 second interval was 28%, which is pretty high. Solr has to make a complete copy of the index, which means a lot of disk I/O. Optimizing an index involves more than just copying it, though - Solr is processing every document in the old index and writing it into the new one. This is even more load on the CPU. Databases like MySQL are also resource intensive, especially in the I/O department. Unless you have enough RAM to cache your MySQL databases as well as your entire Solr index, you're always going to run into this problem. It's strongly recommended that you put Solr on dedicated hardware. If you asked about this on the MySQL list, I imagine they'd make the same recommendation regarding their software. Shawn
Re: scheduling imports and heartbeats
: References: 4cd8fb5a.9040...@srce.hr 001701cb7fe2$58abc660$0a0353...@com : aanlkti=zuypu4d5q3znmob8vst8zezxh9p+cbsalz...@mail.gmail.com : 4cd962b0.2090...@srce.hr : aanlktimvmn7rvjks8cqdkeidyt7dr5qzs5ji4xpdu...@mail.gmail.com : 4cd9ae1e.8080...@srce.hr : aanlktinje_hqrfqf8uu_1k5dd9ejauc7wigmbeh13...@mail.gmail.com : Subject: scheduling imports and heartbeats : In-Reply-To: aanlktinje_hqrfqf8uu_1k5dd9ejauc7wigmbeh13...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Best practice for emailing this list?
Hi robo, try to send eMail in plain-text format. This often helps a lot! Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-emailing-this-list-tp1877693p1877792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best practice for emailing this list?
I tried that as well but the original email I was trying to send about replication and load balancing was still being marked as spam (5.8 is above threshold). That is when I thought I would try a very simple email such as this one. Is there a list of keywords to avoid? On Wed, Nov 10, 2010 at 10:26 AM, Em mailformailingli...@yahoo.de wrote: Hi robo, try to send eMail in plain-text format. This often helps a lot! Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-emailing-this-list-tp1877693p1877792.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom Request Handler
I was reading in the Solr Wiki about creating request handlers - http://wiki.apache.org/solr/SolrRequestHandler - and saw that there are two different ways to create a handler:

1. Define as <requestHandler name="/baz" class="my.package.AnotherCustomRequestHandler"> and call via http://localhost:8983/baz/?..
2. Define as <requestHandler name="baz" class="my.package.AnotherCustomRequestHandler"> and call via http://localhost:8983/select/?qt=baz...

So I was wondering, is one way preferred over the other? Thanks, Paige Cook
Re: Best practice for emailing this list?
On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter you're referring to. I've found that adding a rule to Gmail that says "Never send to spam" keeps these emails out of my spam filter. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken
Re: Sorting and filtering on fluctuating multi-currency price data?
: ExternalFileField can only be used for boosting. It is not a : first-class field. correct, but it can be used in a FunctionQuery, which means it can be filtered on (using frange) and (on trunk) it can be sorted on, which are the two needs the OP asked about... : : Another approach would be to use ExternalFileField and keep the price data, : : normalized to USD, outside of the index. Every time the currency rates : : changed, we would calculate new normalized prices for every document in the : : index. -Hoss
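As an aside (not from the thread itself), a frange filter over a function query - which is what makes an ExternalFileField-backed value filterable - looks something like this; the field name and bounds here are made up for illustration:

```
fq={!frange l=0 u=100}field(price_usd)
```

Any document whose function value falls between l and u passes the filter, so the value never needs to be a first-class indexed field.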
Re: Best practice for emailing this list?
No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken
Re: Search with accent
I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks --
Chinese characters - a little OT
Sorry, OT but it's driving me nuts. I've indexed a document with Chinese characters in its title. When I perform the search (that returns JSON) I get back the title and, using Javascript, place it into a variable that ultimately ends up as a dropdown of titles to choose from. The problem is the title contains the literal unicode representation of the Chinese characters (&#20013; for example). Here's the javascript:

var optionObj=document.createElement('option');
menuItem=titleArray[1].title;
menuVal=titleArray[1].url;
if((menuItem != "") && (menuItem != " ") && (menuItem != null)) {
  optionObj.appendChild(document.createTextNode(menuItem));
  optionObj.setAttribute('id', optId + optCnt);
  optionObj.setAttribute('target', '_blank');
  optionObj.setAttribute('value', menuVal);
  optCnt++;
  selectObj.appendChild(optionObj);
}

My hunch is I should utf-8 encode the title and then try and display the result but it's not working. I still am seeing the unicode characters. Does anyone see what I could be doing wrong? TIA - Tod
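As an aside (not part of Tod's code): if the JSON response really contains the literal entity text, a small hypothetical helper like this would turn "&#20013;" back into the character before it goes into createTextNode. The root-cause fix is usually to make sure the response is served and parsed as UTF-8 so entities never appear in the first place.

```javascript
// Hypothetical helper: decode numeric HTML entities like "&#20013;" into
// their actual characters. createTextNode does NOT decode entities - it
// inserts the string literally - so decoding must happen first.
function decodeNumericEntities(s) {
  return s.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}
```

For example, decodeNumericEntities("&#20013;") returns "中".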
Re: Best practice for emailing this list?
Mmmm maybe it's your mail address? :P Weird, I didn't have any problem with it using gmail... Send in plain text, avoid links... maybe that could work... If you want, send me the mail and I will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com
Adding new field after data is already indexed
Hi, I had a few questions regarding Solr. Say my schema file looks like

<field name="folder_id" type="long" indexed="true" stored="true"/>
<field name="indexed" type="boolean" indexed="true" stored="true"/>

and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records? 2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users. -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862575.html Sent from the Solr - User mailing list archive at Nabble.com.
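On the default-value question (not answered in this archive): schema.xml does support a default attribute on a field, but it only takes effect for documents indexed after the change - documents already in the index are untouched until they are reindexed. A sketch, with a hypothetical field name:

```xml
<!-- New documents that omit this field get the default value; -->
<!-- already-indexed documents are unaffected until reindexed. -->
<field name="is_archived" type="boolean" indexed="true" stored="true" default="false"/>
```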
Dynamic creating of cores in solr
Hi, I'm not sure this is the right mail to write to, hopefully you can help or direct me to the right person. I'm using Solr - one master with 17 slaves in the server - and using SolrJ as the Java client. Currently there's only one core in all of them (master and slaves) - only the cpaCore. I thought about using multi-core Solr, but I have some problems with that. I don't know in advance which cores I'd need - when my Java program runs, I call for documents to be indexed to a certain URL, which contains the core name, and I might create a URL based on a core that is not yet created. For example:

Calling to index - http://localhost:8080/cpaCore - existing core, everything as usual
Calling to index - http://localhost:8080/newCore - server realizes there's no core newCore, creates it and indexes to it. After that - also creates the new core in the slaves
Calling to index - http://localhost:8080/newCore - existing core, everything as usual

What I'd like the server side to do is realize by itself whether the core exists or not, and if not - create it. One other restriction - I can't change anything on the client side - calls to the server can only be the ones it makes now, for index and search, and it cannot make calls for core creation via the CoreAdminHandler. All I can do is something in the server itself. What can I do to get it done? Write some RequestHandler? RequestProcessor? Any other option? Thanks, nizan
Re: Best practice for emailing this list?
Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-user@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: Best practice for emailing this list?
Thanks for all your help Ezequiel. I cannot see anything in my email that would make this get marked as spam. Anybody have any ideas on how to get this fixed so I can email my questions? robo On Wed, Nov 10, 2010 at 11:36 AM, Ezequiel Calderara ezech...@gmail.com wrote: Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-u...@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. 
Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: scheduling imports and heartbeats
Thanks for the tip Ken. I tried that but don't see the importing happening when I check up on the status. Below is what's in my dataimport.properties.

#Wed Nov 10 11:36:28 PST 2010
metadataObject.last_index_time=2010-09-20 11\:12\:47
interval=1
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=full-import&clean\=true&commit\=true
webapp=solr
id.last_index_time=2010-11-10 11\:36\:27
syncEnabled=1
last_index_time=2010-11-10 11\:36\:27

From: Ken Stanley doh...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, November 10, 2010 4:41:17 AM Subject: Re: scheduling imports and heartbeats On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri Tri, If you use the DataImportHandler (DIH), you can set up a dataimport.properties file that can be configured to import on intervals. http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example As for heartbeat, you can use the ping handler (default is /admin/ping) to check the status of the servlet. - Ken
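As an aside (not suggested in the thread): a common alternative to the dataimport.properties scheduler is to trigger the DIH full-import from cron over HTTP. A sketch - the host, port, and handler path here are assumptions matching the properties above, not verified against this setup:

```shell
# Build the DIH full-import URL and echo it; a cron entry would curl it.
# Host, port, and handler path are assumptions, not from the thread.
BASE="http://localhost:8080/solr"
URL="${BASE}/dataimport?command=full-import&clean=true&commit=true"
echo "$URL"
# In crontab, e.g. hourly:
#   0 * * * * curl -s "http://localhost:8080/solr/dataimport?command=full-import&clean=true&commit=true" >/dev/null
```

This sidesteps the scheduler entirely, which also makes it easier to see (from cron logs) whether the import was ever triggered.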
Re: Default file locking on trunk
: There is now a data/index with a write lock file in it. I have not : attempted to read the index, let alone add something to it. : I start solr again, and it cannot open the index because of the write lock. Lance, i can't reproduce using trunk r1033664 on Linux w/ext4 -- what OS & filesystem are you using? If you load http://localhost:8983/solr/admin/stats.jsp what does it list for the "reader" and "readerDir" in the "searcher" entry? : Why is there a write lock file when I have not tried to index anything? No idea ... i don't get any write locks until i actually attempt to index something. -Hoss
Re: Core status uptime and startTime
: As far as I know, in the core admin page you can find when was the last time : an index had a modification and was committed checking the lastModified. : But... what do startTime and uptime mean? : Thanks in advance startTime should be when the core was created (ie: when it started) uptime is now-startTime (in ms). -Hoss
RE: Dynamic creating of cores in solr
We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production quality code - as you can tell from our voluminous javadocs on the methods...)

1) Identify if the core exists, and if not, create it:

/**
 * This method instantiates two SolrServer objects, solr and indexCore. It requires that
 * indexName be set before calling.
 */
private void initSolrServer() throws IOException {
    String baseUrl = "http://localhost:8983/solr/";
    solr = new CommonsHttpSolrServer(baseUrl);

    String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    String indexCoreUrl = baseUrl + indexCoreName;

    // Here we create two cores for the indexName, if they don't already exist - the live core used
    // for searching and a second core used for indexing. After indexing, the two will be switched so the
    // just-indexed core will become the live core. The way that core swapping works, the live core will always
    // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the
    // dataDir of each core will alternate between [indexName]_1 and [indexName]_2.
    createCoreIfNeeded(indexName, indexName + "_1", solr);
    createCoreIfNeeded(indexCoreName, indexName + "_2", solr);

    indexCore = new CommonsHttpSolrServer(indexCoreUrl);
}

/**
 * Create a core if it does not already exist. Returns true if a new core was created, false otherwise.
 */
private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException {
    boolean coreExists = true;
    try {
        // SolrJ provides no direct method to check if a core exists, but getStatus will
        // return an empty list for any core that doesn't.
        CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server);
        coreExists = statusResponse.getCoreStatus(coreName).size() > 0;
        if (!coreExists) {
            // Create the core
            LOG.info("Creating Solr core: " + coreName);
            CoreAdminRequest.Create create = new CoreAdminRequest.Create();
            create.setCoreName(coreName);
            create.setInstanceDir(".");
            create.setDataDir(dataDir);
            create.process(server);
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return !coreExists;
}

2) Do the index, clearing it first if it's a complete rebuild:

[snip]
if (fullIndex) {
    try {
        indexCore.deleteByQuery("*:*");
    } catch (SolrServerException e) {
        e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
    }
}
[snip]

various logic, then (we submit batches of 100):

[snip]
List<SolrInputDocument> docList = b.getSolrInputDocumentList();
UpdateResponse rsp;
try {
    rsp = indexCore.add(docList);
    rsp = indexCore.commit();
} catch (IOException e) {
    LOG.warn("Error committing documents", e);
} catch (SolrServerException e) {
    LOG.warn("Error committing documents", e);
}
[snip]

3) optimize, then swap cores:

private void optimizeCore() {
    try {
        indexCore.optimize();
    } catch (SolrServerException e) {
        LOG.warn("Error while optimizing core", e);
    } catch (IOException e) {
        LOG.warn("Error while optimizing core", e);
    }
}

private void swapCores() {
    String liveCore = indexName;
    String indexCore = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    LOG.info("Swapping Solr cores: " + indexCore + ", " + liveCore);
    CoreAdminRequest request = new CoreAdminRequest();
    request.setAction(CoreAdminAction.SWAP);
    request.setCoreName(indexCore);
    request.setOtherCoreName(liveCore);
    try {
        request.process(solr);
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Re: Search with accent
Tomas, Let me try to explain better. For example: I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent). When I do a search for pereque, Solr is returning just 7, and when I do a search for perequê Solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin filter you mentioned, do you know how I can enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
Re: Deploying WAR from trunk, exception
: I built the trunk and deploy the war, but cannot access the admin URL : anymore. : : Error loading class : 'org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder : : This class seems to be missing? You appear to be using an old copy of the example config that references a class that was never released. (it was added by SOLR-1268, but replaced with something else in SOLR-2030) -Hoss
Re: Search with accent
have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way you should be able to find all documents as you require. On 10 November 2010 20:08, Claudio Devecchi cdevec...@gmail.com wrote: Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
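For anyone wondering what the full analysis chain might look like, here is a sketch of a schema.xml field type; the type name and tokenizer choice are assumptions, not prescribed by the thread:

```xml
<!-- Accent-insensitive text type: fold accents at both index and query time -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, "pereque" and "perequê" both reduce to the same token at index and query time, so either search should match all 10 documents once the index is rebuilt.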
Re: Dynamic creating of cores in solr
You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious if there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan

Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. [snip]
Re: Search with accent
have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1, If you are on that version, you should use the ASCIIFoldingFilter instead. [snip]
RE: Dynamic creating of cores in solr
Why not use replication? Call it inexperience... We're really early into working with and fully understanding Solr and the best way to approach various issues. I did mention that this was a prototype and non-production code, so I'm covered, though :) We'll take a look at the replication feature... Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, November 10, 2010 3:26 PM To: solr-user@lucene.apache.org Subject: Re: Dynamic creating of cores in solr You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious if there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production quality code - as you can tell from our voluminous javadocs on the methods...) 1) Identify if the core exists, and if not, create it: /** * This method instantiates two SolrServer objects, solr and indexCore. It requires that * indexName be set before calling. 
*/ private void initSolrServer() throws IOException { String baseUrl = "http://localhost:8983/solr/"; solr = new CommonsHttpSolrServer(baseUrl); String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX" String indexCoreUrl = baseUrl + indexCoreName; // Here we create two cores for the indexName, if they don't already exist - the live core used // for searching and a second core used for indexing. After indexing, the two will be switched so the // just-indexed core will become the live core. The way that core swapping works, the live core will always // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the // dataDir of each core will alternate between [indexName]_1 and [indexName]_2. createCoreIfNeeded(indexName, indexName + "_1", solr); createCoreIfNeeded(indexCoreName, indexName + "_2", solr); indexCore = new CommonsHttpSolrServer(indexCoreUrl); } /** * Create a core if it does not already exist. Returns true if a new core was created, false otherwise. */ private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException { boolean coreExists = true; try { // SolrJ provides no direct method to check if a core exists, but getStatus will // return an empty list for any core that doesn't. 
CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server); coreExists = statusResponse.getCoreStatus(coreName).size() > 0; if (!coreExists) { // Create the core LOG.info("Creating Solr core: " + coreName); CoreAdminRequest.Create create = new CoreAdminRequest.Create(); create.setCoreName(coreName); create.setInstanceDir("."); create.setDataDir(dataDir); create.process(server); } } catch (SolrServerException e) { e.printStackTrace(); } return !coreExists; } 2) Do the index, clearing it first if it's a complete rebuild: [snip] if (fullIndex) { try { indexCore.deleteByQuery("*:*"); } catch (SolrServerException e) { e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates. } } [snip] various logic, then (we submit batches of 100): [snip] List<SolrInputDocument> docList = b.getSolrInputDocumentList(); UpdateResponse rsp; try {
Re: Search with accent
Hi Tomas, Do you have some example to put in schema.xml? How can I use this filter class? Tks On Wed, Nov 10, 2010 at 6:25 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but it's not working. 
tks -- -- Claudio Devecchi flickr.com/cdevecchi -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? 
If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but it's not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
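What the folding filter does can be illustrated outside Solr with plain Java: once combining accent marks are stripped, "perequê" and "pereque" compare equal, which is exactly why applying the filter on both the index and query chains returns all 10 documents. This is only an illustrative sketch using java.text.Normalizer, not Solr's actual implementation (ASCIIFoldingFilter handles many more character mappings):

```java
import java.text.Normalizer;

public class FoldSketch {
    // Decompose accented characters (NFD), then drop the combining marks.
    // Illustrative only; Solr's ASCIIFoldingFilter covers far more cases.
    public static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("perequê")); // prints "pereque"
        System.out.println(fold("perequê").equals(fold("pereque"))); // prints "true"
    }
}
```

Because both sides of the comparison go through the same fold, a query for either spelling matches documents indexed with either spelling; this is the "same analysis at index and query time" rule the thread keeps repeating.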
Re: Search with accent
Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. 
About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
You have to modify the field type you are using in your schema.xml file. This is the text field type of the Solr 1.4.1 example with this filter added:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:44:01 Asunto: Re: Search with accent Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. 
You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). 
Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- --
Re: Best practice for emailing this list?
In the header is a line saying what rules your message matched. That'll let you know what about your message was causing your mails to be rejected. Upayavira On Wed, 10 Nov 2010 11:42 -0800, robo - robom...@gmail.com wrote: Thanks for all your help Ezequiel. I cannot see anything in my email that would make this get marked as spam. Anybody have any ideas on how to get this fixed so I can email my questions? robo On Wed, Nov 10, 2010 at 11:36 AM, Ezequiel Calderara ezech...@gmail.com wrote: Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-u...@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. 
I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: Search with accent
thx so much tomas, I'll test now. On Wed, Nov 10, 2010 at 6:47 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: You have to modify the field type you are using in your schema.xml file. This is the text field type of the Solr 1.4.1 example with this filter added:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:44:01 Asunto: Re: Search with accent Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! 
On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? 
tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent
RE: Adding new field after data is already indexed
1) Just put the new field in the schema and stop/start solr. Documents in the index will not have the field until you reindex them, but it won't hurt anything. 2) Turning off their handlers in solrconfig is all I think that takes. -Original Message- From: gauravshetti [mailto:gaurav.she...@tcs.com] Sent: Monday, November 08, 2010 5:21 AM To: solr-user@lucene.apache.org Subject: Adding new field after data is already indexed Hi, I had a few questions regarding Solr. Say my schema file looks like <field name="folder_id" type="long" indexed="true" stored="true"/> <field name="indexed" type="boolean" indexed="true" stored="true"/> and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records? 2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862575.html Sent from the Solr - User mailing list archive at Nabble.com.
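To illustrate the first answer, a hypothetical schema addition (the field name and default below are made up, not from the original thread): the `default` attribute supplies a value for documents indexed *after* the change, while documents already in the index simply lack the field until they are reindexed.

```xml
<!-- hypothetical new field; "status" and its default value are illustrative -->
<field name="status" type="string" indexed="true" stored="true" default="unknown"/>
```

Queries that filter on the new field won't match the older documents until a reindex, which is usually harmless but worth remembering when faceting or sorting on it.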
Facet showing MORE results than expected when its selected?
A facet shows the amount of results that match that facet, e.g. New York (433). So when the facet is clicked, you'd expect that amount of results (433). However, I have a facet Hotel en Restaurant (321) that, when clicked, shows 370 results! :s 1st query: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 This is (part of) the resultset of my first query:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="themes_raw">
      <int name="Hotel en Restaurant">321</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Now when I click the facet Hotel en Restaurant, it fires my second query: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:Hotel en Restaurant&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 I would expect 321, however I get 370! schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="themes" dest="themes_raw"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1878828.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Facet showing MORE results than expected when its selected?
Shouldn't the second query have the clause: fq=themes_raw:Hotel en Restaurant instead of: fq=themes:Hotel en Restaurant Otherwise you're mixing apples (themes_raw) and oranges (themes). (Notice how I cleverly extended the restaurant theme to be food related :)) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, November 10, 2010 4:34 PM To: solr-user@lucene.apache.org Subject: Facet showing MORE results than expected when its selected? A facet shows the amount of results that match that facet, e.g. New York (433). So when the facet is clicked, you'd expect that amount of results (433). However, I have a facet Hotel en Restaurant (321) that, when clicked, shows 370 results! :s 1st query: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 This is (part of) the resultset of my first query:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="themes_raw">
      <int name="Hotel en Restaurant">321</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Now when I click the facet Hotel en Restaurant, it fires my second query: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:Hotel en Restaurant&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 I would expect 321, however I get 370! schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="themes" dest="themes_raw"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1878828.html Sent from the Solr - User mailing list archive at Nabble.com.
Crawling with nutch and mapping fields to solr
Hi I'm fairly new to solr but I have it configured, along with nutch, as per this tutorial http://ubuntuforums.org/showthread.php?p=9596257. Nutch is crawling and injecting documents into solr as expected, however, I want to break the data down further so what ends up in solr is a bit more granular. Can anyone explain in simple terms how I might go about parsing the data I get from nutch and mapping it to custom fields? Ideally I'd like to be able to pull out meta-data from the source HTML and map it to specific fields in solr. I hope I'm in the right place to ask this question. Any help would be much appreciated. Jean-Luc
RE: Facet showing MORE results than expected when its selected?
LOL, very clever indeed ;) The thing is: when I select the amount of records matching the theme 'Hotel en Restaurant' in my db, I end up with 321 records. So that is correct. I don't know where the 370 is coming from. Now when I change the query to this: fq=themes_raw:Hotel en Restaurant I end up with 110 records... (another number even :s) What I did notice is that this only happens on multi-word facets, Hotel en Restaurant being a 3-word facet. The facets work correctly on a facet named Cafe, so I suspect it has something to do with the tokenization. As you can see, I'm using text and string. For completeness I'm posting the definition of those in my schema.xml as well:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1879163.html Sent from the Solr - User mailing list archive at Nabble.com.
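One likely culprit here (a hedged guess, but consistent with the symptoms): in an unquoted multi-word filter query like fq=themes_raw:Hotel en Restaurant, only the first term is bound to the field; "en" and "Restaurant" are searched against the default field, so the count differs from the facet count. Quoting the value keeps the whole phrase on the field. A small sketch of building the quoted, URL-encoded parameter; the field and value come from the post, but the helper itself is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetFqSketch {
    // Quote a multi-word facet value so the whole phrase stays bound to the
    // field, then URL-encode it for use in a Solr request URL.
    public static String fqParam(String field, String value) {
        try {
            String fq = field + ":\"" + value + "\"";
            return "fq=" + URLEncoder.encode(fq, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // prints fq=themes_raw%3A%22Hotel+en+Restaurant%22
        System.out.println(fqParam("themes_raw", "Hotel en Restaurant"));
    }
}
```

With the phrase quoted, fq=themes_raw:"Hotel en Restaurant" on the untokenized string field should line up with the facet count, since faceting on themes_raw counts whole stored values.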
Re: Facet showing MORE results than expected when its selected?
I've had that sort of thing happen from 'corrupting' my index, by changing my schema.xml without re-indexing. If you change field types or other things in schema.xml, you need to reindex all your data. (You can add brand new fields or types without having to re-index, but most other changes will require a re-index.) Could that be it? PeterKerk wrote: LOL, very clever indeed ;) The thing is: when I select the amount of records matching the theme 'Hotel en Restaurant' in my db, I end up with 321 records. So that is correct. I don't know where the 370 is coming from. Now when I change the query to this: fq=themes_raw:Hotel en Restaurant I end up with 110 records... (another number even :s) What I did notice is that this only happens on multi-word facets, Hotel en Restaurant being a 3-word facet. The facets work correctly on a facet named Cafe, so I suspect it has something to do with the tokenization. As you can see, I'm using text and string. For completeness I'm posting the definition of those in my schema.xml as well:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
Re: Facet showing MORE results than expected when its selected?
Nope, I restarted my server to reload schema.xml, and did a reindex, as I've done a thousand times before, but still the same behaviour :(
Re: Facet showing MORE results than expected when its selected?
Another option: assuming themes_raw is of type 'string' (couldn't get that nugget of info for 100%), it could be that you're seeing a difference in the number of results between the 110 for the fq on themes_raw and the 321 from your db, because fieldtype 'string' (thus themes_raw) is case-sensitive while (depending on your db setup) querying your db is case-insensitive, which could explain the larger number of hits for your db as well. Cheers, Geert-Jan

2010/11/10 Jonathan Rochkind rochk...@jhu.edu
[Jonathan's earlier reply and PeterKerk's schema definitions quoted in full; snipped]
Re: Facet showing MORE results than expected when its selected?
Nope, that's not possible either, since the theme name is stored in the database in table [themes] only once, and other locations refer to it using the link table [location_themes]. Simple DB scheme using a link table:

[themes]: id, name
[location_themes]: locationid, themeid
[locations]: id, name, etc.

PS. I posted the definitions of the fields and tokenizers above if you want to have a look at them :)
Re: Facet showing MORE results than expected when its selected?
I was playing around with the example Solr app, and I get different results when I specify something like fq=manu_exact:"ASUS Computer Inc." and fq=manu_exact:ASUS Computer Inc. The latter gives many more matches, which looks kinda familiar. Silly season stuff, I know, but thought I'd mention it... Best Erick

On Wed, Nov 10, 2010 at 5:32 PM, Geert-Jan Brits gbr...@gmail.com wrote:
[Geert-Jan's reply and the earlier thread quoted in full; snipped]
Re: Facet showing MORE results than expected when its selected?
O wow, the quotes did the trick...thanks! :)
Re: Facet showing MORE results than expected when its selected?
Good call. Alternately, for facet limiting, you may find it simpler, easier (and very, very slightly more efficient for Solr) to use either the 'raw' or 'field' query parsers, which don't do the pre-tokenization that the standard query parser does; that pre-tokenization is what is making the quotes required.

fq={!field f=solr_field}My Multi-Word Value
fq={!raw f=solr_field}Multi-Word Value

(Still URL-escape, though; not shown above for clarity.) This way you also won't have to worry about one of your values accidentally including a literal double quote, or something like that. For a non-tokenized String field like we're talking about here, !field and !raw, I think, will be effectively identical. http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers Jonathan

PeterKerk wrote: O wow, the quotes did the trick...thanks! :)
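As a side note on the URL-escaping Jonathan mentions: here is a small Python sketch (illustrative only; the field name themes_raw is taken from earlier in this thread) of what the escaped fq parameter ends up looking like:

```python
from urllib.parse import urlencode

# Build the local-params filter query; {!field} treats the whole value
# as a single term, so no quoting inside the value is needed.
params = {
    "q": "*:*",
    "fq": "{!field f=themes_raw}Hotel en Restaurant",
}
query_string = urlencode(params)
print(query_string)
# q=%2A%3A%2A&fq=%7B%21field+f%3Dthemes_raw%7DHotel+en+Restaurant
```

The `{`, `!`, and `=` characters all need escaping in the URL, which is easy to get wrong when building the request by hand.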
Re: Solr optimize operation slows my MySQL serveur
Thank you for your answers. Isn't it possible to tune Solr to use less disk bandwidth (involving a longer optimization)? I moved Solr to the unused HDD, and the problem is solved! Fortunately I have this separate disk...
Concatenate multiple tokens into one
Hi, I've created the following filter chain in a field type; the idea is to use it for autocompletion purposes:

<tokenizer class="solr.WhitespaceTokenizerFactory"/> <!-- create tokens separated by whitespace -->
<filter class="solr.LowerCaseFilterFactory"/> <!-- lowercase everything -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> <!-- throw out stopwords -->
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> <!-- throw out everything except a-z -->
<!-- actually, here i would like to join multiple tokens together again, to provide one token for the EdgeNGramFilterFactory -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/> <!-- create edgeNGram tokens for autocomplete matches -->

With that kind of filter chain, the EdgeNGramFilterFactory will receive multiple tokens for input strings containing whitespace. This leads to the following results:

Input query: George Cloo
Matches:
- George Harrison
- John Clooridge
- George Smith
- George Clooney
- etc.

However, only George Clooney should match in the autocompletion use case. Therefore, I'd like to add a filter before the EdgeNGramFilterFactory which concatenates all the tokens generated by the WhitespaceTokenizerFactory. Are there filters that can do such a thing? If not, are there examples of how to implement a custom TokenFilter? thanks! -robert
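To make the mismatch concrete, here is a small Python simulation (illustrative only, not Solr code) of the difference between edge-ngramming each whitespace token separately and edge-ngramming one concatenated token:

```python
def edge_ngrams(token, min_size=1, max_size=25):
    # all prefixes of the token, as a front EdgeNGramFilterFactory would emit
    return {token[:i] for i in range(min_size, min(len(token), max_size) + 1)}

def per_token_grams(name):
    # whitespace-tokenize, lowercase, then n-gram each token separately
    grams = set()
    for tok in name.lower().split():
        grams |= edge_ngrams(tok)
    return grams

def concat_grams(name):
    # normalize to a single token first, then n-gram the whole thing
    return edge_ngrams("".join(name.lower().split()))

names = ["George Clooney", "George Harrison", "John Clooridge", "George Smith"]
query = "George Cloo"

# Per-token grams with an OR-style query: any query token matching any gram
loose = [n for n in names if any(q in per_token_grams(n) for q in query.lower().split())]

# Single concatenated token: the whole whitespace-stripped query must be one gram
strict = [n for n in names if "".join(query.lower().split()) in concat_grams(n)]

print(loose)   # all four names match
print(strict)  # only George Clooney
```

"george" is a gram of George Harrison and "cloo" is a gram of John Clooridge, so any disjunctive query over per-token grams pulls them all in; the concatenated token avoids that.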
RE: Concatenate multiple tokens into one
Are you sure you really want to throw out stopwords for your use case? I don't think autocompletion will work how you want if you do. And if you don't... then why use the WhitespaceTokenizer and then try to jam the tokens back together? Why not just NOT tokenize in the first place? Use the KeywordTokenizer, which really should be called the NonTokenizingTokenizer, because it doesn't tokenize at all; it just creates one token from the entire input string. Then lowercase, remove whitespace (or not), do whatever else you want to do to your single token to normalize it, and then edge-ngram it. If you include whitespace in the token, then when making your queries for auto-complete, be sure to use a query parser that doesn't do pre-tokenization; the 'field' query parser should work well for this. Jonathan

From: Robert Gründler [rob...@dubture.com]
Sent: Wednesday, November 10, 2010 6:39 PM
To: solr-user@lucene.apache.org
Subject: Concatenate multiple tokens into one
[original question quoted in full; snipped]
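A minimal Python sketch of the single-token normalization Jonathan describes (in Solr this would be KeywordTokenizerFactory plus LowerCaseFilterFactory and a PatternReplaceFilterFactory in schema.xml; this just illustrates the effect on the token):

```python
import re

def normalize_keyword(text):
    # KeywordTokenizer-style: treat the whole input as one token,
    # then lowercase and strip everything that isn't a-z
    return re.sub(r"[^a-z]", "", text.lower())

print(normalize_keyword("The Beastie Boys"))    # thebeastieboys
print(normalize_keyword("George Clooney"))      # georgeclooney
```

Note that stopwords like "the" survive this, since removing them would require tokenizing first; that is exactly the tension Jonathan points out.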
Re: Concatenate multiple tokens into one
On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote:
Are you sure you really want to throw out stopwords for your use case? I don't think autocompletion will work how you want if you do.

In our case I think it makes sense. The content is targeting the electronic music / DJ scene, so we have a lot of words like "DJ" or "featuring" which make sense to throw out of the query. Also, searches for "the beastie boys" and "beastie boys" should both return a match in the autocompletion.

And if you don't... then why use the WhitespaceTokenizer and then try to jam the tokens back together? Why not just NOT tokenize in the first place. Use the KeywordTokenizer, which really should be called the NonTokenizingTokenizer, because it doesn't tokenize at all, it just creates one token from the entire input string.

I started out with the KeywordTokenizer, which worked well except for the stopword problem. For now, I've come up with a quick-and-dirty custom ConcatFilter which does what I'm after:

public class ConcatFilter extends TokenFilter {
    private TokenStream tstream;

    protected ConcatFilter(TokenStream input) {
        super(input);
        this.tstream = input;
    }

    @Override
    public Token next() throws IOException {
        Token token = new Token();
        StringBuilder builder = new StringBuilder();
        TermAttribute termAttribute = (TermAttribute) tstream.getAttribute(TermAttribute.class);
        TypeAttribute typeAttribute = (TypeAttribute) tstream.getAttribute(TypeAttribute.class);
        boolean incremented = false;
        while (tstream.incrementToken()) {
            // only concatenate tokens the upstream chain typed as "word"
            if (typeAttribute.type().equals("word")) {
                builder.append(termAttribute.term());
            }
            incremented = true;
        }
        token.setTermBuffer(builder.toString());
        if (incremented) {
            return token;
        }
        return null;
    }
}

I'm not sure if this is a safe way to do this, as I'm not familiar with the whole Solr/Lucene implementation after all. best -robert
Replication with slaves and load balancing questions
(forwarded on behalf of robo ... trying to figure out an odd spam-blocking issue he's having)
-- Forwarded message --
Good day, We are new to Solr and trying to set up an HA configuration in the cloud. We have set up a Solr Master server which does all the indexing. We have 2 Solr Slaves that use the built-in Replication Handler (not the scripts). We are thinking of load balancing the 2 Solr Slaves but have a question about writes to the slaves. I believe the only operations that might try to write to the indexes are the deletes from the web servers. Will the Slaves pass the delete query on to the Master to handle, or will they try to do the delete on the Slave? If the delete is done on the Slave, will the master and other Slave get updated through the replication? Thanks, robo
Re: Replication with slaves and load balancing questions
On Wed, Nov 10, 2010 at 8:29 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
(forwarded on behalf of robo ... trying to figure out an odd spam-blocking issue he's having)
-- Forwarded message --
Good day, We are new to Solr and trying to set up an HA configuration in the cloud. We have set up a Solr Master server which does all the indexing. We have 2 Solr Slaves that use the built-in Replication Handler (not the scripts). We are thinking of load balancing the 2 Solr Slaves but have a question about writes to the slaves. I believe the only operations that might try to write to the indexes are the deletes from the web servers. Will the Slaves pass the delete query on to the Master to handle, or will they try to do the delete on the Slave?

No, those deletes will only be done on the slave on which the delete request is made, but it gets worse.

If the delete is done on the Slave, will the master and other Slave get updated through the replication?

No. In fact, if the slave's index is changed, it will be deemed out-of-sync from the master, and full index replication from the master will happen. The best way is to delete from the master and commit, if your deletes are very infrequent. If deletes are very frequent, then you must batch them before committing. If you cannot show the deleted data to the user for subsequent searches, you can try to use filter queries to filter out such documents from searches made by the user who deleted the document. -- Regards, Shalin Shekhar Mangar.
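Shalin's filter-query workaround can be sketched like this (Python, illustrative only; it assumes the uniqueKey field is literally named "id" and that the application tracks the recently deleted ids itself):

```python
def exclusion_fq(deleted_ids):
    # Build a Solr filter query that hides the given document ids from
    # results until the master's delete+commit has replicated out.
    if not deleted_ids:
        return None
    clause = " OR ".join(str(i) for i in sorted(deleted_ids))
    return "-id:(" + clause + ")"

print(exclusion_fq({42, 7}))  # -id:(7 OR 42)
print(exclusion_fq(set()))    # None
```

The resulting string would be sent as an additional fq parameter on the user's searches, so the slaves' indexes are never written to directly.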
Re: Dynamic creating of cores in solr
On Nov 10, 2010, at 12:30pm, Bob Sandiford wrote:
Why not use replication? Call it inexperience... We're really early into working with and fully understanding Solr and the best way to approach various issues. I did mention that this was a prototype and non-production code, so I'm covered, though :) We'll take a look at the replication feature...

Replication doesn't replicate the top-level solr.xml file that defines the available cores, so if dynamic cores is a requirement then your custom code isn't wasted :) -- Ken

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, November 10, 2010 3:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Dynamic creating of cores in solr

You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then, when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious whether there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan

Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production-quality code - as you can tell from our voluminous javadocs on the methods...)

1) Identify if the core exists, and if not, create it:

/**
 * This method instantiates two SolrServer objects, solr and indexCore.
 * It requires that indexName be set before calling.
 */
private void initSolrServer() throws IOException {
    String baseUrl = "http://localhost:8983/solr/";
    solr = new CommonsHttpSolrServer(baseUrl);
    String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    String indexCoreUrl = baseUrl + indexCoreName;
    // Here we create two cores for the indexName, if they don't already exist - the live core used
    // for searching and a second core used for indexing. After indexing, the two will be switched so the
    // just-indexed core will become the live core. The way that core swapping works, the live core will always
    // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the
    // dataDir of each core will alternate between [indexName]_1 and [indexName]_2.
    createCoreIfNeeded(indexName, indexName + "_1", solr);
    createCoreIfNeeded(indexCoreName, indexName + "_2", solr);
    indexCore = new CommonsHttpSolrServer(indexCoreUrl);
}

/**
 * Create a core if it does not already exist. Returns true if a new core was created, false otherwise.
 */
private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException {
    boolean coreExists = true;
    try {
        // SolrJ provides no direct method to check if a core exists, but getStatus will
        // return an empty list for any core that doesn't.
        CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server);
        coreExists = statusResponse.getCoreStatus(coreName).size() > 0;
        if (!coreExists) {
            // Create the core
            LOG.info("Creating Solr core: " + coreName);
            CoreAdminRequest.Create create = new CoreAdminRequest.Create();
            create.setCoreName(coreName);
            create.setInstanceDir(".");
            create.setDataDir(dataDir);
            create.process(server);
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return !coreExists;
}

2) Do the index, clearing it first if it's a complete rebuild:

[snip]
if (fullIndex) {
    try {
        indexCore.deleteByQuery("*:*");
    } catch (SolrServerException e) {
        e.printStackTrace(); // To change body of catch statement use File | Settings | File Templates.
    }
}
[snip]

various logic, then (we submit batches of 100):

[snip]
List<SolrInputDocument> docList = b.getSolrInputDocumentList();
UpdateResponse rsp;
try {
    rsp = indexCore.add(docList);
    rsp = indexCore.commit();
Re: Adding new field after data is already indexed
But if I use this field to do sorting, an error occurs and an ArrayIndexOutOfBoundsException is thrown.

On Thursday, November 11, 2010, Robert Petersen rober...@buy.com wrote:
1) Just put the new field in the schema and stop/start Solr. Documents in the index will not have the field until you reindex them, but it won't hurt anything.
2) Just turn off their handlers in solrconfig is all I think that takes.

-Original Message-
From: gauravshetti [mailto:gaurav.she...@tcs.com]
Sent: Monday, November 08, 2010 5:21 AM
To: solr-user@lucene.apache.org
Subject: Adding new field after data is already indexed

Hi, I had a few questions regarding Solr. Say my schema file looks like

<field name="folder_id" type="long" indexed="true" stored="true"/>
<field name="indexed" type="boolean" indexed="true" stored="true"/>

and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records?

2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users?

-- Best Regards. Jerry. Li | 李宗杰