Re: Faceting on a date field multiple times
Thanks Marc.

On May 4, 2012, at 8:52 PM, Marc Sturlese wrote:

http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
Faceting on a date field multiple times
Hi.

I would like to facet on a date field using several different ranges in a single query. For example, I would like to show:

- #documents by day for the last week
- #documents by week for the last couple of months
- #documents by year for the last several years

Is there a way to do this without hitting Solr 3 times?

Thanks
Ian
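One possible approach, sketched here rather than taken from the thread, is to send one facet.query per bucket, since facet.query may be repeated within a single request. The class below only builds the range strings with Solr date math; the field name `pubdate`, the class name, and the bucket counts are assumptions for illustration. Solr date math has no week unit, so weekly buckets are expressed as 7-day steps.

```java
import java.util.ArrayList;
import java.util.List;

final class DateFacetBuckets {
    // Returns one range query per bucket, newest first. Each bucket is `step`
    // units wide and starts `i * step` units before NOW rounded down to `unit`.
    // Ranges are inclusive, so a document stamped exactly on a boundary
    // instant is counted in two adjacent buckets.
    static List<String> buckets(String field, String unit, int step, int count) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String start = "NOW/" + unit;
            if (i > 0) {
                start += "-" + (i * step) + unit + "S";
            }
            String end = start + "+" + step + unit + "S";
            queries.add(field + ":[" + start + " TO " + end + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        all.addAll(buckets("pubdate", "DAY", 1, 7));   // by day, last week
        all.addAll(buckets("pubdate", "DAY", 7, 8));   // by week, about two months
        all.addAll(buckets("pubdate", "YEAR", 1, 5));  // by year, last five years
        StringBuilder params = new StringBuilder("q=*:*&rows=0&facet=true");
        for (String q : all) {
            // Each value still needs URL-encoding before it goes on the wire.
            params.append("&facet.query=").append(q);
        }
        System.out.println(params);
    }
}
```

Each facet.query comes back as its own count in the response, so the three granularities arrive in one round trip at the cost of a longer parameter list.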
how does Solr/Lucene index multi-value fields
Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field? If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Re: how does Solr/Lucene index multi-value fields
On May 31, 2011, at 12:11 PM, Erick Erickson wrote:

Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here.

we have a collection of related stories. when a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but one chosen according to other home-grown rules. by combining all the possibilities in one SolrDocument, we can avoid a DB hit to get related stories.

But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned.

so.. it will suck regardless.. I thought we had per-field relevance in the current trunk. :-(

Best
Erick

On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote:

Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field? If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields.

Regards
Ian
Re: how does Solr/Lucene index multi-value fields
Thanks Erick. sadly, in my use-case I don't think that would work. I'll go back to storing them at the story level and hitting a DB to get related stories, I think.

--I

On May 31, 2011, at 12:27 PM, Erick Erickson wrote:

Hmmm, I may have misled you. Re-reading my text, it wasn't very well written. TF/IDF calculations are, indeed, per-field. I was trying to say that there was no difference between storing all the data for an individual field as a single long string of text in a single-valued field or as several shorter strings in a multi-valued field.

Best
Erick

On Tue, May 31, 2011 at 12:16 PM, Ian Holsman had...@holsman.net wrote:

On May 31, 2011, at 12:11 PM, Erick Erickson wrote:

Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here.

we have a collection of related stories. when a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but one chosen according to other home-grown rules. by combining all the possibilities in one SolrDocument, we can avoid a DB hit to get related stories.

But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned.

so.. it will suck regardless.. I thought we had per-field relevance in the current trunk. :-(

Best
Erick

On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote:

Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field? If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields.

Regards
Ian
[ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+
I just saw this on twitter, and thought you guys would be interested.. I haven't tried it, but it looks interesting. http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin Thanks for the RT Shalin!
Re: If you could have one feature in Solr...
On 2/24/10 8:42 AM, Grant Ingersoll wrote: What would it be? most of this will be coming in 1.5, but for me it's - sharding.. it still seems a bit clunky secondly.. this one isn't in 1.5. I'd like to be able to find interesting terms that appear in my result set that don't appear in the global corpus. it's kind of like doing a facet count on *:* and then on the search term and discount the terms that appear heavily on the global one. (sorry.. there is a textbook definition of this.. XX distance.. but I haven't got the books in front of me).
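To make the "facet on *:* and then on the search term" idea concrete, here is a minimal sketch that scores a term by how much more often it appears in the result set than in the whole corpus. This plain frequency ratio is a stand-in, not the textbook distance measure alluded to above, and the class and method names are hypothetical.

```java
final class InterestingTerms {
    // Ratio of the term's document frequency in the result set to its
    // document frequency in the whole index. Values well above 1 suggest the
    // term is characteristic of the results rather than common everywhere.
    static double lift(long hitsWithTerm, long hits, long corpusWithTerm, long corpusSize) {
        double inResults = (double) hitsWithTerm / hits;
        double inCorpus = (double) corpusWithTerm / corpusSize;
        return inResults / inCorpus;
    }

    public static void main(String[] args) {
        // A term in half of the hits but only a quarter of the corpus.
        System.out.println(lift(50, 100, 250, 1000));
    }
}
```

The four inputs correspond to two facet counts (one for the query, one for *:*) plus the two total hit counts, so the data is obtainable with ordinary faceting even though the ranking itself has to happen outside Solr.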
Re: Improvising solr queries
On 1/5/10 12:46 AM, Shalin Shekhar Mangar wrote:

(sitename:XYZ OR sitename:All Sites) AND (localeid:1237400589415) AND ((assettype:Gallery)) AND (rbcategory:ABC XYZ ) AND (startdate:[* TO 2009-12-07T23:59:00Z] AND enddate:[2009-12-07T00:00:00Z TO *])&rows=9&start=63&sort=date desc&facet=true&facet.field=assettype&facet.mincount=1

Similar to this query we have several much more complex queries supporting all major landing pages of our application. Just want to confirm whether anyone can identify any major flaws or issues in the sample query?

I'm not the expert Shalin is, but I seem to remember that sorting by date was pretty rough on CPU. (this could have been resolved since I last looked at it)

the other thing I'd question is the facet. it looks like you're only retrieving a single assettype (Gallery), so you will only get a single value back. if that's the case, wouldn't the number of rows returned (which is part of the response) give you the same answer?

Most of those AND conditions can be separate filter queries. Filter queries can be cached separately and can therefore be re-used. See http://wiki.apache.org/solr/FilterQueryGuidance
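As a rough illustration of the filter-query suggestion, the sketch below moves some of the AND clauses into separate fq parameters and URL-encodes them. Which clauses stay in q and which become filters is an assumption here, as are the class and method names; the parameter names q, fq, start and rows are the standard Solr ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

final class FilterQueryParams {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // One q parameter for the scored part, one fq parameter per filter.
    static String build(String q, List<String> filters, int start, int rows) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q));
        for (String fq : filters) {
            sb.append("&fq=").append(enc(fq));
        }
        sb.append("&start=").append(start).append("&rows=").append(rows);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> filters = Arrays.asList(
            "localeid:1237400589415",
            "assettype:Gallery",
            "startdate:[* TO 2009-12-07T23:59:00Z]",
            "enddate:[2009-12-07T00:00:00Z TO *]");
        System.out.println(build("sitename:XYZ", filters, 63, 9));
    }
}
```

Because each fq is cached on its own, a filter such as assettype:Gallery can be shared by every landing page that uses it, whatever the rest of the query looks like.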
Re: Adaptive search?
On 12/18/09 2:46 AM, Siddhant Goel wrote:

Let's say we have a search engine (a simple front end - web app kind of a thing - responsible for querying Solr and then displaying the results in a human readable form) based on Solr. If a user searches for something, gets quite a few search results, and then clicks on one such result - is there any mechanism by which we can notify Solr to boost the score/relevance of that particular result in future searches? If not, then any pointers on how to go about doing that would be very helpful.

Hi Siddhant. Solr can't do this out of the box. you would need to use an external field and a custom scoring function to do something like this.

regards
Ian

Thanks,

On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht p...@activemath.org wrote:

What can it mean to adapt to user clicks? Quite many things in my head. Do you have maybe a citation that inspires you here?

paul

On 17 Dec 09 at 13:52, Siddhant Goel wrote:

Does Solr provide adaptive searching? Can it adapt to user clicks within the search results it provides? Or does that have to be done externally?
Re: Chrome Web Browser doesn't render properly
Brian Klippel wrote:

Nope, chrome treats xml as html. Either view source or use another browser.

I always thought the XML output should include an XSLT stylesheet by default. that way I could debug with safari (and chrome).

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Wednesday, July 15, 2009 2:15 PM
To: solr-user@lucene.apache.org
Subject: Chrome Web Browser doesn't render properly

From the Solr admin page, solr/admin/file/?file=schema.xml and /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on render improperly (meaning the XML isn't formatted). Maybe Chrome doesn't support XML?
Re: Facets with an IDF concept
Asif Rahman wrote:

Hi Grant,

I'll give a real life example of the problem that we are trying to solve. We index a large number of current news articles on a continuing basis. We tag these articles with news topics (e.g. Barack Obama, Iran, etc.). We then use these tags to facet our queries. For example, we might issue a query for all articles in the last 24 hours. The facets would then tell us which news topics have been written about the most in that period. The problem is that Barack Obama, for example, is always written about in high frequency, as opposed to Iran which is currently very hot in the news, but which has not always been the case. In this case, we'd like to see Iran show up higher than Barack Obama in the facet results.

you're not looking for an IDF-based function. you need to figure out what a 'normal' amount of news flow for a given topic is, and then determine when an abnormal amount is happening. note that an abnormal amount can be positive or negative.

we use a similar method to this on http://love.com, so we know, for example, that something is going on with Ed McMahon as I type.

I wouldn't be looking at using SOLR to do this kind of thing, btw. try something like esper. I think it might hold some promise for this kind of thing (esper is an open source stream database).

Regards

To me, this seems identical to the tf-idf scoring expression that is used in normal search. The facet count is analogous to the tf and I can access the facet term idf's through the Similarity API. Is my reasoning sound? Can you provide any guidance as to the best way to implement this?

Thanks for your help,
Asif

On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll gsing...@apache.org wrote:

On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:

Hi again, I guess nobody has used facets in the way I described below before. Do any of the experts have any ideas as to how to do this efficiently and correctly? Any thoughts would be greatly appreciated.
Thanks,
Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman a...@newscred.com wrote:

Hi all, We have an index of news articles that are tagged with news topics. Currently, we use solr facets to see which topics are popular for a given query or time period. I'd like to apply the concept of IDF to the facet counts so as to penalize the topics that occur broadly through our index. I've begun to write a custom facet component that applies the IDF to the facet counts, but I also wanted to check if anyone has experience using facets in this way.

I'm not sure I'm following. Would you be faceting on one field, but using the DF from some other field? Faceting is already a count of all the documents that contain the term on a given field for that search. If I'm understanding, you would still do the typical faceting, but then rerank by the global DF values, right? Backing up, what is the problem you are seeing that you are trying to solve?

I think you could do this, but you'd have to hook it in yourself. By penalize, do you mean remove, or just have them in the sort? Generally speaking, looking up the DF value can be expensive, especially if you do a lot of skipping around. I don't know how pluggable the sort capabilities are for faceting, but that might be the place to start if you are just looking at the sorting options.

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
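The "normal versus abnormal news flow" idea from this thread can be sketched as a z-score of the current facet count against a topic's own history. This is only one simple way to express it, not the method used on love.com or anything Esper-specific; the class name and the choice of a population standard deviation are assumptions.

```java
final class TopicSpike {
    // How many standard deviations the current count sits from the topic's
    // historical mean. Positive means more coverage than usual, negative
    // means less. Returns 0 when the history has no variation.
    static double zScore(double[] history, double current) {
        double sum = 0;
        for (double h : history) {
            sum += h;
        }
        double mean = sum / history.length;
        double squares = 0;
        for (double h : history) {
            squares += (h - mean) * (h - mean);
        }
        double std = Math.sqrt(squares / history.length);
        if (std == 0) {
            return 0;
        }
        return (current - mean) / std;
    }

    public static void main(String[] args) {
        double[] alwaysBusyTopic = {900, 1100, 950, 1050};
        double[] usuallyQuietTopic = {8, 12, 9, 11};
        System.out.println(zScore(alwaysBusyTopic, 1000)); // near zero
        System.out.println(zScore(usuallyQuietTopic, 40)); // large positive
    }
}
```

Sorting facet values by this score instead of by raw count would let a usually quiet topic outrank a topic that is always heavily covered, which is the behaviour asked for above.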
Auto suggest.. how to do mixed case
hi guys.

I've noticed that one of the new features in Solr 1.4 is the TermsComponent, which enables autosuggest. but what puzzles me is how to actually use it in an application. most autosuggests are case insensitive, so there is no difference if I type in 'San Francisco' or 'san francisco'.

now I've tried with a 'text' field and a 'string' field with no joy. 'string' provided the best result, but it is still case sensitive.

at the moment I'm using a custom field type:

    <fieldType name="string_lc" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter does what you expect, which can be when you want your sorting to be case insensitive -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

which converts the whole field to lower case. that allows me to submit the query as lower case and get good results.

so the point of the email is to find out how I get the autosuggest to return mixed case results, and not require me to lower case the query before I send it?
storing complex types in a multiValued field
hi. I don't think this is a FAQ, but it's been bugging me for a while.

I want to store key/value pairs in a single field. for example

    <field name="tags" type="keyval" indexed="true" stored="true" multiValued="true"/>

where keyval would be an ID# and the value.

I'm guessing it is as simple as creating my own field class, but I was wondering if there were any gotchas, and more importantly why I've never seen the question asked before. It would seem to me a common use case.
Re: storing complex types in a multiValued field
Shalin Shekhar Mangar wrote:

I guess most people store it as a simple string key(separator)value. Is there something special that you want to do with the values that you need a custom field implementation?

no.. not really.. I guess I could achieve it via payloads as well.. the whole thing about stuffing 2 fields into the same field irks me, that's all. I've got them set up as 2 separate MV fields at the moment.

On Mon, Jan 12, 2009 at 5:36 AM, Ian Holsman li...@holsman.net wrote:

hi. I don't think this is a FAQ, but it's been bugging me for a while.

I want to store key/value pairs in a single field. for example

    <field name="tags" type="keyval" indexed="true" stored="true" multiValued="true"/>

where keyval would be an ID# and the value.

I'm guessing it is as simple as creating my own field class, but I was wondering if there were any gotchas, and more importantly why I've never seen the question asked before. It would seem to me a common use case.
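The key(separator)value convention mentioned above could look like the following on the client side. The separator character and the class name are assumptions; the only real requirement is that the separator never occurs in a key.

```java
final class KeyValueField {
    static final char SEP = '|'; // assumed separator, must not appear in keys

    // Packs an ID and its value into one string for a multiValued field.
    static String encode(String key, String value) {
        if (key.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("key must not contain " + SEP);
        }
        return key + SEP + value;
    }

    // Splits at the first separator only, so values may contain it.
    static String[] decode(String stored) {
        int i = stored.indexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("no separator in: " + stored);
        }
        return new String[] { stored.substring(0, i), stored.substring(i + 1) };
    }

    public static void main(String[] args) {
        String stored = encode("42", "red shoes");
        String[] pair = decode(stored);
        System.out.println(stored + " -> " + pair[0] + " / " + pair[1]);
    }
}
```

The trade-off against two parallel multi-valued fields is that the pairing survives inside each stored value, while searching on the value alone then needs a wildcard or a second field.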
Re: Solr security
There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute, and would just return the rows it had found in that time.

Feak, Todd wrote:

I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But, the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in case someone typos and puts in 1000 instead of 100, or the like. Admittedly, testing and QA should catch these things, but sometimes it's nice to put in a few safeguards to stop the obvious mistakes from occurring.

-Todd Feak

-----Original Message-----
From: Matthias Epheser [mailto:[EMAIL PROTECTED]
Sent: Monday, November 17, 2008 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley wrote:

however I have found that in any site where stability/load and uptime are a serious concern, this is better handled in a tier in front of java -- typically the loadbalancer / haproxy / whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only solr related thing left, the parameter filtering/blocking (e.g. rows<1000)? Would it be suitable to do this in a Filter delivered by solr? Of course as an optional alternative.

ryan
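The rows safeguard discussed in this message comes down to clamping one request parameter. The sketch below shows only that logic; wiring it into a servlet Filter or a request handler is left out, and the class name, default and ceiling are all assumptions.

```java
final class RowLimit {
    // Returns a safe rows value: the default when the parameter is missing,
    // malformed or negative, otherwise the requested value capped at maxRows.
    static int clampRows(String raw, int defaultRows, int maxRows) {
        if (raw == null) {
            return defaultRows;
        }
        int rows;
        try {
            rows = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultRows;
        }
        if (rows < 0) {
            return defaultRows;
        }
        return Math.min(rows, maxRows);
    }

    public static void main(String[] args) {
        System.out.println(clampRows("1000", 10, 100)); // typo case: capped at 100
        System.out.println(clampRows("50", 10, 100));   // within the limit
    }
}
```

Capping quietly, as done here, keeps a mistyped client working; rejecting the request with an error would surface the mistake sooner, which may be preferable in test environments.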
Re: Solr security
if that's the case, putting apache in front of it would be handy. something like

    <Limit POST>
        Order deny,allow
        Deny from all
        Allow from 192.168.0.1
    </Limit>

might be helpful.

Sean Timm wrote:

I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios.

-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index.

The first role of security is to prevent accidents. I would suggest two layers of read-only switch.

1) Open the Lucene index in read-only mode.
2) Allow only search servers to accept GET requests.

Lance
Re: Solr security
Ryan McKinley wrote:

On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:

trouble is, you can also GET /solr/update, even all on the URL, no request body...

http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true

Solr is a bad RESTafarian.

but with Ian's options in the apache config, this would not work... rather it would only work if stream.body was a POST

    <Location /solr/update>
        Order deny,allow
        Deny from all
        Allow from 192.168.0.1
    </Location>

? or perhaps LocationMatch.. but you get the picture.

Getting warmer!

Erik

On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:

if that's the case, putting apache in front of it would be handy. something like

    <Limit POST>
        Order deny,allow
        Deny from all
        Allow from 192.168.0.1
    </Limit>

might be helpful.

Sean Timm wrote:

I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios.

-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index.

The first role of security is to prevent accidents. I would suggest two layers of read-only switch.

1) Open the Lucene index in read-only mode.
2) Allow only search servers to accept GET requests.

Lance
Re: Solr security
Erik Hatcher wrote:

I'm pondering the viability of running Solr as effectively a UI server... what I mean by that is having a public facing browser-based application hitting a Solr backend directly for JSON, XML, etc data. I know folks are doing this (I won't name names, in case this thread comes up with any vulnerabilities that would affect such existing environments).

Let's just assume a typical deployment environment... replicated Solr's behind a load balancer, maybe even a caching proxy. What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out of this is a Solr deployment configuration suitable for direct browser access, but we're not safely there yet, are we? Is this an absurd goal? Must we always have a moving piece between browser and data/search servers?

Thanks,
Erik

First thing I would look at is disabling write access, or writing a servlet that sits on top of the write handler to filter your data.

Second thing I would be concerned about is people writing DoS queries that bypass the cache. so you may need to write your own custom request handler to filter out that kind of thing.
Re: Solr security
Erik Hatcher wrote:

On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:

First thing I would look at is disabling write access, or writing a servlet that sits on top of the write handler to filter your data.

We can turn off all the update handlers, but how does that affect replication? Can a Solr replicant be entirely read-only in the HTTP request sense?

Second thing I would be concerned about is people writing DoS queries that bypass the cache. so you may need to write your own custom request handler to filter out that kind of thing.

Is this a concern that can be punted to what you'd naturally be putting in front of Solr anyway or a proxy tier that can have DoS blocking rules? I mean, if you're deploying a Struts app that hits Solr under the covers, how do you protect against DoS on that? A malicious user could keep sending queries indirectly to a Solr through a whole lot of public apps now. In other words, another tier in front of Solr doesn't add (much) DoS protection to an underlying Solr, no?

famous last words and all, but you shouldn't be just passing what a user types directly into an application, should you? I'd be parsing out wildcards, boosts, and fuzzy searches (or at least thinking about the effects). I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a regular query.

but they don't let me into design meetings any more ;(

Erik
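"Parsing out wildcards, boosts, and fuzzy searches" could start as small as the sketch below, which strips those operators from raw user input before it is placed in a query. It is deliberately crude: it removes the operators rather than escaping them, it does not understand the full Lucene query syntax, and the class name is hypothetical.

```java
final class UserQuerySanitizer {
    // Removes fuzzy/proximity suffixes (~, ~0.1, ~1000), boosts (^, ^2.5) and
    // wildcard characters (* and ?), then collapses repeated whitespace.
    static String sanitize(String userInput) {
        String s = userInput.replaceAll("[~^](?:\\d+(?:\\.\\d+)?)?", "");
        s = s.replaceAll("[*?]", "");
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("\"jakarta apache\"~1000"));
        System.out.println(sanitize("roam~0.1"));
        System.out.println(sanitize("solr^10 fo*"));
    }
}
```

A whitelist of allowed syntax, or a proper parse of the query, would be more robust than pattern stripping; the point is only that the filtering happens before the string reaches Solr.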
Re: solrj and CLOSE_WAIT's
Ryan McKinley wrote:

not sure if it is something we can do better or part of HttpClient...

From: http://www.nabble.com/CLOSE_WAIT-td19959428.html it seems to suggest you may want to call:

    con.closeIdleConnections(0L);

But if you are creating a new MultiThreadedHttpConnectionManager for each request, it seems odd you would have to explicitly close the connection for each request.

What happens if you try using a SimpleHttpConnectionManager rather than a MultiThreadedHttpConnectionManager? You can explicitly pass in:

I was thinking the same thing when I saw the other constructor. I've modified the code to call the 'simple' version and will let it run for an hour or three to make sure it works and doesn't exhibit the behavior. so far it looks good, and there are no CLOSE_WAITs (or FIN_WAIT2s) showing up for longer than a couple of seconds (according to netstat -tn).

I'd petition we go back to the 'stupid' version by default that just does what it is supposed to do, and leave the other one for 'experts'. I can't even see how to tell the multi-threaded version to close itself nicely ;(

to:

    public CommonsHttpSolrServer(URL baseURL, HttpClient client, ResponseParser parser, boolean useMultiPartPost) {

if that fixes things, it is a bit disturbing, but something we should look into.

ryan
solrj and CLOSE_WAIT's
Hi guys.

I'm running a little upload project that uploads documents into a solr index. there is also a 2nd thread that runs a delete-by-query and an optimize every once in a while.

in an effort to reduce the probability of things being held onto I've made everything local, but it is still collecting CLOSE_WAITs and FIN_WAIT2s on the server side until it eventually runs out of file handles in a day or two.

the following are the code snippets being used to call solr.

    protected void doArchiveSolr() throws IOException, SolrServerException {
        Calendar rightNow = Calendar.getInstance();
        rightNow.add(Calendar.DATE, 31 * -1);
        DateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        java.util.Date d = rightNow.getTime();
        String s = "publish_date:[1976-03-06T23:59:59.999Z/YEAR TO " + f.format(d) + "]";
        logger.info("Archiver: " + s);
        CommonsHttpSolrServer solrServer;
        solrServer = new CommonsHttpSolrServer(solrURL);
        solrServer.deleteByQuery(s);
        solrServer.commit();
    }

and this runs every X minutes. it also has other local parts like

    {
        CommonsHttpSolrServer solrServer;
        solrServer = new CommonsHttpSolrServer(solrURL);
        solrServer.optimize();
    }

and

    {
        CommonsHttpSolrServer solrServer;
        UpdateResponse r;
        solrServer = new CommonsHttpSolrServer(solrUrl);
        solrServer.setSoTimeout(12); // socket read timeout 2minutes
        solrServer.setConnectionTimeout(100);
        solrServer.setDefaultMaxConnectionsPerHost(100);
        solrServer.setMaxTotalConnections(100);
        solrServer.setFollowRedirects(false); // defaults to false
        solrServer.setAllowCompression(false);
        r = solrServer.add(docs);
        r = solrServer.commit();
        docs.clear();
    }
Re: Release date of SOLR 1.3
Noble Paul നോബിള് नोब्ळ् wrote:

If you have an immediate need, I must advise you against waiting for the Solr 1.3 release. The best strategy would be to take a nightly and start using it. Test it thoroughly, and if bugs are found report them back. If everything is fine, go into production with that.

--Noble

I'd be very hesitant to recommend ANYONE go into production with non-released software if you are unfamiliar with the codebase. waiting on the list for someone to fix a bug which is causing an outage for your site is somewhat of a career limiting move.

I'd recommend using the stable release, and learning the codebase ;-)

regards
Ian

On Thu, May 15, 2008 at 12:28 AM, Matthew Runo [EMAIL PROTECTED] wrote:

There isn't a specific date so far, but I'd like to say that only once in the year or so I've been working with the SVN head build of Solr have I noticed a bug get committed. And it was fixed very quickly once it was found..

I think if you need to have development features you're probably safe to use the SVN head, but remember that it is dev, and you should *always* test new builds before actually using them =p

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 14, 2008, at 9:08 AM, Umar Shah wrote:

Hi,

I'm using the latest trunk code from SOLR. I am basically using function queries (sum, product, scale) for my project, which are not present in 1.2. I wanted to know if there is a decided date for the release of Solr 1.3. If the date is far off or not decided, what would be the best practice for adopting the above mentioned feature while not compromising the stability of the system?

thanks
-umar
Re: Solr replication by solr (for windows)
The current scripts use rsync to minimize the amount of data actually being copied. I've had a brief look and found only 1 implementation, which is GPL and abandoned: http://sourceforge.net/projects/jarsync.

Personally I still think the size of the transfer is important (as for most use cases not much is actually changed every hour).. but that's just me.. your case may be different than mine.

regards
Ian

Noble Paul നോബിള് नोब्ळ् wrote:

hi,

The current replication strategy in solr involves shell scripts. The following are the drawbacks:

* It does not work with windows
* Replication works as a separate piece not integrated with solr.
* Cannot control replication from solr admin/JMX
* Each operation requires manual telnet to the host

Doing the replication within java code has the following advantages:

* Platform independence
* Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml.
** Just put in the url of the master in the slaves; that should be good enough to enable replication. Other things like frequency of snapshoot/snappull can also be configured
* Start/stop can be triggered from solr/admin or JMX
* Can get the status/progress while replication is going on
* No need to have a login into the machine

The implementation can be done as two components:

* A SolrEventListener which does a snapshoot. Same as done by the script
* A ReplicationHandler which can act as a server to dish out the index snapshots (in the master)
** In the slave the same handler can poll at regular intervals and, if there is a new snapshot, fetch the index over http (it can use solrj+BinaryResponseWriter)
* The same Handler can do a snap install
* The Handler may expose all the operations over a REST interface or JMX
* It may also show the current state of the master index through the console

What do you think?
Re: unique values from a field in a result
Hi Thijs.

If you are not concerned with an *EXACT* number, there is a paper that was published in 1990 that discusses this problem: http://dblab.kaist.ac.kr/Publication/pdf/ACM90_TODS_v15n2.pdf

from the paper (if I understand it correctly): for 120,000,000 records you can sample 10,112,529 records (10%) when the variance is low and get an answer with 95% confidence.

Regards
Ian

Thijs wrote:

It must be my english. When I read your comment, I think you could compare it to the category example... Maybe with an example I can explain my situation better:

The documents in the index contain variations of different products. Say for example I have 10 different products. Every product is indexed 1000 times (1000 different variations per product). the product is not unique, the variation is unique. The first 10 results of a search only contain the best matching variations for all the products in the complete result. So let's say the result returns 1000 variations for 3 different products. What I need is some 'sidebar information' containing detailed information on all the 3 unique products in the complete result.

My example is just simple; in real life the numbers are a lot bigger. However, the amount of unique products vs variations is such that it seems a lot of work to iterate over all variations in a DocSet just to get the few unique products.

But what I understand from your answer is that the best way to get the 3 unique products is to iterate over the 1000 variations in the result DocSet? And if that is the case I'm happy with it.

Thanks
Thijs

But to get some extra information I need all the unique values for one of the fields in the index (being the pk of the product).

Chris Hostetter wrote:

: You are correct I'm looking for the unique values for one field in a DocSet.
: The field is not multivalued. and it contains only 1 long value, the pk of a
: database table
: But you said the counts are stored in the index, I don't see that.
Because there's something very confusing about your question ... if the value of the field is unique for every document (by pk you mean the primary key for these docs in your database, correct?) then why do you specifically need the unique terms? ... aren't they by definition unique?

usually when people ask questions like this, they are interested in the unique values for something like a category field, where lots of documents are in the same category, and they want to know what the full list of categories is for all of the documents that match their query.

if you want the list of all primary keys for all the documents that match your query, why not just make sure that field has stored=true in the schema.xml and get the values that way?

I'm extra confused because of this comment...

: when I debug simplefacet. It always iterates over all the documents in the
: result docset (SimpleFacet.getFieldCacheCounts line 259).

it doesn't *seem* like faceting is necessary, but why do you think iterating over all the documents in your result set seems like a waste here? if you want to know what *all* the values are for every document in your doc set, then regardless of whether the values are distinct for each doc, how else could Solr get all the values than by looking at each matching doc?

-Hoss
Re: Lucene-based Distributed Index Leveraging Hadoop
Clay Webster wrote:

There seem to be a few other players in this space too. Are you from Rackspace? (http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data)

AOL also has a Hadoop/Solr project going on. CNET does not have much brewing there. Although Yonik and I had talked about it a bunch -- but that was long ago.

Hi.

AOL has a couple of projects going on in the lucene/hadoop/solr space, and we will be pushing more stuff out as we can. We don't have anything going with solr over hadoop at the moment.

I'm not sure if this would be better than what SOLR-303 does, but you should have a look at the work being done there. One of the things you mentioned is that the data sets are disjoint. SOLR-303 doesn't require this, and allows us to have a document stored in multiple shards (with different caching/update characteristics).

--cw

Clay Webster
tel:1.908.541.3724
Associate VP, Platform Infrastructure
http://www.cnet.com
CNET, Inc. (Nasdaq:CNET)
mailto:[EMAIL PROTECTED]
Re: leading wildcards
The solution that works for me is to store the field in reverse order, and have your application reverse the field in the query. So the field www.example.com would be stored as moc.elpmaxe.www, and a search for *.example.com is sent by my application as the trailing-wildcard query moc.elpmaxe.* instead. Regards Ian (hat tip to Erik for the idea) Michael Kimsal wrote: Vote for that issue and perhaps it'll gain some more traction. A former colleague of mine was the one who contributed the patch in SOLR-218 and it would be nice to have that configuration option 'standard' (if off by default) in the next Solr release. On Nov 12, 2007 11:18 AM, Traut [EMAIL PROTECTED] wrote: Seems like there is no way to enable leading wildcard queries except code editing and files repacking. :( On 11/12/07, Bill Au [EMAIL PROTECTED] wrote: The related bug is still open: http://issues.apache.org/jira/browse/SOLR-218 Bill On Nov 12, 2007 10:25 AM, Traut [EMAIL PROTECTED] wrote: Hi I found the thread about enabling leading wildcards in Solr as an additional option in the config file. I've got a nightly Solr build and I can't find any options connected with leading wildcards in the config files. How can I enable leading wildcard queries in Solr? Thank you -- Best regards, Traut -- Best regards, Traut
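A minimal sketch of the reversed-field trick Ian describes, in Python. The function names are hypothetical, and it assumes the application reverses values both before indexing them into a dedicated reversed field and before building the query; nothing here is part of Solr itself.

```python
def reverse_value(value: str) -> str:
    """Reverse a field value before it is indexed into the reversed field."""
    return value[::-1]


def to_trailing_wildcard(pattern: str) -> str:
    """Rewrite a leading-wildcard pattern such as '*.example.com' into the
    equivalent trailing-wildcard pattern against the reversed field."""
    if not pattern.startswith("*"):
        raise ValueError("expected a pattern that starts with '*'")
    # Drop the leading '*', reverse the rest, and put the wildcard at the end.
    return reverse_value(pattern[1:]) + "*"


print(reverse_value("www.example.com"))       # moc.elpmaxe.www
print(to_trailing_wildcard("*.example.com"))  # moc.elpmaxe.*
```

The point of the design is that a trailing wildcard is an ordinary prefix query, which needs no patch, whereas a leading wildcard is what SOLR-218 is about.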
where did my foreign language go?
Hi. I'm in the middle of bringing up a new Solr server and am using the trunk (my old server was running an earlier nightly release from about 2-3 weeks ago). Now, when I do a search for 日本 (Japan), it used to show the kanji in the q area, but now it shows gibberish instead: æ¥æ¬. Any hints on where I should start investigating why this is happening? Regards Ian (server is here: http://pyro.holsman.net:8983/solr/select/?q=%E6%97%A5%E6%9C%AC&version=2.2&start=0&rows=10&indent=on )
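For anyone trying to reproduce the symptom, here is a small Python sketch that shows how the percent-encoded query in the URL above turns into that kind of gibberish when the bytes are decoded with the wrong charset. That this is what happened on Ian's server is an assumption; the sketch only demonstrates the general mechanism (UTF-8 bytes read as Latin-1).

```python
from urllib.parse import quote, unquote

query = "日本"

# A browser sends the query as percent-encoded UTF-8 bytes.
encoded = quote(query)
print(encoded)  # %E6%97%A5%E6%9C%AC

# Decoding those bytes as UTF-8 gives the original text back.
decoded_ok = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as Latin-1 yields one character per byte:
# 'æ', '¥', 'æ', '¬' plus two invisible control characters.
decoded_wrong = unquote(encoded, encoding="latin-1")

print(decoded_ok == query)  # True
print(len(decoded_wrong))   # 6
```

The damage is reversible as long as no bytes were dropped: re-encoding the wrong string as Latin-1 and decoding it as UTF-8 restores the kanji, which is a handy way to confirm the diagnosis.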
Re: where did my foreign language go?
Thanks.. I'll do that. sunrise1984 wrote: Maybe the following is useful for you. (It comes from http://wiki.apache.org/solr/SolrTomcat) If you are going to query Solr using international characters (>127) using HTTP-GET, you must configure Tomcat to conform to the URI standard by accepting percent-encoded UTF-8. Edit Tomcat's conf/server.xml and add the following attribute to the correct Connector element: URIEncoding="UTF-8".
  <Server ...>
    <Service ...>
      <Connector ... URIEncoding="UTF-8">
        ...
      </Connector>
    </Service>
  </Server>
This is only an issue when sending non-ASCII characters in a query request... no configuration is needed for Solr/Tomcat to return non-ASCII chars in a response, or accept non-ASCII chars in an HTTP-POST body. sunrise1984 2007-10-25
Seeing if an entry exists in an index for a set of terms
Hi. I was wondering if there was an easy way to give Solr a list of things and find out which have entries. I.e. I pass a list such as Bill Clinton, George Bush, Mary Papas (and possibly 20 others) to a Solr index which contains news articles about presidents. I would like a response saying Bill Clinton was found in 20 records, George Bush was found in 15, possibly with the links, but that's not too important. I know I can do this by doing ~20 individual queries, but I thought there may be a more efficient way. Regards Ian
Re: Seeing if an entry exists in an index for a set of terms
Yonik Seeley wrote: On 10/3/07, Ian Holsman [EMAIL PROTECTED] wrote: Hi. I was wondering if there was an easy way to give Solr a list of things and find out which have entries. I.e. I pass a list such as Bill Clinton, George Bush, Mary Papas (and possibly 20 others) to a Solr index which contains news articles about presidents. I would like a response saying Bill Clinton was found in 20 records, George Bush was found in 15, possibly with the links, but that's not too important. I know I can do this by doing ~20 individual queries, but I thought there may be a more efficient way. How about facet.query=Bill Clinton&facet.query=George Bush, etc. That will give you counts, but not the links. -Yonik That will work. Thanks Yonik.
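A rough Python sketch of Yonik's suggestion: one request carrying a repeated facet.query parameter per name. The base URL, the match-all q, rows=0, and wrapping each name in double quotes to make it a phrase are all assumptions for illustration; only facet=true and the repeated facet.query parameter come from Solr's faceting parameters.

```python
from urllib.parse import urlencode


def build_facet_query_url(base_url, names):
    """Build a single select URL that asks for one facet count per name."""
    params = [
        ("q", "*:*"),       # assumption: count against the whole index
        ("rows", "0"),      # assumption: only the counts are wanted, no documents
        ("facet", "true"),  # faceting has to be switched on explicitly
    ]
    # One facet.query per name; the quotes turn each name into a phrase query.
    params += [("facet.query", '"%s"' % name) for name in names]
    return base_url + "?" + urlencode(params)


names = ["Bill Clinton", "George Bush", "Mary Papas"]
# Hypothetical host and port, not taken from the thread.
url = build_facet_query_url("http://localhost:8983/solr/select", names)
print(url)
```

Passing the parameters as a list of tuples (rather than a dict) is what allows the same key to appear several times in the query string.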
Re: Geographical distance searching
Have you guys seen Local Lucene? http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm No need for MySQL if you don't want to. rgrds Ian Will Johnson wrote: With the new/improved value source functions it should be pretty easy to develop a new best practice. You should be able to pull in the lat/lon values from valuesource fields and then do your great circle calculation. - will -Original Message- From: Lance Norskog [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 3:15 PM To: solr-user@lucene.apache.org Subject: Geographical distance searching It is a best practice to store the master copy of this data in a relational database and use Solr/Lucene as a high-speed cache. MySQL has a geographical database option, so maybe that is a better option than Lucene indexing. Lance (P.s. please start new threads for new topics.) -Original Message- From: Sandeep Shetty [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 5:15 AM To: 'solr-user@lucene.apache.org' Subject: custom sorting Hi Guys, this question has been asked before but I was unable to find an answer that's good for me, so I hope you guys can help again. I am working on a website where we need to sort the results by distance from the location entered by the user. I have indexed the lat and long info for each record in Solr and I can also get the lat and long of the location input by the user. Previously we were using Lucene to do this: by using the SortComparatorSource we could sort the documents returned by distance nicely. We are now switching over to Solr because of the features it provides; however, I am not able to see a way to do this in Solr. If someone can point me in the right direction I would be very grateful! Thanks in advance, Sandeep
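For reference, the great circle calculation Will mentions can be sketched in a few lines of Python using the haversine formula. This is only an illustration of the distance maths and of sorting already-fetched records on the client; the record layout (dicts with "lat" and "lon" keys) and the mean Earth radius of 6371 km are assumptions, and it says nothing about how Solr value sources or Local Lucene do it internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(records, user_lat, user_lon):
    """Return the records ordered nearest-first from the user's location."""
    return sorted(
        records,
        key=lambda r: great_circle_km(user_lat, user_lon, r["lat"], r["lon"]),
    )


# Hypothetical records, as they might come back from a search.
records = [
    {"name": "Paris shop", "lat": 48.8566, "lon": 2.3522},
    {"name": "London shop", "lat": 51.5074, "lon": -0.1278},
]
for record in sort_by_distance(records, 51.52, -0.10):
    print(record["name"])
```

Sorting on the client like this only works on the rows that were actually returned, so it is a fallback for small result sets rather than a replacement for sorting inside the index.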
Re: Nutch with SOLR
[moving this thread to solr-user, as it really has nothing to do with Hadoop] Daniel Clark wrote: There's info on website http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html, but it's not clear. Sami has a patch in there which used an older version of the Solr client. With the current Solr client in the SVN tree, his patch becomes much easier. Your job would be to upgrade the patch and mail it back to him so he can update his blog, or post it as a patch for inclusion in nutch/contrib (if Sami is ok with that). If you have issues with how to use the Solr client API, solr-user is here to help. The Nutch-specific changes are: 1. Configure nutch-site.xml to add a config option to point to your Solr server. 2. Instead of calling the nutch 'index' command, you would call it like so: bin/nutch org.apache.nutch.indexer.SolrIndexer $BASEDIR/crawldb $BASEDIR/linkdb $SEGMENT regards Ian ~ Daniel Clark, President DAC Systems, Inc. (703) 403-0340 ~ -Original Message- From: Dmitry [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 25, 2007 2:56 PM To: [EMAIL PROTECTED] Subject: Re: Nutch with SOLR Daniel, We just started to test/research the possibility of integrating Nutch and Solr, so it will be nice to hear any advice as well. Thanks, DT www.ejizn.com - Original Message - From: Daniel Clark [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, September 25, 2007 1:23 PM Subject: Nutch with SOLR Has anyone been able to get Nutch 0.9 working with SOLR? Any help would be appreciated. ~ Daniel Clark, President DAC Systems, Inc. (703) 403-0340 ~
Re: Nutch with SOLR
Thanks Brian. I'm sure this will help lots of people. Brian Whitman wrote: But we still use a version of Sami's patch that works on both trunk nutch and trunk solr (solrj.) I sent my changes to sami when we did it, if you need it let me know... I put my files up here: http://variogr.am/latest/?p=26 -b
Solr Injection
Hi. I've been playing with Kettle (http://kettle.pentaho.org/ ) as a method to inject data into Solr (and other things at the same time), and it looks really promising. I was wondering if anyone else had some experience using it with Solr, and if they set it up to add a document at a time, or wrote a single XML 'add' document and then added all of them in one lot. Ideally I would like to have Solr accept a REST style URL without all the XML bs around it and just pass the fields in as parameters (which is alluded to in http://issues.apache.org/jira/browse/SOLR-85 ), and just pound the Solr master with lots of little posts when I do incremental updates for 1000 things and use the CSV uploader for larger things. Thoughts?
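To make the "single XML 'add' document" option concrete, here is a small Python sketch that batches several records into one add message in Solr's XML update format (add, doc and field elements). The field names and sample records are made up, and how Kettle would produce or post this is not covered; the sketch only builds the payload.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build one <add> message containing a <doc> for every record."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


# Hypothetical records; the field names would have to exist in schema.xml.
docs = [
    {"id": 1, "title": "First story"},
    {"id": 2, "title": "Fish & chips"},
]
payload = build_add_xml(docs)
print(payload)
```

Posting one such payload per batch costs one HTTP round trip for the whole lot, which is the trade-off against the many small posts described above.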
RDF uploader -- has anyone built such a beast?
Hi. For a project I'm working on, I'm getting an RDF formatted feed. I was wondering if someone has built an RDF to Solr upload function similar to the CSV and MySQL ones sitting in Jira. regards Ian
Re: Requests per second/minute monitor?
Walter Underwood wrote: This is for monitoring -- what happened in the last 30 seconds. Log file analysis doesn't really do that. I would respectfully disagree. Log file analysis of each request can give you that, and a whole lot more. You could either grab the stats via a regular cron job, or create a separate filter to parse them in real time. It would then let you grab more sophisticated stats if you choose to. What I would like to know is (and excuse the newbieness of the question) how to enable Solr to log a file with the following data: - time spent (ms) in the request - IP address of the incoming request - what the request was (and what handler executed it) - a status code to signal if the request failed for some reason - number of rows fetched - the number of rows actually returned. Is this possible? (I'm using Tomcat if that changes the answer.) regards Ian
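As a sketch of the cron-job style analysis suggested above, the Python below summarises the requests seen in the last 30 seconds of a log. The log line layout (ISO timestamp, client IP, handler, status code, elapsed milliseconds, separated by whitespace) is invented for the example; it is not a format that Solr or Tomcat is known to write, so the parsing would need adapting to whatever the real log contains.

```python
from datetime import datetime, timedelta


def summarize_recent(lines, now, window_seconds=30):
    """Summarise the requests logged within the last window_seconds."""
    cutoff = now - timedelta(seconds=window_seconds)
    count = failures = total_ms = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip lines that do not match the assumed layout
        stamp, _ip, _handler, status, elapsed_ms = parts
        when = datetime.fromisoformat(stamp)
        if cutoff <= when <= now:
            count += 1
            total_ms += int(elapsed_ms)
            if status != "200":
                failures += 1
    return {
        "requests": count,
        "per_second": count / window_seconds,
        "failures": failures,
        "avg_ms": total_ms / count if count else 0.0,
    }


# Hypothetical log lines in the assumed layout.
log_lines = [
    "2007-05-01T12:00:05 10.0.0.5 /select 200 35",
    "2007-05-01T12:00:40 10.0.0.6 /select 200 20",
    "2007-05-01T12:00:55 10.0.0.5 /update 500 110",
]
summary = summarize_recent(log_lines, datetime(2007, 5, 1, 12, 1, 0))
print(summary)
```

Because every request keeps its own line, the same pass could also break the numbers down per handler or per client IP, which is the "whole lot more" argument for log analysis.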
newbie Q regarding schema configuration
Hi. So I finally managed to find a bit of time to get a Solr instance going, and now have some questions about it ;-) First, the application is tagging, i.e. associating some keywords with a given item and showing them on a particular object (you can see this in action here http://economy-chat.com/aggy/detail/andrew-leigh/ ). It is user-based (i.e. individuals can tag a particular object themselves, and that gets merged into a global summary for that object) and it is also hierarchical, i.e. tagging a child implies you have also tagged the parent. So.. my first question: in schema.xml, can you have a composite key as the 'uniquekey' field, or do I need to do this on the client side? 2nd question: can you have complex types which are multivalued? I'd like to store something like a tag-name with a corresponding tag-weighting. Can you do sum(*) type queries in Lucene/Solr? Is it efficient? Or are you better off having a 2nd index which has these sum(*) values in it and keeping it up to date instead? Thanks
Re: SolPHP
I think I could get some Python bindings off those as well, and if people feel there is a need, some C/APR ones as well. On 02/06/2006, at 11:16 AM, Brian Lucas wrote: Erik, I'll get the PHP bindings out to see how they suit the needs of people and use that feedback for the Rails bindings. I'm looking forward to seeing how they could be implemented as well. Brian -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, June 01, 2006 6:59 PM To: solr-user@lucene.apache.org Subject: Re: SolPHP Brian, I'd love to give any RoR bindings a try if you're at a point to share. I can see all sorts of interesting fun that can be had with such bindings, such as pulling schema.xml from the server and using its field definitions to build mapping objects (like ActiveRecord), support for all the parameters of the request handler(s), clever iterators that would page through the hits by requesting bite-sized chunks from Solr. At the very least, of course, is having the request and response abstracted so no XML or HTTP is seen by the client code. Erik On Jun 1, 2006, at 8:49 PM, Brian Lucas wrote: Yes, I have written bindings but hadn't abstracted them fully. They're pretty solid and since you're the second person that's asked, let me get those out as soon as possible. I'm also working on the Ruby/Rails bindings as well. Brian -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Thursday, June 01, 2006 6:17 PM To: solr-user@lucene.apache.org; [EMAIL PROTECTED] Subject: Re: SolPHP Nothing in SVN... It looks like Brian Lucas might have been working on something: http://www.mail-archive.com/solr-user%40lucene.apache.org/msg00325.html -Yonik On 6/1/06, Michael J. Giarlo [EMAIL PROTECTED] wrote: Hey folks, I noticed a stub on the wiki about two PHP classes for Solr. I've tried to track down the classes but have been unsuccessful so far. Does anyone know where, or if, these classes are available? Thanks! -Mike
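Erik's idea of an iterator that pages through the hits in bite-sized chunks could look roughly like the Python generator below. It is a sketch, not any existing binding: fetch_page is a hypothetical callable standing in for whatever performs the HTTP request with start and rows parameters and returns the list of hits for that page.

```python
def iter_hits(fetch_page, rows=10):
    """Yield hits one at a time, requesting them from the server in pages.

    fetch_page(start, rows) is a hypothetical callable that returns the
    list of hits for one page (an empty list once the results run out).
    """
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            return
        yield from page
        if len(page) < rows:
            return  # a short page means there is nothing left to fetch
        start += rows


# Stand-in for a real search backend: 25 fake hits served in slices.
fake_hits = ["doc%d" % i for i in range(25)]
requested = []


def fake_fetch(start, rows):
    requested.append(start)
    return fake_hits[start:start + rows]


for hit in iter_hits(fake_fetch, rows=10):
    print(hit)
```

Because it is a generator, client code that stops after the first few hits never triggers the later requests, which is the main attraction of paging lazily.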