Re: storing key,value pair in Solr document
You can have a dynamicField in your schema called entity_* and map it to your corresponding data structure this way: @Field("entity_*") Map&lt;String,String&gt; entity; The key would be your field name (the part after the "entity_" prefix). SOLR-1129 https://issues.apache.org/jira/browse/SOLR-1129 will give you more insights. Cheers Avlesh On Mon, Aug 10, 2009 at 10:58 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I have an Entity and a Value associated with it. I want to store this value as a key,value pair in Solr. I have a Java object which I am mapping to a Solr doc using org.apache.solr.client.solrj.beans.Field. Can I also store a Map? And how can I do so? This is how I want it to be done: @Field Map&lt;String,String&gt; entity;
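For reference, the dynamicField this advice relies on would be declared in schema.xml along these lines (the field type and attributes here are assumptions for illustration, not part of the original mail):

```xml
<!-- catches any field whose name starts with "entity_";
     each key of the annotated Map becomes the suffix of a concrete field name -->
<dynamicField name="entity_*" type="string" indexed="true" stored="true"/>
```

With @Field("entity_*") Map&lt;String,String&gt; entity in the bean, a map entry like ("person", "goodhart") would end up indexed as a field named entity_person.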
Re: storing key,value pair in Solr document
Hi Avlesh, Can we use SimpleOrderedMap? It seems deprecated. Is it safe to use, and how is it going to be mapped to the field? @Field("ne") SimpleOrderedMap&lt;String&gt; ne = new SimpleOrderedMap&lt;String&gt;(); won't work, right?? On Mon, Aug 10, 2009 at 11:36 AM, Avlesh Singh avl...@gmail.com wrote: You can have a dynamicField in your schema called entity_* and map it to your corresponding data structure this way: @Field("entity_*") Map&lt;String,String&gt; entity; The key would be your field name (the part after the "entity_" prefix). SOLR-1129 https://issues.apache.org/jira/browse/SOLR-1129 will give you more insights. Cheers Avlesh On Mon, Aug 10, 2009 at 10:58 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I have an Entity and a Value associated with it. I want to store this value as a key,value pair in Solr. I have a Java object which I am mapping to a Solr doc using org.apache.solr.client.solrj.beans.Field. Can I also store a Map? And how can I do so? This is how I want it to be done: @Field Map&lt;String,String&gt; entity;
Re: Embedded Solr Clustering
On Mon, Aug 10, 2009 at 3:57 AM, born2fish tswan...@yahoo.com wrote: Hi everyone, We have a web app that uses embedded solr for better performance. I would advise against it. We use Solr on sites with millions of page views a month on HTTP. With HTTP keep-alives, the overhead of an http request is minimal as compared to the actual search. You have the advantages of replication as well as the option of adding a http cache in front of Solr. Is there a performance problem you're trying to solve by using embedded solr? Now we are trying to deploy the app to a clustered environment. My question is: 1. Can we configure the embedded solr instances to share the same index on the network? Yes. It may be very slow though. Best to benchmark it before going to production. Also, you'll need to make sure that only one Solr instance is writing to the index at one time. It is better to have separate indexes. 2. If the answer to question 1 is no, can we configure embedded solr instances to replicate indexes in a master / slave fashion just like normal web based Solr? Yes you can use the script based replication. You'd need to expose a way to call commit on your application if you use embedded solr. -- Regards, Shalin Shekhar Mangar.
Re: Guide to using SolrQuery object
You'll find the available parameters in various interfaces in the package org.apache.solr.common.params.* For instance: import org.apache.solr.common.params.FacetParams; import org.apache.solr.common.params.ShardParams; import org.apache.solr.common.params.TermVectorParams; As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams (just so that you are aware of that). Hope that helps a bit. Cheers, Aleks On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin reub...@benetech.org wrote: Also, are there enums or constants around the various param names that can be passed in, or do people tend to define those themselves? Thanks! Reuben -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.com http://twitter.com/Integrasco
Re: storing key,value pair in Solr document
Can we use SimpleOrderedMap? No, Ninad, that wouldn't work. Cheers Avlesh On Mon, Aug 10, 2009 at 11:46 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Avlesh, Can we use SimpleOrderedMap? It seems deprecated. Is it safe to use, and how is it going to be mapped to the field? @Field("ne") SimpleOrderedMap&lt;String&gt; ne = new SimpleOrderedMap&lt;String&gt;(); won't work, right?? On Mon, Aug 10, 2009 at 11:36 AM, Avlesh Singh avl...@gmail.com wrote: You can have a dynamicField in your schema called entity_* and map it to your corresponding data structure this way: @Field("entity_*") Map&lt;String,String&gt; entity; The key would be your field name (the part after the "entity_" prefix). SOLR-1129 https://issues.apache.org/jira/browse/SOLR-1129 will give you more insights. Cheers Avlesh On Mon, Aug 10, 2009 at 10:58 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I have an Entity and a Value associated with it. I want to store this value as a key,value pair in Solr. I have a Java object which I am mapping to a Solr doc using org.apache.solr.client.solrj.beans.Field. Can I also store a Map? And how can I do so? This is how I want it to be done: @Field Map&lt;String,String&gt; entity;
Multiple Unique Ids
Hi, I have two Ids: DocumentId and AuthorId. I want both of them to be unique. Can I have two uniqueKey declarations in my schema? &lt;uniqueKey&gt;id&lt;/uniqueKey&gt; &lt;uniqueKey&gt;authorId&lt;/uniqueKey&gt; Regards, Ninad Raut
AW: mergeContiguous for multiple search terms
Hello, we are using Solr-1.3. Thanks for your time. Björn -----Original Message----- From: Avlesh Singh Sent: Monday, 10 August 2009 04:01 To: solr-user@lucene.apache.org Subject: Re: mergeContiguous for multiple search terms Which Solr version are you using? Cheers Avlesh On Wed, Aug 5, 2009 at 5:55 PM, Hachmann, Bjoern hachmann.bjo...@guj.de wrote: Hello, we would like to use the highlighting component with the mergeContiguous parameter set to true. We have a field with the value "Ökonom Charles Goodhart". If we search for all three words, they are found correctly: &lt;em&gt;Ökonom&lt;/em&gt; &lt;em&gt;Charles&lt;/em&gt; &lt;em&gt;Goodhart&lt;/em&gt; But with the mergeContiguous parameter set to true, I expected: &lt;em&gt;Ökonom Charles Goodhart&lt;/em&gt;. Am I misunderstanding the behaviour of this parameter? We are using the dismax query parser and solr-1.3. Thank you very much for your time. Björn Hachmann
Pojo not getting added to Solr Index
I am not getting any exception, but the document is not getting added to Solr. Here is the code:

public class ClientSearch {

    public SolrServer getSolrServer() throws MalformedURLException {
        // the instance can be reused
        return new CommonsHttpSolrServer("http://germinait22:8983/solr/core0/");
    }

    void store() throws IOException, SolrServerException {
        IthursDocument ithursDocument = new IthursDocument();
        System.out.println("Created IthursDocument..");
        ithursDocument.setId("testID_2");
        ithursDocument.setMedia("BLOG");
        ithursDocument.setContent("Khatoo is a good Gal");
        Date date = new Date("23/08/2009");
        ithursDocument.setPubDate(date);
        Map<String,String> namedEntity = new HashMap<String,String>();
        namedEntity.put("Germinait", "0.7");
        ithursDocument.setNe(namedEntity);
        ithursDocument.setSentiment(0.1f);
        SolrServer server = getSolrServer();
        server.addBean(ithursDocument);
    }

    void query() throws MalformedURLException, SolrServerException {
        SolrServer server = getSolrServer();
        SolrQuery query = new SolrQuery();
        query.setQuery("id:testID");
        QueryResponse rsp = server.query(query);
        List<IthursDocument> list = rsp.getBeans(IthursDocument.class);
        System.out.println(list.size());
    }

    public static void main(String[] args) {
        ClientSearch clientSearch = new ClientSearch();
        try {
            clientSearch.store();
            clientSearch.query();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }
}

The logs show the following: 192.168.0.115 - - [10/08/2009:12:10:47 +0000] "POST /solr/core0/update?wt=javabin&version=1 HTTP/1.1" 200 40 Where am I going wrong??
Re: Pojo not getting added to Solr Index
Where am I going wrong?? I think you forgot to commit after adding beans via the SolrServer. PS: I am damn sure that you don't intend to create a new instance of CommonsHttpSolrServer every time. Cheers Avlesh On Mon, Aug 10, 2009 at 5:55 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: I am not getting any exception, but the document is not getting added to Solr. Here is the code:

public class ClientSearch {

    public SolrServer getSolrServer() throws MalformedURLException {
        // the instance can be reused
        return new CommonsHttpSolrServer("http://germinait22:8983/solr/core0/");
    }

    void store() throws IOException, SolrServerException {
        IthursDocument ithursDocument = new IthursDocument();
        System.out.println("Created IthursDocument..");
        ithursDocument.setId("testID_2");
        ithursDocument.setMedia("BLOG");
        ithursDocument.setContent("Khatoo is a good Gal");
        Date date = new Date("23/08/2009");
        ithursDocument.setPubDate(date);
        Map<String,String> namedEntity = new HashMap<String,String>();
        namedEntity.put("Germinait", "0.7");
        ithursDocument.setNe(namedEntity);
        ithursDocument.setSentiment(0.1f);
        SolrServer server = getSolrServer();
        server.addBean(ithursDocument);
    }

    void query() throws MalformedURLException, SolrServerException {
        SolrServer server = getSolrServer();
        SolrQuery query = new SolrQuery();
        query.setQuery("id:testID");
        QueryResponse rsp = server.query(query);
        List<IthursDocument> list = rsp.getBeans(IthursDocument.class);
        System.out.println(list.size());
    }

    public static void main(String[] args) {
        ClientSearch clientSearch = new ClientSearch();
        try {
            clientSearch.store();
            clientSearch.query();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }
}

The logs show the following: 192.168.0.115 - - [10/08/2009:12:10:47 +0000] "POST /solr/core0/update?wt=javabin&version=1 HTTP/1.1" 200 40 Where am I going wrong??
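For anyone hitting the same problem: in SolrJ the fix is a server.commit() call after addBean(...). Alternatively, Solr can be configured to commit on its own via the autoCommit block in solrconfig.xml; the thresholds below are just illustrative values, not recommendations:

```xml
<!-- solrconfig.xml: commit automatically after N docs or N milliseconds,
     whichever comes first -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime> <!-- ms -->
  </autoCommit>
</updateHandler>
```

Either way, documents added to the index are not visible to searches until a commit happens.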
Re: Pojo not getting added to Solr Index
thanks Avlesh, u saved my day... !! yes, I am not going to have a new instance of the server every time... this is just a proof of concept. On Mon, Aug 10, 2009 at 6:06 PM, Avlesh Singh avl...@gmail.com wrote: Where am I going wrong?? I think you forgot to commit after adding beans via the SolrServer. PS: I am damn sure that you don't intend to create a new instance of CommonsHttpSolrServer every time. Cheers Avlesh On Mon, Aug 10, 2009 at 5:55 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: I am not getting any exception, but the document is not getting added to Solr. Here is the code:

public class ClientSearch {

    public SolrServer getSolrServer() throws MalformedURLException {
        // the instance can be reused
        return new CommonsHttpSolrServer("http://germinait22:8983/solr/core0/");
    }

    void store() throws IOException, SolrServerException {
        IthursDocument ithursDocument = new IthursDocument();
        System.out.println("Created IthursDocument..");
        ithursDocument.setId("testID_2");
        ithursDocument.setMedia("BLOG");
        ithursDocument.setContent("Khatoo is a good Gal");
        Date date = new Date("23/08/2009");
        ithursDocument.setPubDate(date);
        Map<String,String> namedEntity = new HashMap<String,String>();
        namedEntity.put("Germinait", "0.7");
        ithursDocument.setNe(namedEntity);
        ithursDocument.setSentiment(0.1f);
        SolrServer server = getSolrServer();
        server.addBean(ithursDocument);
    }

    void query() throws MalformedURLException, SolrServerException {
        SolrServer server = getSolrServer();
        SolrQuery query = new SolrQuery();
        query.setQuery("id:testID");
        QueryResponse rsp = server.query(query);
        List<IthursDocument> list = rsp.getBeans(IthursDocument.class);
        System.out.println(list.size());
    }

    public static void main(String[] args) {
        ClientSearch clientSearch = new ClientSearch();
        try {
            clientSearch.store();
            clientSearch.query();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }
}

The logs show the following: 192.168.0.115 - - [10/08/2009:12:10:47 +0000] "POST /solr/core0/update?wt=javabin&version=1 HTTP/1.1" 200 40 Where am I going wrong??
Re: MoreLikeThis: How to get quality terms from html from content stream?
Right, a SearchComponent wrapper around some of the Solr Cell capabilities could make this so. On Aug 9, 2009, at 11:21 AM, Jay Hill wrote: Solr Cell definitely sounds like it has a place here. But wouldn't it be needed as an extracting component earlier in the process for the MoreLikeThisHandler? The MLT handler works great when it's directed to a content stream of plain text. If we could just use Solr Cell to identify the file type and do the content extraction earlier in the stream, that would do the trick, I think. Then whether the URL pointed to HTML, a PDF, or whatever, MLT would be receiving a stream of extracted content. -Jay On Sun, Aug 9, 2009 at 7:17 AM, Grant Ingersoll gsing...@apache.org wrote: It's starting to sound like Solr Cell needs a SearchComponent as well, that can come before the QueryComponent and can be used to map into the other components. Essentially, take the functionality of the extractOnly option and have it feed the other SearchComponents. On Aug 8, 2009, at 10:42 AM, Ken Krugler wrote: On Aug 7, 2009, at 5:23pm, Jay Hill wrote: I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not surprisingly, the query generated is meaningless because a lot of the markup is picked out as terms: &lt;str name="parsedquery_toString"&gt;body:li body:href body:div body:class body:a body:script body:type body:js body:ul body:text body:javascript body:style body:css body:h body:img body:var body:articl body:ad body:http body:span body:prop&lt;/str&gt; Does anyone know a way to transform the html so that the content can be parsed out of the content stream and processed w/o the markup? Or do I need to write my own HTMLParsingMoreLikeThisHandler? You'd want to parse the HTML to extract only text first, and use that for your index data.
Both the Nutch and Tika OSS projects have examples of using HTML parsers (based on TagSoup or CyberNeko) to generate content suitable for indexing. -- Ken If I parse the content out to a plain text file and point the stream.url param to file:///parsedfile.txt it works great. -Jay -- Ken Krugler TransPac Software, Inc. http://www.transpac.com +1 530-210-6378 -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
UTF-8 query support?
Hi, I tried to query my text field with a UTF-8 string that was in the indexed document, but it returned nothing. e.g. http://192.168.2.10:8081/solr4/select/?q=%E5%BE%93%E6%9D%A5%E9%80%9A%E3%82%8A&version=2.2&start=0&rows=10&indent=on The result page showed a garbled query string (wrong encoding): &lt;str name="q"&gt;å¾æ¥éã&lt;/str&gt; How do I set UTF-8 encoding so lucene can find the documents, since it supports UTF-8 queries? thanks! Darren
Re: mergeContiguous for multiple search terms
Hachmann, Bjoern wrote: Hello, we would like to use the highlighting component with the mergeContiguous parameter set to true. We have a field with the value "Ökonom Charles Goodhart". If we search for all three words, they are found correctly: &lt;em&gt;Ökonom&lt;/em&gt; &lt;em&gt;Charles&lt;/em&gt; &lt;em&gt;Goodhart&lt;/em&gt; But with the mergeContiguous parameter set to true, I expected: &lt;em&gt;Ökonom Charles Goodhart&lt;/em&gt;. Am I misunderstanding the behaviour of this parameter? We are using the dismax query parser and solr-1.3. The current highlighter doesn't support this type of highlighting. Using FastVectorHighlighter in Lucene 2.9, when you query the phrase "Ökonom Charles Goodhart", you can expect the output you mentioned above. But it hasn't made it into Solr yet. Koji
Re: UTF-8 query support?
Your URL suggests you set up your own servlet container - that's probably the issue. If you're using tomcat see http://wiki.apache.org/solr/SolrTomcat Test out your config with example/exampledocs/test_utf8.sh -Yonik http://www.lucidimagination.com On Mon, Aug 10, 2009 at 10:19 AM, Darren Govoni dar...@ontrenet.com wrote: Hi, I tried to query my text field with a UTF-8 string that was in the indexed document, but it returned nothing. e.g. http://192.168.2.10:8081/solr4/select/?q=%E5%BE%93%E6%9D%A5%E9%80%9A%E3%82%8A&version=2.2&start=0&rows=10&indent=on The result page showed a garbled query string (wrong encoding): &lt;str name="q"&gt;å¾“æ ¥é€šã‚Š&lt;/str&gt; How do I set UTF-8 encoding so lucene can find the documents, since it supports UTF-8 queries? thanks! Darren
Re: UTF-8 query support?
On Mon, Aug 10, 2009 at 4:19 PM, Darren Govonidar...@ontrenet.com wrote: How do I set UTF-8 encoding so lucene can find the documents since it supports UTF-8 queries? This depends on the app server you're using. I'm guessing Tomcat (as that's where I had the same issue), and you can fix this by enabling UTF-8 encoded query strings in Tomcat itself: http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4 --mats
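For the archives, the relevant Tomcat change from the wiki page linked above is adding URIEncoding="UTF-8" to the HTTP connector in server.xml (the port and timeout here are just example values from a stock install):

```xml
<!-- server.xml: decode query-string bytes as UTF-8 instead of the
     default ISO-8859-1 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```

Without this, Tomcat decodes the percent-encoded query bytes as Latin-1, which produces exactly the kind of mojibake shown in the echoed q parameter.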
Re: Embedded Solr Clustering
Thanks Shalin and Avlesh for your responses. Yes, we are using Solr for a non-traditional search purpose and the performance is critical. However, it sounds like sharing the same index could slow down reading/writing to the index, and access synchronization is tricky as well. Therefore, we might have to use a single web based Solr instance, or use multiple embedded Solr instances and set up the script based replication. Thanks again for your help! born2fish wrote: Hi everyone, We have a web app that uses embedded solr for better performance. Now we are trying to deploy the app to a clustered environment. My question is: 1. Can we configure the embedded solr instances to share the same index on the network? 2. If the answer to question 1 is no, can we configure embedded solr instances to replicate indexes in a master / slave fashion just like normal web based Solr? Thanks, born2fish -- View this message in context: http://www.nabble.com/Embedded-Solr-Clustering-tp24891931p24900854.html Sent from the Solr - User mailing list archive at Nabble.com.
[OT] Solr Webinar
I will be giving a free one hour webinar on getting started with Apache Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT You can sign up @ http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP I will present and demo: * Getting started with LucidWorks for Solr * Getting better, faster results using Solr's findability and relevance improvement tools * Deploying Solr in production, including monitoring performance and trends with the LucidGaze for Solr performance profiler -Grant
Re: [OT] Solr Webinar
Hello Grant, Will the webinar be recorded and available to download later someplace? Unfortunately, I can't watch this time. Thanks, []s, Lucas Frare Teixeira .·. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll gsing...@apache.orgwrote: I will be giving a free one hour webinar on getting started with Apache Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT You can sign up @ http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP I will present and demo: * Getting started with LucidWorks for Solr * Getting better, faster results using Solr's findability and relevance improvement tools * Deploying Solr in production, including monitoring performance and trends with the LucidGaze for Solr performance profiler -Grant
Re: Relevant results with DisMaxRequestHandler
Hello, Thank you for your answer. I finally used only a 'qf' parameter in the dismax request handler and it seems that I now have better and more relevant results. I just don't understand why a result is mainly boosted by its last update by default! Vincent -- View this message in context: http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p24903143.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Relevant results with DisMaxRequestHandler
I actually have another question... The 'qf' parameter used in the dismax seems to work with an 'AND' separator. I have many more results without dismax. Is there any way to keep the same amount of documents and still process the 'qf'? My dismax:

&lt;requestHandler name="dismax" class="solr.SearchHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
    &lt;str name="echoParams"&gt;explicit&lt;/str&gt;
    &lt;float name="tie"&gt;0.01&lt;/float&gt;
    &lt;str name="qf"&gt;text^0.5 title_ac^4.0 name_ac^4.0 authors_list_sm^4.0&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;

-- View this message in context: http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p24903219.html Sent from the Solr - User mailing list archive at Nabble.com.
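As a hedged pointer for readers of the archive: the AND-like behaviour described above is usually governed by dismax's mm ("minimum should match") parameter rather than by qf itself. A sketch of a looser setting to add to the defaults block (the value 1 is only an illustration, meaning at least one query clause must match):

```xml
<str name="mm">1</str>
```

mm also accepts percentages and conditional expressions (e.g. "2&lt;-25%"), so it can be tuned between strict AND and loose OR behaviour.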
Re: dealing with duplicates
in case someone can help me with the query syntax, the relational query i would use for this would be something like: SELECT * FROM videos WHERE title LIKE 'family guy' AND desc LIKE 'stewie%' AND ( ( is_dup = 0 ) OR ( is_dup = 1 AND id NOT IN ( SELECT id FROM videos WHERE title LIKE 'family guy' AND desc LIKE 'stewie%' AND is_dup = 0 ) ) ) ORDER BY views LIMIT 10 can a similar query be written in lucene, or do i need to structure my index differently to be able to do such a query? thx much --joe On Sat, Aug 1, 2009 at 9:15 AM, Joe Calderon calderon@gmail.com wrote: hello, thanks for the response, i did take a look at that document but in my application i actually want the duplicates; as i mentioned, the matching text could be very different among cluster members, what joins them together is a similar set of numeric features. currently i do a query with fq=duplicate:0 and show a link to optionally show the dupes by querying for all dupes of the master id. however im currently missing any documents that matched the query but are duplicates of other masters not included in that result set. in a relational database (fulltext indexing aside) i would use a subquery; i imagine a similar approach could be used with lucene, i just dont know the syntax best, --joe On Fri, Jul 31, 2009 at 11:32 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Joe, Maybe we can take a step back first. Would it be better if your index was cleaner and didn't have flagged duplicates in the first place? If so, have you tried using http://wiki.apache.org/solr/Deduplication ? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Joe Calderon calderon@gmail.com To: solr-user@lucene.apache.org Sent: Friday, July 31, 2009 5:06:48 PM Subject: dealing with duplicates hello all, i have a collection of a few million documents; i have many duplicates in this collection.
they have been clustered with a simple algorithm, i have a field called 'duplicate' which is 0 or 1 and a fields called 'description, tags, meta', documents are clustered on different criteria and the text i search against could be very different among members of a cluster. im currently using a dismax handler to search across the text fields with different boosts, and a filter query to restrict to masters (duplicate: 0) my question is then, how do i best query for documents which are masters OR match text but are not included in the matched set of masters? does this make sense?
Re: UTF-8 query support?
Thank you! I am using Tomcat and will give it a try. On Mon, 2009-08-10 at 16:31 +0200, Mats Lindh wrote: On Mon, Aug 10, 2009 at 4:19 PM, Darren Govonidar...@ontrenet.com wrote: How do I set UTF-8 encoding so lucene can find the documents since it supports UTF-8 queries? This depends on the app server you're using. I'm guessing Tomcat (as that's where I had the same issue), and you can fix this by enabling UTF-8 encoded query strings in Tomcat itself: http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4 --mats
Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.
There's some good Wiki pages on the syntax to use for queries, including nested queries. But trying to traipse through the code to get the big picture is a bit involved. A couple of examples: Over the past few months I've had several questions about dismax, and why it was or wasn't doing something a certain way. I came up with a workaround for CJK, but today I'm back looking at the shingles stuff and where, exactly, shingle queries break. I found the logical discussions about *why* in some of the threads, but the actual code path makes quite a few hops, to util classes, and to Lucene, etc. I'll get there eventually, but having a map would be nice. Another example: at the last Meetup it was mentioned that big changes are coming to query parsing pretty soon. Understanding the before and after logic would be nice, and I don't recall whether that impacted just Lucene, or if Solr was also going to be affected. -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
Re: excluding certain terms from facet counts when faceting based on indexed terms of a field
I just upgraded to Solr 1.4/Lucene 2.9 for something else, so I am trying to see if I can use localParams to exclude certain terms from the facet counts. The suggested facet.field={!terms=foo,bar}cat actually only shows the facet counts of foo and bar. What I want is to exclude a value from the facet counts, so I tried: facet.field={!ex=cat:foo}cat but that has no effect, as foo still shows up in the facet counts. Still looking... Bill On Thu, Jul 23, 2009 at 11:53 AM, Bill Au bill.w...@gmail.com wrote: That's actually what we have been doing. I was just wondering if there is any way to move this work from the client back into Solr. Bill On Thu, Jul 23, 2009 at 11:47 AM, Erik Hatcher e...@ehatchersolutions.com wrote: Given it is a small number of terms, it seems like just excluding them from use/visibility on the client would be reasonable. Erik On Jul 23, 2009, at 11:43 AM, Bill Au wrote: I want to exclude a very small number of terms which will be different for each query. So I think my best bet is to use localParams. Bill On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am faceting based on the indexed terms of a field by using facet.field. : Is there any way to exclude certain terms from the facet counts? If you're talking about a lot of terms, and they're going to be the same for *all* queries, the best approach is to strip them out when indexing (StopWordFilter is your friend) -Hoss
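A note for the archives on why the {!ex} attempt above does nothing: in Solr 1.4, {!ex} excludes *tagged filter queries* from faceting, not individual terms, so its argument must name a tag attached to an fq, not a field:value pair. A sketch of the documented pattern (field and tag names here are illustrative):

```
fq={!tag=myTag}cat:foo
facet.field={!ex=myTag}cat
```

That makes facet counts on cat ignore the cat:foo filter; it still doesn't remove the term foo from the facet list, which is a different problem.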
Question mark glyphs in indexed content
Hello, I am using the latest SolrJ to index content. When I look at that content in the Solr Admin web utility I see weird characters like this: http://brockwine.com/images/solrglyphs.png When I look at the text in the MySQL DB those chars appear to just be plain hyphens. The MySQL table character set is utf8 and the collation is utf8. Environment: OS X 10.5.8 java version 1.5.0_19 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304) Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing) Solr Specification Version: 1.3.0 Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47 Lucene Specification Version: 2.4-dev Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16 Jetty 6.1.3 Any thoughts? Thanks /Rupert
Newbie problem ordering results
Hello everybody, I have the following (resumed) schema:

&lt;field name="title" type="text" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="titleorder" type="string" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributor" type="text" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributorfacet" type="textFacetN" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributororder" type="string" indexed="true" stored="true" multiValued="true"/&gt;
...
&lt;copyField source="title" dest="text"/&gt;
&lt;copyField source="title" dest="titleorder"/&gt;
&lt;copyField source="contributor" dest="text"/&gt;
&lt;copyField source="contributor" dest="contributorfacet"/&gt;
&lt;copyField source="contributor" dest="contributororder"/&gt;
...

I use, for instance, contributor for searching, contributorfacet for faceting, and contributororder for ordering results, but when I try to order using contributororder, Solr says that it cannot order by a tokenized field...(?) I'm using a Solr 1.4 nightly. Is this a bug? I believe that in previous versions I had this working... Regards and thanks Germán
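One thing to check, offered as a guess without seeing the fieldType definitions: Solr can only sort on an indexed, untokenized, *single-valued* field, and the *order fields above are declared multiValued="true". A sketch of a sortable variant:

```xml
<!-- sortable: untokenized string type, one value per document -->
<field name="contributororder" type="string" indexed="true" stored="false" multiValued="false"/>
<copyField source="contributor" dest="contributororder"/>
```

Caveat: if a document actually has several contributor values, copying a multivalued source into a single-valued destination will fail at index time, so the sort key may need to be computed client-side instead.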
Re: dealing with duplicates
Can you please provide your schema details here? Cheers Avlesh On Tue, Aug 11, 2009 at 1:29 AM, Joe Calderon calderon@gmail.comwrote: so in the case someone can help me with the query syntax, the relational query i would use for this would be something like: SELECT * FROM videos WHERE title LIKE 'family guy' AND desc LIKE 'stewie%' AND ( ( is_dup = 0 ) OR ( is_dup = 1 AND id NOT IN ( SELECT id FROM videos WHERE title LIKE 'family guy' AND desc LIKE 'stewie%' AND is_dup = 0 ) ) ) ORDER BY views LIMIT 10 can a similar query be written in lucene or do i need to structure my index differently to be able to do such a query? thx much --joe On Sat, Aug 1, 2009 at 9:15 AM, Joe Calderoncalderon@gmail.com wrote: hello, thanks for the response, i did take a look at that document but in my application i actually want the duplicates, as i mentioned, the matching text could be very different among cluster members, what joins them together is a similar set of numeric features. currently i do a query with fq=duplicate:0 and show a link to optionally show the dupes via by querying for all dupes of the master id, however im currently missing any documents that matched the query but are duplicates of other masters not included in that result set. in a relational database (fulltext indexing aside) i would use a subquery, i imagine a similar approach could be used with lucene, i just dont know the syntax best, --joe On Fri, Jul 31, 2009 at 11:32 PM, Otis Gospodneticotis_gospodne...@yahoo.com wrote: Joe, Maybe we can take a step back first. Would it be better if your index was cleaner and didn't have flagged duplicates in the first place? If so, have you tried using http://wiki.apache.org/solr/Deduplication ? 
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Joe Calderon calderon@gmail.com To: solr-user@lucene.apache.org Sent: Friday, July 31, 2009 5:06:48 PM Subject: dealing with duplicates hello all, i have a collection of a few million documents; i have many duplicates in this collection. they have been clustered with a simple algorithm, i have a field called 'duplicate' which is 0 or 1 and a fields called 'description, tags, meta', documents are clustered on different criteria and the text i search against could be very different among members of a cluster. im currently using a dismax handler to search across the text fields with different boosts, and a filter query to restrict to masters (duplicate: 0) my question is then, how do i best query for documents which are masters OR match text but are not included in the matched set of masters? does this make sense?
Re: Newbie problem ordering results
Can you please post the fieldType definition for the string field in your schema.xml? Cheers Avlesh On Tue, Aug 11, 2009 at 9:52 AM, Germán Biozzoli germanbiozz...@gmail.com wrote: Hello everybody, I have the following (resumed) schema:

&lt;field name="title" type="text" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="titleorder" type="string" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributor" type="text" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributorfacet" type="textFacetN" indexed="true" stored="true" multiValued="true"/&gt;
&lt;field name="contributororder" type="string" indexed="true" stored="true" multiValued="true"/&gt;
...
&lt;copyField source="title" dest="text"/&gt;
&lt;copyField source="title" dest="titleorder"/&gt;
&lt;copyField source="contributor" dest="text"/&gt;
&lt;copyField source="contributor" dest="contributorfacet"/&gt;
&lt;copyField source="contributor" dest="contributororder"/&gt;
...

I use, for instance, contributor for searching, contributorfacet for faceting, and contributororder for ordering results, but when I try to order using contributororder, Solr says that it cannot order by a tokenized field...(?) I'm using a Solr 1.4 nightly. Is this a bug? I believe that in previous versions I had this working... Regards and thanks Germán
Querying Dynamic Fields.. simple query not working
Hi, when I do a *:* query I can see the dynamic field as shown below: <str name="ne_.*">{Germinait=0.7}</str> but when I try to query it, e.g. ne_Germinait:0.7, I get zero records. All the other fields, which are not dynamic, can be easily queried. Can someone please tell me how to query dynamic fields? Thanks, Ninad.
Retrieving the boost factor using Solrj CommonsHttpSolrServer
I'm using the SolrJ CommonsHttpSolrServer to retrieve documents from the index for update, so I also need to retrieve the boost factor; otherwise each resubmission would reset it. I just can't figure out how to retrieve the boost factor. It is available on the SolrInputDocument, but not on the SolrDocument returned by the SolrServer query method, and there is no relationship between SolrInputDocument and SolrDocument (which in itself is pretty confusing). How can I get the boost factor? Do I have to use the request method and parse the result myself? Cheers, Gert.
Re: Querying Dynamic Fields.. simple query not working
Weird that you get to see a field name like ne_.* in the response. I am afraid you might be using the field in an incorrect way. Can you share the field definition, please? And give a peek into how you are populating these fields? Cheers Avlesh

On Tue, Aug 11, 2009 at 10:29 AM, Ninad Raut <hbase.user.ni...@gmail.com> wrote: [...]
Re: Querying Dynamic Fields.. simple query not working
This is the POJO field mapping:

@Field("*_ne")
Map<String, String> ne = new HashMap<String, String>();

This is how I set the value:

Map<String, String> namedEntity = new HashMap<String, String>();
namedEntity.put("Germinait", "0.7");
ithursDocument.setNe(namedEntity);
server.addBean(ithursDocument);
server.commit();

The schema has this dynamic field:

<dynamicField name="ne_*" type="string" indexed="true" stored="true"/>

Let me know if something is missing. Thanks, Avlesh.

On Tue, Aug 11, 2009 at 10:34 AM, Avlesh Singh <avl...@gmail.com> wrote: [...]
Re: Querying Dynamic Fields.. simple query not working
Ah! I guessed you were using it this way. I would need to reconfirm this, but there seems to be an inconsistency between fetching data and adding data via SolrJ w.r.t. dynamic fields. SOLR-1129 (https://issues.apache.org/jira/browse/SOLR-1129) is essentially about binding the response into a bean with a Map-typed property. My guess is that SolrInputDocument does not yet understand the Map-typed property when firing update requests. I don't think it works the way you have used it :( Noble, can you please confirm this? If my guess turns out to be true, let's open a JIRA issue asap. Cheers Avlesh

On Tue, Aug 11, 2009 at 10:45 AM, Ninad Raut <hbase.user.ni...@gmail.com> wrote: [...]
Re: Querying Dynamic Fields.. simple query not working
Hi Avlesh, can you tell me a workaround for this problem, till you have it resolved? :) Regards, Ninad.

On Tue, Aug 11, 2009 at 11:16 AM, Avlesh Singh <avl...@gmail.com> wrote: [...]
Re: Querying Dynamic Fields.. simple query not working
Well, there are multiple ways to do it. Instead of using your own class (with annotated fields), you can directly use an instance of SolrInputDocument for each document and call SolrServer.add(SolrInputDocument doc). For each SolrInputDocument, use addField(String name, Object value) to add data per field. For dynamic fields, just pass in the full field name ("Germinait_ne" in your case) as the first argument and "0.7" as the second one. Search the way you were doing earlier. Cheers Avlesh

On Tue, Aug 11, 2009 at 11:20 AM, Ninad Raut <hbase.user.ni...@gmail.com> wrote: [...]
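The workaround Avlesh describes can be sketched as follows. The SolrJ calls themselves need a running server, so this minimal sketch only shows the flattening step: turning each map entry into a concrete dynamic-field name, which would then be passed entry by entry to SolrInputDocument.addField(name, value). The "_ne" suffix follows the @Field("*_ne") annotation in the thread; the class and method names here are illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicFieldFlattener {

    // Turn each map entry into a concrete field name matching a
    // "*_ne"-style dynamic-field pattern, e.g. "Germinait" -> "Germinait_ne".
    // Each resulting name/value pair would then be added via
    // SolrInputDocument.addField(name, value) before SolrServer.add(doc).
    static Map<String, String> flatten(Map<String, String> entities, String suffix) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : entities.entrySet()) {
            fields.put(e.getKey() + suffix, e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> namedEntity = new HashMap<String, String>();
        namedEntity.put("Germinait", "0.7");
        // Produces a single concrete field: Germinait_ne -> 0.7
        System.out.println(flatten(namedEntity, "_ne"));
    }
}
```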