Re: Highlighting in SolrJ?
Thanks Jay!

On Sat, Sep 12, 2009 at 10:03 PM, Jay Hill jayallenh...@gmail.com wrote:

Will do Shalin. -Jay http://www.lucidimagination.com

On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

Jay, it would be great if you can add this example to the Solrj wiki: http://wiki.apache.org/solr/Solrj

On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill jayallenh...@gmail.com wrote:

Set up the query like this to highlight a field named "content":

  SolrQuery query = new SolrQuery();
  query.setQuery("foo");
  query.setHighlight(true).setHighlightSnippets(1); // set other params as needed
  query.setParam("hl.fl", "content");
  QueryResponse queryResponse = getSolrServer().query(query);

Then to get back the highlight results you need something like this:

  Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
  while (iter.hasNext()) {
    SolrDocument resultDoc = iter.next();
    String content = (String) resultDoc.getFieldValue("content");
    String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
    if (queryResponse.getHighlighting().get(id) != null) {
      List<String> highlightSnippets = queryResponse.getHighlighting().get(id).get("content");
    }
  }

Hope that gets you what you need. -Jay http://www.lucidimagination.com

On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin ptomb...@xcski.com wrote:

Can somebody point me to some sample code for using highlighting in SolrJ? I understand the highlighted versions of the field come in a separate NamedList? How does that work? -- http://www.linkedin.com/in/paultomblin

-- Regards, Shalin Shekhar Mangar.
Re: Highlighting in SolrJ?
Thanks to Jay, I have my code doing what I need it to do. If anybody cares, this is my code:

  SolrQuery query = new SolrQuery();
  query.setQuery(searchTerm);
  query.addFilterQuery(Chunk.SOLR_KEY_CONCEPT + ":" + concept);
  query.addFilterQuery(Chunk.SOLR_KEY_CATEGORY + ":" + category);
  if (maxChunks > 0)
    query.setRows(maxChunks);
  // Set highlighting fields
  query.setHighlight(true);
  query.setHighlightFragsize(0);
  query.addHighlightField(Chunk.SOLR_KEY_TEXT);
  query.setHighlightSnippets(1);
  query.setHighlightSimplePre("<b>");
  query.setHighlightSimplePost("</b>");

  QueryResponse resp = solrChunkServer.query(query);
  SolrDocumentList docs = resp.getResults();
  retCode = new ArrayList<Chunk>(docs.size());
  for (SolrDocument doc : docs) {
    LOG.debug("got doc " + doc);
    Chunk chunk = new Chunk(doc);
    // retrieve highlighting
    List<String> highlights = resp.getHighlighting().get(chunk.getId()).get(Chunk.SOLR_KEY_TEXT);
    if (highlights != null && highlights.size() > 0)
      chunk.setHighlighted(highlights.get(0));
    retCode.add(chunk);
  }

-- http://www.linkedin.com/in/paultomblin
CSV Update - Need help mapping csv field to schema's ID
Using http://localhost:8983/solr/update/csv?stream.file, is there any way to map one of the csv fields to one's schema unique id? e.g. a file with 3 fields (sku, product, price):

http://localhost:8983/solr/update/csv?stream.file=products.csv&stream.contentType=text/plain;charset=utf-8&header=true&separator=%2c&encapsulator=%22&escape=%5c&fieldnames=sku,product,price

I would like to add an additional name:value pair for every line, mapping the sku field to my schema's id field: .map={sku.field}:{id}

I would prefer NOT to change the schema by adding a <copyField source="sku" dest="id"/>. I read http://wiki.apache.org/solr/UpdateCSV, but can't quite get it.

Thanks! Dan
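For reference, once the parameter separators are restored, the URL above is just a list of &-joined key=value pairs with percent-encoded values (the comma, quote, and backslash become %2C, %22, and %5C). A minimal sketch of assembling such a URL — the helper class and parameter map here are illustrative, not part of Solr:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvUpdateUrl {
    // Joins a parameter map into a query string, percent-encoding each value.
    static String build(String base, Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            first = false;
            sb.append(e.getKey()).append('=').append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("stream.file", "products.csv");
        p.put("header", "true");
        p.put("separator", ",");     // encodes to %2C
        p.put("encapsulator", "\""); // encodes to %22
        p.put("escape", "\\");       // encodes to %5C
        p.put("fieldnames", "sku,product,price");
        System.out.println(build("http://localhost:8983/solr/update/csv", p));
    }
}
```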
[DIH] Multiple repeat XPath stmts
I'm trying to import several RSS feeds using DIH and running into a bit of a problem. Some feeds define a GUID value that I map to my Solr ID, while others don't. I also have a link field which I fill in with the RSS link field. For the feeds that don't have the GUID value set, I want to use the link field as the id. However, if I define the same XPath twice, but map it to two diff. columns, I don't get the id value set. For instance, I want to do:

schema.xml:
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="link" type="string" indexed="true" stored="false"/>

DIH config:
  <field column="id" xpath="/rss/channel/item/link" />
  <field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do copyFields, unless of course I wanted to implement conditional copy fields (only copy if the field is not defined), which I would rather not. How do I solve this?

Thanks, Grant
Re: [DIH] Multiple repeat XPath stmts
I'm trying to import several RSS feeds using DIH and running into a bit of a problem. Some feeds define a GUID value that I map to my Solr ID, while others don't. I also have a link field which I fill in with the RSS link field. For the feeds that don't have the GUID value set, I want to use the link field as the id. However, if I define the same XPath twice, but map it to two diff. columns, I don't get the id value set. For instance, I want to do:

schema.xml:
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="link" type="string" indexed="true" stored="false"/>

DIH config:
  <field column="id" xpath="/rss/channel/item/link" />
  <field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do copyFields, unless of course I wanted to implement conditional copy fields (only copy if the field is not defined), which I would rather not. How do I solve this?

How about:

  <entity name="x" ... transformer="TemplateTransformer">
    <field column="link" xpath="/rss/channel/item/link" />
    <field column="GUID" xpath="/rss/channel/GUID" />
    <field column="id" template="${x.link}" />
    <field column="id" template="${x.GUID}" />
  </entity>

The TemplateTransformer does nothing if its source expression is null. So the first template assigns the fallback value to id; this is then overwritten by the GUID if it is defined. You can now sort of do if-then-else using a combination of template and regex transformers. Add a bit of maths to the transformers and I think we will have a Turing-complete language :-)

fergus.

Thanks, Grant

===
Fergus McMenemie  Email: fer...@twig.me.uk
Techmore Ltd      Phone: (UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===
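Fergus's fallback-then-overwrite trick can be sketched outside DIH as plain row logic. This is a hypothetical stand-in for what the two TemplateTransformer rules effectively do to each row, not DIH code itself:

```java
import java.util.Map;

public class FallbackId {
    // Mimics the two template rules: the first writes the fallback (link)
    // into "id"; the second overwrites it with the GUID, but only when the
    // GUID is present (a null source expression is a no-op).
    static Map<String, Object> transform(Map<String, Object> row) {
        Object link = row.get("link");
        Object guid = row.get("GUID");
        if (link != null) row.put("id", link); // <field column="id" template="${x.link}"/>
        if (guid != null) row.put("id", guid); // <field column="id" template="${x.GUID}"/>
        return row;
    }
}
```

Rows from feeds without a GUID keep the link as their id; rows with a GUID get it overwritten.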
Seeking help setting up solr in eclipse
Hi, I'd like to set up Eclipse to run Solr (in Tomcat, for example), but I'm struggling with the issue that I can't get index.jsp and the other files to be properly executed, for debugging and working on a plugin. I've checked out Solr via the Subclipse plugin and created a Dynamic Web Project. It seems that I have to know in advance which directories contain the proper web files. Since I can't find a definitive UI to change that afterwards, I modified .settings/org.eclipse.wst.common.component by hand, but I can't get it to work. When I open solr/src/webapp/web/index.jsp via Run as/Run on Server, Tomcat gets started and the browser window opens the URL http://localhost:8080/solr/index.jsp, which only gives me "HTTP Status 404 - /solr/index.jsp". That's straight to the point for me, but I'm not sure where to fix this. My org.eclipse.wst.common.component looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <project-modules id="moduleCoreId" project-version="1.5.0">
    <wb-module deploy-name="solr">
      <wb-resource deploy-path="/" source-path="/src/webapp/web/"/>
      <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/common/"/>
      <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/java/"/>
      <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/src/"/>
      <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/web/"/>
      <property name="java-output-path"/>
      <property name="context-root" value="/"/>
    </wb-module>
  </project-modules>

I see that Tomcat gets started with these values (path to workspace stripped):

  /usr/lib/jvm/java-6-sun-1.6.0.15/bin/java -Dcatalina.base=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0 -Dcatalina.home=/apache-tomcat-6.0.20 -Dwtp.deploy=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps -Djava.endorsed.dirs=/apache-tomcat-6.0.20/endorsed -Dfile.encoding=UTF-8 -classpath /apache-tomcat-6.0.20/bin/bootstrap.jar:/usr/lib/jvm/java-6-sun-1.6.0.15/lib/tools.jar org.apache.catalina.startup.Bootstrap start

The configuration files in "/workspace/Servers/Tomcat v6.0 Server at localhost-config", e.g. server.xml, contain:

  <Host appBase="webapps" autoDeploy="true" name="localhost" unpackWARs="true" xmlNamespaceAware="false" xmlValidation="false">
    <Context docBase="solr" path="/solr" reloadable="true" source="org.eclipse.jst.jee.server:solr"/>
  </Host>

I see files copied, e.g. /workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/solr/WEB-INF/classes/index.jsp

I'm bumping against a wall currently; I can't see the wood for the trees anymore ... thanks for any help,

- Markus
When to optimize?
Folks: Are there good rules of thumb for when to optimize? We have a large index consisting of approx 7M documents and we currently have it set to optimize once a day. But sometimes there are very few changes that have been committed during a day and it seems like a waste to optimize (esp. since our servers are pretty well loaded). So I was looking to get some good rules of thumb for when it makes sense to optimize: Optimize when x% of the documents have been changed since the last optimize or some such. Any ideas would be greatly appreciated! -- Bill
Re: When to optimize?
I would say once a day is a pretty good rule of thumb. If you think this is a bit much and if you have few updates you can probably back that off to once every couple days to once a week. However, if you have a large batch update or your query performance starts to degrade, you will need to optimize your index. Thanks, Matt Weber On Sep 13, 2009, at 6:21 PM, William Pierce wrote: Folks: Are there good rules of thumb for when to optimize? We have a large index consisting of approx 7M documents and we currently have it set to optimize once a day. But sometimes there are very few changes that have been committed during a day and it seems like a waste to optimize (esp. since our servers are pretty well loaded). So I was looking to get some good rules of thumb for when it makes sense to optimize: Optimize when x% of the documents have been changed since the last optimize or some such. Any ideas would be greatly appreciated! -- Bill
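Bill's "optimize when x% of the documents have changed" rule of thumb is simple to encode. A minimal sketch; the threshold value is an arbitrary example for illustration, not a Solr default, and the change counter would have to be tracked by the indexing application:

```java
public class OptimizePolicy {
    // Returns true when at least thresholdPct percent of the documents in
    // the index have changed since the last optimize.
    static boolean shouldOptimize(long docsChangedSinceOptimize, long totalDocs, double thresholdPct) {
        if (totalDocs == 0) return false; // empty index: nothing to optimize
        return 100.0 * docsChangedSinceOptimize / totalDocs >= thresholdPct;
    }
}
```

With a 5% threshold on a 7M-document index, an optimize would only be triggered once roughly 350k documents had been updated, skipping the near-idle days mentioned in the question.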
stopfilterFactory isn't removing field name
I'm kind of stumped by this one... is it something obvious? I'm running the latest trunk. In some cases the StopFilterFactory isn't removing the field name. Thanks in advance, -mike

From debugQuery (both words are in the stopwords file):

http://localhost:8983/solr/select?q=citations:for&debugQuery=true
  <str name="rawquerystring">citations:for</str>
  <str name="querystring">citations:for</str>
  <str name="parsedquery">citations:</str>
  <str name="parsedquery_toString">citations:</str>

http://localhost:8983/solr/select?q=citations:the&debugQuery=true
  <str name="rawquerystring">citations:the</str>
  <str name="querystring">citations:the</str>
  <str name="parsedquery"/>
  <str name="parsedquery_toString"/>

schema analyzer for this field:

  <!-- Citation text -->
  <fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    </analyzer>
  </fieldType>
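As a standalone illustration — not Solr's actual analysis chain — stop filtering removes every query token when all of them are stopwords, which is why a query like citations:the can parse to nothing at all. The class and whitespace tokenization here are simplifying assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopwordDemo {
    // Splits text on whitespace and drops any token found in the stopword
    // set; an all-stopword input yields an empty token list, i.e. an
    // empty parsed query.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> out = new ArrayList<String>();
        for (String tok : text.split("\\s+")) {
            if (!stopwords.contains(tok)) out.add(tok);
        }
        return out;
    }
}
```

The remaining puzzle in the debug output above is why "citations:for" leaves the dangling "citations:" while "citations:the" parses to empty, which points at something token-specific rather than the stop filter itself.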
Re: about replication
Replication uses HttpClient for connections. It is likely that you notice some CLOSE_WAIT. But how many do you see?

On Mon, Sep 14, 2009 at 6:37 AM, liugang8440265 liugang8440...@huawei.com wrote:

Hi, I have a problem with solr-replication. Every time I use the replication API to replicate the index, a TCP connection with CLOSE_WAIT status appears. Eventually there are many CLOSE_WAIT connections. I used the one-time replication API like this:

http://localhost:8983/solr/core2/replication?command=fetchindex&masterUrl=http://localhost:8983/solr/core1/replication

This is my replication config:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </requestHandler>

and both cores use the same config file. Waiting for your reply.

Jack Liu. 2009-09-14 liugang8440265

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: [DIH] Multiple repeat XPath stmts
The XPathRecordReader has a limit of one mapping per XPath, so copying is the best solution.

On Mon, Sep 14, 2009 at 2:54 AM, Fergus McMenemie fer...@twig.me.uk wrote:

I'm trying to import several RSS feeds using DIH and running into a bit of a problem. Some feeds define a GUID value that I map to my Solr ID, while others don't. I also have a link field which I fill in with the RSS link field. For the feeds that don't have the GUID value set, I want to use the link field as the id. However, if I define the same XPath twice, but map it to two diff. columns, I don't get the id value set. For instance, I want to do:

schema.xml:
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="link" type="string" indexed="true" stored="false"/>

DIH config:
  <field column="id" xpath="/rss/channel/item/link" />
  <field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do copyFields, unless of course I wanted to implement conditional copy fields (only copy if the field is not defined), which I would rather not. How do I solve this?

How about:

  <entity name="x" ... transformer="TemplateTransformer">
    <field column="link" xpath="/rss/channel/item/link" />
    <field column="GUID" xpath="/rss/channel/GUID" />
    <field column="id" template="${x.link}" />
    <field column="id" template="${x.GUID}" />
  </entity>

The TemplateTransformer does nothing if its source expression is null. So the first template assigns the fallback value to id; this is then overwritten by the GUID if it is defined. You can now sort of do if-then-else using a combination of template and regex transformers. Add a bit of maths to the transformers and I think we will have a Turing-complete language :-)

fergus.

Thanks, Grant

===
Fergus McMenemie  Email: fer...@twig.me.uk
Techmore Ltd      Phone: (UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: stopfilterFactory isn't removing field name
That's pretty strange... perhaps something to do with your synonyms file mapping "for" to a zero-length token?

-Yonik
http://www.lucidimagination.com

On Mon, Sep 14, 2009 at 12:13 AM, mike anderson saidthero...@gmail.com wrote:

I'm kind of stumped by this one... is it something obvious? I'm running the latest trunk. In some cases the StopFilterFactory isn't removing the field name. Thanks in advance, -mike

From debugQuery (both words are in the stopwords file):

http://localhost:8983/solr/select?q=citations:for&debugQuery=true
  <str name="rawquerystring">citations:for</str>
  <str name="querystring">citations:for</str>
  <str name="parsedquery">citations:</str>
  <str name="parsedquery_toString">citations:</str>

http://localhost:8983/solr/select?q=citations:the&debugQuery=true
  <str name="rawquerystring">citations:the</str>
  <str name="querystring">citations:the</str>
  <str name="parsedquery"/>
  <str name="parsedquery_toString"/>

schema analyzer for this field:

  <!-- Citation text -->
  <fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    </analyzer>
  </fieldType>
Issue on Facet field and exact match
Hi to all,

While working with facets using SolrJ, I am using a string field in the schema to avoid splitting within a value (i.e., "Rekha dharshana" was previously tokenized into the separate words "rekha" and "dharshana"). To avoid this, I use two fields per value in the schema. My schema.xml looks like this:

  <field name="userId" type="text" indexed="true" stored="true" />
  <field name="blogId" type="text" indexed="true" stored="true" />
  <field name="postId" type="text" indexed="true" stored="true" />
  <field name="blogTitle" type="text" indexed="true" stored="true" />
  <field name="postTitle" type="text" indexed="true" stored="true" />
  <field name="postMessage" type="text" indexed="true" stored="true" />
  <field name="blogTitle_exact" type="string" indexed="true" stored="false"/>
  <field name="blogId_exact" type="string" indexed="true" stored="false"/>
  <field name="userId_exact" type="string" indexed="true" stored="false"/>
  <field name="postId_exact" type="string" indexed="true" stored="false" />
  <field name="postTitle_exact" type="string" indexed="true" stored="false" />
  <field name="postMessage_exact" type="string" indexed="true" stored="false" />

And these are my copyFields:

  <copyField source="blogTitle" dest="blogTitle_exact"/>
  <copyField source="userId" dest="userId_exact"/>
  <copyField source="blogId" dest="blogId_exact"/>
  <copyField source="postId" dest="postId_exact"/>
  <copyField source="postTitle" dest="postTitle_exact"/>
  <copyField source="postMessage" dest="postMessage_exact"/>

This is the code where I add the blog-detail fields to Solr:

  SolrInputDocument solrInputDocument = new SolrInputDocument();
  solrInputDocument.addField("blogTitle", "$Never Fails$");
  solrInputDocument.addField("blogId", "$Never Fails$");
  solrInputDocument.addField("userId", "1");

This is the code to add the post-detail fields to Solr:

  solrInputDocument.addField("blogId", "$Never Fails$");
  solrInputDocument.addField("postId", "$Never Fails post$");
  solrInputDocument.addField("postTitle", "$Never Fails post$");
  solrInputDocument.addField("postMessage", "$Never Fails post message$");

And this is the code where I query it from Solr:
  SolrQuery queryOfMyBlog = new SolrQuery("blogId_exact:Never Fails");
  queryOfMyBlog.setFacet(true);
  queryOfMyBlog.addFacetField("blogTitle_exact");
  queryOfMyBlog.addFacetField("userId_exact");
  queryOfMyBlog.addFacetField("blogId_exact");
  queryOfMyBlog.setFacetMinCount(1);
  queryOfMyBlog.setIncludeScore(true);

  QueryResponse queryResponse = solrServer.query(queryOfMyBlog);
  List<FacetField> facets = queryResponse.getFacetFields();
  List<String> listOfAllValues = new ArrayList<String>();
  System.out.println("inside facettt size" + facets.size());
  for (FacetField facet : facets) {
    System.out.println("inside for");
    List<FacetField.Count> facetEntries = facet.getValues();
    for (FacetField.Count fcount : facetEntries) {
      String s = fcount.getName();
      listOfAllValues.add(s);
      System.out.println("BlogId" + s);
    }
  }

In the above code, the fields blogId, blogTitle and userId are copied to blogId_exact, blogTitle_exact and userId_exact so that I can get the output. While indexing I store the value as "$Never Fails$" to get an exact search, but I am not getting an exact match: when I query for only "Never Fails" it also brings back the "Success Never Fails" blog. I need to display only the details of the "Never Fails" blog, but I get "Success Never Fails" along with it. What should I do to get an exact match?

The above is my first issue. The next issue is when I query the post details; I do the same thing to get the post details...
  SolrQuery queryOfMyPost = new SolrQuery("blogId_exact:$Success Never Fails$");
  queryOfMyPost.setFacet(true);
  queryOfMyPost.addFacetField("blogId_exact");
  queryOfMyPost.addFacetField("postId_exact");
  queryOfMyPost.addFacetField("postTitle_exact");
  queryOfMyPost.addFacetField("postMessage_exact");
  queryOfMyPost.setFacetMinCount(1);
  queryOfMyPost.setIncludeScore(true);

  QueryResponse queryPost = solrServer.query(queryOfMyPost);
  List<FacetField> facetsForPost = queryPost.getFacetFields();
  List<String> listOfAllFacetsForPost = new ArrayList<String>();
  System.out.println("inside facettt size" + facetsForPost);
  for (FacetField facetPost1 : facetsForPost) {
    System.out.println("inside for" + facetPost1);
    List<FacetField.Count> facetEntries = facetPost1.getValues();
    for (FacetField.Count fcount1 : facetEntries) {
      String s1 = fcount1.getName();
      listOfAllFacetsForPost.add(s1);
      System.out.println("Post details" + s1);
    }
  }

Here, in the above code, I get null values for the facet fields postId_exact, postTitle_exact and postMessage_exact; the copyField values have not been copied to them, so I get null for these. Please check my code and tell me where I am wrong... And
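One likely culprit for the exact-match issue above: in an unquoted query like blogId_exact:Never Fails, the query parser only binds the field to the first word, and the rest goes to the default field. A string field needs the whole value sent as one quoted term. A minimal sketch of building such a query string — SolrJ's own ClientUtils.escapeQueryChars does fuller escaping; this hypothetical helper only handles quotes and backslashes:

```java
public class ExactQuery {
    // Wraps the whole value in quotes so the string field is matched as a
    // single term, e.g. blogId_exact:"Never Fails".
    static String exactMatch(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

Used as `new SolrQuery(ExactQuery.exactMatch("blogId_exact", "$Never Fails$"))`, the query then matches only documents whose stored value is exactly that string.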