Re: Highlighting in SolrJ?

2009-09-13 Thread Shalin Shekhar Mangar
Thanks Jay!

On Sat, Sep 12, 2009 at 10:03 PM, Jay Hill jayallenh...@gmail.com wrote:

 Will do Shalin.

 -Jay
 http://www.lucidimagination.com


 On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  Jay, it would be great if you can add this example to the Solrj wiki:
 
  http://wiki.apache.org/solr/Solrj
 
  On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill jayallenh...@gmail.com
 wrote:
 
   Set up the query like this to highlight a field named "content":
  
  SolrQuery query = new SolrQuery();
  query.setQuery("foo");
  
  query.setHighlight(true).setHighlightSnippets(1); // set other params as needed
  query.setParam("hl.fl", "content");
  
  QueryResponse queryResponse = getSolrServer().query(query);
  
   Then to get back the highlight results you need something like this:
  
  Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
  
  while (iter.hasNext()) {
    SolrDocument resultDoc = iter.next();
  
    String content = (String) resultDoc.getFieldValue("content");
    String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
  
    if (queryResponse.getHighlighting().get(id) != null) {
      List<String> highlightSnippets =
          queryResponse.getHighlighting().get(id).get("content");
    }
  }
  
   Hope that gets you what you need.
  
   -Jay
   http://www.lucidimagination.com
  
   On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin ptomb...@xcski.com
  wrote:
  
Can somebody point me to some sample code for using highlighting in
SolrJ?  I understand the highlighted versions of the field comes in a
separate NamedList?  How does that work?
   
--
http://www.linkedin.com/in/paultomblin
   
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting in SolrJ?

2009-09-13 Thread Paul Tomblin
Thanks to Jay, I have my code doing what I need it to do.  If anybody
cares, this is my code:

SolrQuery query = new SolrQuery();
query.setQuery(searchTerm);
query.addFilterQuery(Chunk.SOLR_KEY_CONCEPT + ":" + concept);
query.addFilterQuery(Chunk.SOLR_KEY_CATEGORY + ":" + category);
if (maxChunks > 0)
    query.setRows(maxChunks);

// Set highlighting fields
query.setHighlight(true);
query.setHighlightFragsize(0);
query.addHighlightField(Chunk.SOLR_KEY_TEXT);
query.setHighlightSnippets(1);
query.setHighlightSimplePre("<b>");
query.setHighlightSimplePost("</b>");

QueryResponse resp = solrChunkServer.query(query);
SolrDocumentList docs = resp.getResults();
retCode = new ArrayList<Chunk>(docs.size());
for (SolrDocument doc : docs)
{
    LOG.debug("got doc " + doc);
    Chunk chunk = new Chunk(doc);

    // retrieve highlighting
    List<String> highlights =
        resp.getHighlighting().get(chunk.getId()).get(Chunk.SOLR_KEY_TEXT);
    if (highlights != null && highlights.size() > 0)
        chunk.setHighlighted(highlights.get(0));

    retCode.add(chunk);
}
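As a side note, resp.getHighlighting().get(chunk.getId()) can itself be null when a document produced no snippets. A minimal null-safe sketch of that lookup, using plain collections to stand in for the Map structure getHighlighting() returns (the class and method names here are illustrative, not SolrJ API):

```java
import java.util.*;

// getHighlighting() returns a map of docId -> (fieldName -> snippets);
// either lookup can come back null, so guard both.
public class HighlightExtract {
    static String firstSnippet(Map<String, Map<String, List<String>>> highlighting,
                               String docId, String field) {
        Map<String, List<String>> byField = highlighting.get(docId);
        if (byField == null) return null;            // doc had no highlights at all
        List<String> snippets = byField.get(field);  // this field may not have matched
        return (snippets == null || snippets.isEmpty()) ? null : snippets.get(0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> hl = new HashMap<>();
        hl.put("doc1", Collections.singletonMap("content",
                Arrays.asList("a <b>foo</b> snippet")));
        System.out.println(firstSnippet(hl, "doc1", "content")); // a <b>foo</b> snippet
        System.out.println(firstSnippet(hl, "doc2", "content")); // null
    }
}
```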



-- 
http://www.linkedin.com/in/paultomblin


CSV Update - Need help mapping csv field to schema's ID

2009-09-13 Thread Insight 49, LLC
Using http://localhost:8983/solr/update/csv?stream.file, is there any 
way to map one of the CSV fields to one's schema's unique id?


e.g. A file with 3 fields (sku, product,price):
http://localhost:8983/solr/update/csv?stream.file=products.csv&stream.contentType=text/plain;charset=utf-8&header=true&separator=%2c&encapsulator=%22&escape=%5c&fieldnames=sku,product,price

I would like to add an additional name:value pair for every line, 
mapping the sku field to my schema's id field:


.map={sku.field}:{id}

I would prefer NOT to change the schema by adding a <copyField 
source="sku" dest="id"/>.


I read: http://wiki.apache.org/solr/UpdateCSV, but can't quite get it.

Thanks!

Dan
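One possibility, not verified here: per the UpdateCSV wiki page, when `fieldnames` is supplied together with `header=true`, the file's own header line is skipped and the supplied names are used instead. If that is right, renaming the first column to `id` should map the sku column onto the schema's id without a copyField:

```
http://localhost:8983/solr/update/csv?stream.file=products.csv&stream.contentType=text/plain;charset=utf-8&header=true&separator=%2c&encapsulator=%22&escape=%5c&fieldnames=id,product,price
```

Note the value would then be indexed only as id, not also as sku.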


[DIH] Multiple repeat XPath stmts

2009-09-13 Thread Grant Ingersoll
I'm trying to import several RSS feeds using DIH and running into a  
bit of a problem.  Some feeds define a GUID value that I map to my  
Solr ID, while others don't.  I also have a link field which I fill in  
with the RSS link field.  For the feeds that don't have the GUID value  
set, I want to use the link field as the id.  However, if I define the  
same XPath twice, but map it to two diff. columns I don't get the id  
value set.


For instance, I want to do:
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do  
copyFields, unless of course, I wanted to implement conditional copy  
fields (only copy if the field is not defined) which I would rather not.


How do I solve this?

Thanks,
Grant


Re: [DIH] Multiple repeat XPath stmts

2009-09-13 Thread Fergus McMenemie
I'm trying to import several RSS feeds using DIH and running into a  
bit of a problem.  Some feeds define a GUID value that I map to my  
Solr ID, while others don't.  I also have a link field which I fill in  
with the RSS link field.  For the feeds that don't have the GUID value  
set, I want to use the link field as the id.  However, if I define the  
same XPath twice, but map it to two diff. columns I don't get the id  
value set.

For instance, I want to do:
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do  
copyFields, unless of course, I wanted to implement conditional copy  
fields (only copy if the field is not defined) which I would rather not.

How do I solve this?


How about:

<entity name="x" ... transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="GUID" xpath="/rss/channel/item/guid" />
  <field column="id"   template="${x.link}" />
  <field column="id"   template="${x.GUID}" />

The TemplateTransformer does nothing if its source expression is null.
So the first transform assigns the fallback value to id; this is
overwritten by the GUID if it is defined.

You can now sort of do if-then-else using a combination of the template
and regex transformers. Add a bit of maths to the transformers and
I think we will have a Turing-complete language :-) 
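The two template rows amount to a null-coalescing assignment; a minimal sketch of the equivalent logic in plain Java (the names are illustrative, not DIH code):

```java
// Sketch of the fallback the two TemplateTransformer rows implement:
// id takes the link first, then is overwritten by the GUID when one exists.
public class IdFallback {
    static String chooseId(String guid, String link) {
        String id = link;          // first row: template="${x.link}"
        if (guid != null) {
            id = guid;             // second row: template="${x.GUID}" (no-op when null)
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(chooseId(null, "http://example.com/item/1"));
        System.out.println(chooseId("guid-123", "http://example.com/item/1"));
    }
}
```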

fergus.

Thanks,
Grant

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Seeking help setting up solr in eclipse

2009-09-13 Thread Markus Fischer
Hi,

I'd like to set up Eclipse to run Solr (in Tomcat, for example), but I'm
struggling with the issue that I can't get index.jsp and the other files
to be properly executed, for debugging and working on a plugin.

I've checked out Solr via the Subclipse plugin and created a Dynamic Web
Project. It seems that I have to know in advance which directories contain
the proper web files. Since I can't find a definitive UI to change that
afterwards, I modified .settings/org.eclipse.wst.common.component by
hand, but I can't get it to work.

When I open solr/src/webapp/web/index.jsp via "Run As / Run on Server",
Tomcat gets started and the browser window opens the URL
http://localhost:8080/solr/index.jsp, which only gives me an HTTP Status
404 - /solr/index.jsp. That much is clear to me, but I'm not sure where
to fix it. My org.eclipse.wst.common.component looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project-modules id="moduleCoreId" project-version="1.5.0">
  <wb-module deploy-name="solr">
    <wb-resource deploy-path="/" source-path="/src/webapp/web"/>
    <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/common"/>
    <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/java"/>
    <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/src"/>
    <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/web"/>
    <property name="java-output-path"/>
    <property name="context-root" value="/"/>
  </wb-module>
</project-modules>

I see that Tomcat gets started with these values (stripped path to
workspace):

/usr/lib/jvm/java-6-sun-1.6.0.15/bin/java
-Dcatalina.base=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0
-Dcatalina.home=/apache-tomcat-6.0.20
-Dwtp.deploy=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps
-Djava.endorsed.dirs=/apache-tomcat-6.0.20/endorsed
-Dfile.encoding=UTF-8 -classpath
/apache-tomcat-6.0.20/bin/bootstrap.jar:/usr/lib/jvm/java-6-sun-1.6.0.15/lib/tools.jar
org.apache.catalina.startup.Bootstrap start

The configuration files in /workspace/Servers/Tomcat v6.0 Server at
localhost-config, e.g. server.xml, contain:

<Host appBase="webapps" autoDeploy="true" name="localhost"
 unpackWARs="true" xmlNamespaceAware="false"
 xmlValidation="false"><Context docBase="solr" path="/solr"
 reloadable="true" source="org.eclipse.jst.jee.server:solr"/></Host>

I see files copied, e.g.

/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/solr/WEB-INF/classes/index.jsp

I'm bumping against a wall at the moment; I can't see the wood for the trees anymore ...

thanks for any help,
- Markus


When to optimize?

2009-09-13 Thread William Pierce

Folks:

Are there good rules of thumb for when to optimize?  We have a large index 
consisting of approx 7M documents and we currently have it set to optimize 
once a day.  But sometimes there are very few changes that have been 
committed during a day and it seems like a waste to optimize (esp. since our 
servers are pretty well loaded).


So I was looking to get some good rules of thumb for when it makes sense to 
optimize: "Optimize when x% of the documents have been changed since the 
last optimize," or some such.


Any ideas would be greatly appreciated!

-- Bill 



Re: When to optimize?

2009-09-13 Thread Matt Weber
I would say once a day is a pretty good rule of thumb.  If you think  
this is a bit much and you have few updates, you can probably back  
that off to once every couple of days, or even once a week.  However, if you  
have a large batch update or your query performance starts to degrade,  
you will need to optimize your index.
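The "x% changed" rule William asks about can be sketched as follows; the 10% threshold here is arbitrary, a sketch rather than a recommendation:

```java
// Rough sketch of the "optimize when x% of documents have changed" heuristic.
public class OptimizePolicy {
    static boolean shouldOptimize(long changedSinceLastOptimize, long totalDocs,
                                  double changedFraction) {
        if (totalDocs <= 0) return false;
        return (double) changedSinceLastOptimize / totalDocs >= changedFraction;
    }

    public static void main(String[] args) {
        long total = 7_000_000L;  // approx index size from the thread
        System.out.println(shouldOptimize(50_000, total, 0.10));   // well under 10%
        System.out.println(shouldOptimize(900_000, total, 0.10));  // over 10%
    }
}
```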


Thanks,

Matt Weber

On Sep 13, 2009, at 6:21 PM, William Pierce wrote:


Folks:

Are there good rules of thumb for when to optimize?  We have a large  
index consisting of approx 7M documents and we currently have it set  
to optimize once a day.  But sometimes there are very few changes  
that have been committed during a day and it seems like a waste to  
optimize (esp. since our servers are pretty well loaded).


So I was looking to get some good rules of thumb for when it makes  
sense to optimize:   Optimize when x% of the documents have been  
changed since the last optimize or some such.


Any ideas would be greatly appreciated!

-- Bill 




stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
I'm kind of stumped by this one.. is it something obvious?
I'm running the latest trunk. In some cases the stopFilterFactory isn't
removing the field name.

Thanks in advance,

-mike

From debugQuery (both words are in the stopwords file):

http://localhost:8983/solr/select?q=citations:for&debugQuery=true

<str name="rawquerystring">citations:for</str>
<str name="querystring">citations:for</str>
<str name="parsedquery">citations:</str>
<str name="parsedquery_toString">citations:</str>


http://localhost:8983/solr/select?q=citations:the&debugQuery=true

<str name="rawquerystring">citations:the</str>
<str name="querystring">citations:the</str>
<str name="parsedquery"></str>
<str name="parsedquery_toString"></str>




schema analyzer for this field:
<!-- Citation text -->
<fieldType name="citationtext" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false"
        words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false"
        words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/> -->
  </analyzer>
</fieldType>


Re: about replication

2009-09-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
Replication uses HttpClient for its connections, so it is likely that you
will notice some CLOSE_WAIT sockets. But how many do you see?

On Mon, Sep 14, 2009 at 6:37 AM, liugang8440265
liugang8440...@huawei.com wrote:
 Hi, I have a problem with Solr replication.

  Every time I use the replication API to replicate the index, a TCP connection
 with CLOSE_WAIT status appears. Eventually there are many CLOSE_WAIT 
 connections.

 I used the one-time replication API like this:

 http://localhost:8983/solr/core2/replication?command=fetchindex&masterUrl=http://localhost:8983/solr/core1/replication


 this is my conf about replication:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
         <str name="replicateAfter">commit</str>
     </lst>
 </requestHandler>


 and both cores use the same config file.

 Waiting for your reply.
 Jack Liu.

 2009-09-14



 liugang8440265




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: [DIH] Multiple repeat XPath stmts

2009-09-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
The XPathRecordReader has a limit of one mapping per XPath, so copying is
the best solution.

On Mon, Sep 14, 2009 at 2:54 AM, Fergus McMenemie fer...@twig.me.uk wrote:
I'm trying to import several RSS feeds using DIH and running into a
bit of a problem.  Some feeds define a GUID value that I map to my
Solr ID, while others don't.  I also have a link field which I fill in
with the RSS link field.  For the feeds that don't have the GUID value
set, I want to use the link field as the id.  However, if I define the
same XPath twice, but map it to two diff. columns I don't get the id
value set.

For instance, I want to do:
schema.xml
<field name="id" type="string" indexed="true" stored="true"
required="true"/>
<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do
copyFields, unless of course, I wanted to implement conditional copy
fields (only copy if the field is not defined) which I would rather not.

How do I solve this?


 How about.

 <entity name="x" ... transformer="TemplateTransformer">
   <field column="link" xpath="/rss/channel/item/link" />
   <field column="GUID" xpath="/rss/channel/item/guid" />
   <field column="id"   template="${x.link}" />
   <field column="id"   template="${x.GUID}" />

 The TemplateTransformer does nothing if its source expression is null.
 So the first transform assigns the fallback value to id; this is
 overwritten by the GUID if it is defined.

 You can now sort of do if-then-else using a combination of the template
 and regex transformers. Add a bit of maths to the transformers and
 I think we will have a Turing-complete language :-)

 fergus.

Thanks,
Grant

 --

 ===
 Fergus McMenemie               Email:fer...@twig.me.uk
 Techmore Ltd                   Phone:(UK) 07721 376021

 Unix/Mac/Intranets             Analyst Programmer
 ===




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: stopfilterFactory isn't removing field name

2009-09-13 Thread Yonik Seeley
That's pretty strange... perhaps something to do with your synonyms
file mapping "for" to a zero-length token?

-Yonik
http://www.lucidimagination.com

On Mon, Sep 14, 2009 at 12:13 AM, mike anderson saidthero...@gmail.com wrote:
 I'm kind of stumped by this one.. is it something obvious?
 I'm running the latest trunk. In some cases the stopFilterFactory isn't
 removing the field name.

 Thanks in advance,

 -mike

 From debugQuery (both words are in the stopwords file):

 http://localhost:8983/solr/select?q=citations:for&debugQuery=true

 <str name="rawquerystring">citations:for</str>
 <str name="querystring">citations:for</str>
 <str name="parsedquery">citations:</str>
 <str name="parsedquery_toString">citations:</str>


 http://localhost:8983/solr/select?q=citations:the&debugQuery=true

 <str name="rawquerystring">citations:the</str>
 <str name="querystring">citations:the</str>
 <str name="parsedquery"></str>
 <str name="parsedquery_toString"></str>




 schema analyzer for this field:
 <!-- Citation text -->
 <fieldType name="citationtext" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="false"
         words="citationstopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <!-- <filter class="solr.EnglishPorterFilterFactory"
         protected="protwords.txt"/> -->
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="false"
         words="citationstopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <!-- <filter class="solr.EnglishPorterFilterFactory"
         protected="protwords.txt"/> -->
   </analyzer>
 </fieldType>



Issue on Facet field and exact match

2009-09-13 Thread dharhsana

Hi to all,

While I am working with facets using SolrJ, I am using a string field in the
schema to avoid splitting of words (i.e., "Rekha dharshana" was previously
indexed as the separate words "rekha" and "dharshana"). To avoid this, I use
two fields in the schema for indexing. My schema.xml looks like this:

<field name="userId" type="text" indexed="true" stored="true" />
<field name="blogId" type="text" indexed="true" stored="true" />
<field name="postId" type="text" indexed="true" stored="true" />
<field name="blogTitle" type="text" indexed="true" stored="true" />
<field name="postTitle" type="text" indexed="true" stored="true" />
<field name="postMessage" type="text" indexed="true" stored="true" />


<field name="blogTitle_exact" type="string" indexed="true" stored="false"/>
<field name="blogId_exact" type="string" indexed="true" stored="false"/>
<field name="userId_exact" type="string" indexed="true" stored="false"/>
<field name="postId_exact" type="string" indexed="true" stored="false" />
<field name="postTitle_exact" type="string" indexed="true" stored="false" />
<field name="postMessage_exact" type="string" indexed="true" stored="false" />

And these are my copyFields:

<copyField source="blogTitle" dest="blogTitle_exact"/>
<copyField source="userId" dest="userId_exact"/>
<copyField source="blogId" dest="blogId_exact"/>
<copyField source="postId" dest="postId_exact"/>
<copyField source="postTitle" dest="postTitle_exact"/>
<copyField source="postMessage" dest="postMessage_exact"/>

This is the code where I add the fields for the blog details to Solr:

 SolrInputDocument solrInputDocument = new SolrInputDocument();
 solrInputDocument.addField("blogTitle", "$Never Fails$");
 solrInputDocument.addField("blogId", "$Never Fails$");
 solrInputDocument.addField("userId", "1");

This is the code to add the fields for the post details to Solr:

solrInputDocument.addField("blogId", "$Never Fails$");
solrInputDocument.addField("postId", "$Never Fails post$");
solrInputDocument.addField("postTitle", "$Never Fails post$");
solrInputDocument.addField("postMessage", "$Never Fails post message$");

This is my code for querying Solr:


 SolrQuery queryOfMyBlog = new SolrQuery("blogId_exact:Never Fails");
 queryOfMyBlog.setFacet(true);
 queryOfMyBlog.addFacetField("blogTitle_exact");
 queryOfMyBlog.addFacetField("userId_exact");
 queryOfMyBlog.addFacetField("blogId_exact");
 queryOfMyBlog.setFacetMinCount(1);
 queryOfMyBlog.setIncludeScore(true);

 List<FacetField> facets = query.getFacetFields();
 List listOfAllValues = new ArrayList();
 System.out.println("inside facet, size " + facets.size());
 for (FacetField facet : facets)
 {
     System.out.println("inside for");
     List<FacetField.Count> facetEntries = facet.getValues();
     for (FacetField.Count fcount : facetEntries)
     {
         String s = fcount.getName();
         listOfAllValues.add(s);
         System.out.println("BlogId " + s);
     }
 }


In the above code, the copyFields copy blogId, blogTitle, and userId into
blogId_exact, blogTitle_exact, and userId_exact so that I can get the output.
While indexing I wrap the value, e.g. "$Never Fails$", in order to get an
exact search, but I am not getting an exact match: when I query for "Never
Fails" it also brings back the "Success Never Fails" entry. I need only the
details of the "Never Fails" blog, but I get "Success Never Fails" along
with it. What should I do to get an exact match?

That is my first issue.
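One hedged observation on this first issue: an unquoted multi-word value like blogId_exact:Never Fails is split at the space by the query parser, so the exact field is never compared against the full string. Phrase-quoting the value keeps the string field's single token intact; a minimal sketch in plain Java (the escaping below is deliberately minimal, and a real client might prefer SolrJ's escaping helpers):

```java
// Build a phrase query against a string (untokenized) field so that
// "Never Fails" matches only documents whose value is exactly that string.
public class ExactQuery {
    static String exactFieldQuery(String field, String value) {
        // minimal escaping of embedded backslashes and quotes only
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(exactFieldQuery("blogId_exact", "Never Fails"));
        // blogId_exact:"Never Fails"
    }
}
```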

The next issue: when I am querying the post details, I do the same thing to
get them...


SolrQuery queryOfMyPost = new SolrQuery("blogId_exact:$Success Never Fails$");
 queryOfMyPost.setFacet(true);
 queryOfMyPost.addFacetField("blogId_exact");
 queryOfMyPost.addFacetField("postId_exact");
 queryOfMyPost.addFacetField("postTitle_exact");
 queryOfMyPost.addFacetField("postMessage_exact");
 queryOfMyPost.setFacetMinCount(1);
 queryOfMyPost.setIncludeScore(true);


 List<FacetField> facetsForPost = queryPost.getFacetFields();
 List listOfAllFacetsForPost = new ArrayList();
 System.out.println("inside facet, size " + facetsForPost);
 for (FacetField facetPost1 : facetsForPost)
 {
     System.out.println("inside for " + facetPost1);
     List<FacetField.Count> facetEntries =
         facetPost1.getValues();
     for (FacetField.Count fcount1 : facetEntries)
     {
         String s1 = fcount1.getName();
         listOfAllFacetsForPost.add(s1);
         System.out.println("Post details " + s1);
     }
 }

Here, in the above, the facet fields postId_exact, postTitle_exact, and
postMessage_exact come back with null values.

The copyField values have not been copied into them, so I get null values
for these.


Please check my code and tell me where I am wrong...

And