Re: Crawling with nutch and mapping fields to solr

2010-11-12 Thread Ramavtar Meena
Hi,

This question is better suited to the Nutch mailing list, but let me give
you a couple of pointers.

If it's only metadata, you can use the patch mentioned below. If you
want more flexibility with your data, you can look at writing your own
parser plugin. Here is a good place to start:

http://wiki.apache.org/nutch/WritingPluginExample-0.9

XPath, HtmlCleaner, and BeanShell would be a good set of tools for your custom parser.

regards,
Ram

On Thu, Nov 11, 2010 at 9:21 PM, Jean-Luc jeanl...@gmail.com wrote:

 I'm going down the route of patching Nutch so I can use this ParseMetaTags
 plugin:
 https://issues.apache.org/jira/browse/NUTCH-809

 I'm also wondering whether I can use the XMLParser to parse well-formed
 XHTML; being able to use XPath would be a bonus:
 https://issues.apache.org/jira/browse/NUTCH-185

 Any thoughts appreciated...



Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Ramavtar Meena
Hi,

If you are looking for query-time boosting on the title field, you can do
the following:
/select?q=title:android^10
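
As a minimal sketch, the same boosted query can be assembled programmatically so that the special characters are URL-encoded. The helper name `boosted_title_query` is hypothetical and only for illustration; the field name and boost value come from the query above.

```python
from urllib.parse import urlencode


def boosted_title_query(term, boost=10):
    """Return a /select path that boosts matches on the title field.

    Hypothetical helper for illustration; not part of Solr or any client library.
    """
    # Lucene query syntax for a boosted field term is field:term^boost.
    params = {"q": f"title:{term}^{boost}"}
    # urlencode percent-escapes ':' and '^' so the URL is safe to send.
    return "/select?" + urlencode(params)


print(boosted_title_query("android"))
```

The printed path is the encoded form of the query shown above; Solr decodes it back to title:android^10.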

Also, unless you have a very good reason to use string for date data
(in your case pubdate and reldate), you should be using
solr.DateField.
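
Solr date fields expect values in the full ISO 8601 form with a trailing Z (for example 1995-12-31T23:59:59Z), so plain dates such as 2005-03-01 would need converting before indexing. A minimal sketch, assuming the source dates are YYYY-MM-DD and should be treated as midnight UTC (the helper name `to_solr_date` is hypothetical):

```python
from datetime import datetime, timezone


def to_solr_date(value):
    """Convert a YYYY-MM-DD string to the UTC timestamp format Solr date fields accept.

    Assumption: the date carries no time of day, so midnight UTC is used.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Solr requires the literal 'Z' suffix rather than a numeric UTC offset.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_date("2005-03-01"))
```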

regards,
Ram
On Fri, Nov 12, 2010 at 3:41 AM, Ahmet Arslan iori...@yahoo.com wrote:
 There are several mistakes in your approach:

 copyField just copies data. Index time boost is not copied.

 There is no such boosting syntax: /select?q=Each&title^9&fl=score

 You are searching on your default field.

 This is not the cause of your problem, but omitNorms="true" disables
 index-time boosts.

 http://wiki.apache.org/solr/DisMaxQParserPlugin can satisfy your need.
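
 A rough sketch of what a DisMax request could look like, built here in Python. The parameters defType, q, qf, and fl are standard Solr request parameters; the particular weights, the choice of fields in qf, and the helper name `dismax_query` are assumptions for illustration only. Note that title is declared as a string field in the schema quoted below, so it would match whole values only unless it is changed to a tokenized text type.

```python
from urllib.parse import urlencode


def dismax_query(user_query, title_boost=10):
    """Build a /select path that uses the DisMax parser with a weighted title field.

    Hypothetical helper; the boost value and field list are illustrative.
    """
    params = {
        "defType": "dismax",                        # select the DisMax query parser
        "q": user_query,                            # raw user input, no field syntax
        "qf": f"title^{title_boost} searchFields",  # per-field query-time weights
        "fl": "*,score",                            # return all stored fields plus score
    }
    return "/select?" + urlencode(params)


print(dismax_query("Each"))
```

 With this approach the boost is applied at query time, so it does not depend on index-time boosts or on the omitNorms setting.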


 --- On Thu, 11/11/10, Solr User solr...@gmail.com wrote:

 From: Solr User solr...@gmail.com
 Subject: Re: WELCOME to solr-user@lucene.apache.org
 To: solr-user@lucene.apache.org
 Date: Thursday, November 11, 2010, 11:54 PM
 Eric,

 Thank you so much for the reply, and I apologize for not providing all
 the details.

 The following are the field definitions in my schema.xml:

 <field name="title" type="string" indexed="true" stored="true" omitNorms="false" />
 <field name="author" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />
 <field name="authortype" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />
 <field name="isbn13" type="string" indexed="true" stored="true" />
 <field name="isbn10" type="string" indexed="true" stored="true" />
 <field name="material" type="string" indexed="true" stored="true" />
 <field name="pubdate" type="string" indexed="true" stored="true" />
 <field name="pubyear" type="string" indexed="true" stored="true" />
 <field name="reldate" type="string" indexed="false" stored="true" />
 <field name="format" type="string" indexed="true" stored="true" />
 <field name="pages" type="string" indexed="false" stored="true" />
 <field name="desc" type="string" indexed="true" stored="true" />
 <field name="series" type="string" indexed="true" stored="true" />
 <field name="season" type="string" indexed="true" stored="true" />
 <field name="imprint" type="string" indexed="true" stored="true" />
 <field name="bisacsub" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />
 <field name="bisacstatus" type="string" indexed="false" stored="true" />
 <field name="category" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />
 <field name="award" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />
 <field name="age" type="string" indexed="true" stored="true" />
 <field name="reading" type="string" indexed="true" stored="true" />
 <field name="grade" type="string" indexed="true" stored="true" />
 <field name="path" type="string" indexed="false" stored="true" />
 <field name="shortdesc" type="string" indexed="true" stored="true" />
 <field name="subtitle" type="string" indexed="true" stored="true" omitNorms="true"/>
 <field name="price" type="float" indexed="true" stored="true"/>
 <field name="searchFields" type="textSpell" indexed="true" stored="true" multiValued="true" omitNorms="true"/>

 Copy Fields:

 <copyField source="title" dest="searchFields"/>
 <copyField source="author" dest="searchFields"/>
 <copyField source="isbn13" dest="searchFields"/>
 <copyField source="isbn10" dest="searchFields"/>
 <copyField source="format" dest="searchFields"/>
 <copyField source="series" dest="searchFields"/>
 <copyField source="season" dest="searchFields"/>
 <copyField source="imprint" dest="searchFields"/>
 <copyField source="bisacsub" dest="searchFields"/>
 <copyField source="category" dest="searchFields"/>
 <copyField source="award" dest="searchFields"/>
 <copyField source="shortdesc" dest="searchFields"/>
 <copyField source="desc" dest="searchFields"/>
 <copyField source="subtitle" dest="searchFields"/>



 <defaultSearchField>searchFields</defaultSearchField>



 Before creating the index, I feed an XML file to the Solr job that builds
 the index files. I added a boost attribute to the title field before
 indexing; an example is below:

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <add><doc>
 <field name="material">1785440</field>
 <field boost="10.0" name="title">Each Little Bird That Sings</field>
 <field name="price">16.0</field>
 <field name="isbn10">0152051139</field>
 <field name="isbn13">9780152051136</field>
 <field name="format">Hardcover</field>
 <field name="pubdate">2005-03-01</field>
 <field name="pubyear">2005</field>
 <field name="reldate">2005-02-22</field>
 <field name="pages">272</field>
 <field name="bisacstatus">Active</field>
 <field name="season">Spring 2005</field>
 <field name="imprint">Children's</field>
 <field name="age">8.0-12.0</field>
 <field name="grade">3-6</field>
 <field name="author">Marla Frazee</field>
 <field name="authortype">Jacket Illustrator</field>
 <field name="author">Deborah Wiles</field>
 <field name="authortype">Author</field>
 <field name="bisacsub">Social Issues/Friendship</field>
 <field name="bisacsub">Social Issues/General (see also headings under Family)</field>
 <field name="bisacsub">General</field>
 <field name="bisacsub">Girls &amp; Women</field>
 <field name="category">Fiction/Middle Grade</field>
 <field name="category">Fiction/Award Winners</field>
 <field name="category">Coming of Age</field>
 <field name="category">Social Situations/Death &amp; Dying</field><field