Hi, is it possible to crawl three different websites, such as

1. https://www.urgenthomework.com/
2. https://www.myassignmenthelp.net/
3. https://www.assignmenthelp.net/

in a single Nutch configuration and then send the indexed pages of each site
to the corresponding core [uah, mah, yah] in Solr? I tried to achieve this
with exchanges and writer ids. Please see my configuration below.

-------------exchange.xml---------------------------------

<exchange id="uahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_1" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
  </params>
</exchange>

<exchange id="mahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_2" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
  </params>
</exchange>

<exchange id="yahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_3" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
  </params>
</exchange>



---------------------------------index.writers.xml----------------------------------------

 <writer id="indexer_solr_1"
class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
    <parameters>
      <param name="type" value="http" />
      <param name="url" value="http://localhost:8983/solr/uah" />
      <param name="collection" value="" />
      <param name="weight.field" value="" />
      <param name="commitSize" value="1000" />
      <param name="auth" value="false" />
      <param name="username" value="username" />
      <param name="password" value="password" />
    </parameters>
    <mapping>
      <copy>
        <!-- <field source="title" dest="content" />
        <field source="metatag.description" dest="content" />
        <field source="metatag.keywords" dest="content" /> -->
      </copy>
      <rename></rename>
      <remove>
        <field source="segment" />
        <field source="host" />
        <field source="url" />
        <!-- <field source="metatag.description" />
        <field source="metatag.keywords" />
        <field source="date" />
        <field source="url" />
         -->
      </remove>
    </mapping>
  </writer>


  <writer id="indexer_solr_2"
class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
    <parameters>
      <param name="type" value="http" />
      <param name="url" value="http://localhost:8983/solr/mah" />
      <param name="collection" value="" />
      <param name="weight.field" value="" />
      <param name="commitSize" value="1000" />
      <param name="auth" value="false" />
      <param name="username" value="username" />
      <param name="password" value="password" />
    </parameters>
    <mapping>
      <copy>
      </copy>
      <rename></rename>
      <remove>
        <field source="segment" />
        <field source="host" />
        <field source="url" />
      </remove>
    </mapping>
  </writer>



  <writer id="indexer_solr_3"
class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
    <parameters>
      <param name="type" value="http" />
      <param name="url" value="http://localhost:8983/solr/yah" />
      <param name="collection" value="" />
      <param name="weight.field" value="" />
      <param name="commitSize" value="1000" />
      <param name="auth" value="false" />
      <param name="username" value="username" />
      <param name="password" value="password" />
    </parameters>
    <mapping>
      <copy>
      </copy>
      <rename></rename>
      <remove>
        <field source="segment" />
        <field source="host" />
        <field source="url" />
      </remove>
    </mapping>
  </writer>

---------------------------------------------------------------------------------------------------------------

But it is not pushing data into the corresponding cores; instead, pages from
all the domains end up in a single core. Please let me know what I am
missing. I am sure there has to be a way to achieve this. I have not tried
subcollections.xml. Do you think I can achieve it using subcollections?
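
To make the intended behaviour concrete, here is a minimal sketch of the routing I am after. It is plain Python, not Nutch code: the `route` helper, the `CORE_BY_DOMAIN` table, and the choice to ignore a leading "www." are my own assumptions for illustration only, and say nothing about how Nutch evaluates the `expr` parameter.

```python
from urllib.parse import urlparse

# Hypothetical table: bare domain -> Solr core, taken from the list of sites above.
CORE_BY_DOMAIN = {
    "urgenthomework.com": "uah",
    "myassignmenthelp.net": "mah",
    "assignmenthelp.net": "yah",
}


def route(url):
    """Return the Solr core a page should be sent to, or None if no rule matches."""
    host = urlparse(url).hostname or ""
    # Assumption: a leading "www." is ignored when matching the domain.
    if host.startswith("www."):
        host = host[len("www."):]
    # Exact match, so "myassignmenthelp.net" never falls into the "yah" core.
    return CORE_BY_DOMAIN.get(host)


if __name__ == "__main__":
    for url in (
        "https://www.urgenthomework.com/",
        "https://www.myassignmenthelp.net/",
        "https://www.assignmenthelp.net/",
    ):
        print(url, "->", route(url))
```

Each document should end up in exactly one core, chosen by its host, and a document from any other host should not be sent to any of the three.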
