Hi,

Is it possible to crawl three different websites, such as

1. https://www.urgenthomework.com/
2. https://www.myassignmenthelp.net/
3. https://www.assignmenthelp.net/

in a single Nutch configuration and then send each site's indexed pages to its corresponding Solr core [uah, mah, yah]? I tried to achieve this with exchanges and writer IDs. Please see my configuration below.

-------------exchange.xml---------------------------------
<exchange id="uahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_1" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
  </params>
</exchange>

<exchange id="mahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_2" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
  </params>
</exchange>

<exchange id="yahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_3" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
  </params>
</exchange>

---------------------------------index.writers.xml----------------------------------------
<writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/uah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
      <!--
      <field source="title" dest="content" />
      <field source="metatag.description" dest="content" />
      <field source="metatag.keywords" dest="content" />
      -->
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
      <!--
      <field source="metatag.description" />
      <field source="metatag.keywords" />
      <field source="date" />
      <field source="url" />
      -->
    </remove>
  </mapping>
</writer>

<writer id="indexer_solr_2" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/mah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
    </remove>
  </mapping>
</writer>

<writer id="indexer_solr_3" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/yah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
    </remove>
  </mapping>
</writer>
---------------------------------------------------------------------------------------------------------------

However, this does not push the data into the corresponding cores. Instead, documents from all the different domains end up in a single core. Please let me know what I am missing. I am sure there has to be a way to achieve this.

I have not tried subcollections.xml yet. Do you think I can achieve this using subcollections?