[ https://issues.apache.org/jira/browse/NUTCH-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ninaad Joshi updated NUTCH-2469:
--------------------------------
Attachment: NinaadJoshi.IndexingJob.java.patch
> Documents not committed to Solr in server mode
> ----------------------------------------------
>
> Key: NUTCH-2469
> URL: https://issues.apache.org/jira/browse/NUTCH-2469
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 2.3.1
> Reporter: Ninaad Joshi
> Priority: Blocker
> Attachments: NinaadJoshi.IndexingJob.java.patch
>
>
> I found a discrepancy between the execution paths Nutch uses in local
> standalone mode and in server mode.
> In local standalone mode, once indexing finishes, the document and its fields
> are indexed and committed in Solr. The document is therefore returned if it is
> queried immediately. In server mode, the document is indexed but not committed
> in Solr, so it is not returned if queried immediately. It is returned only
> after Solr is restarted.
> I read through IndexingJob.java to find the cause and found the following:
> # Local standalone mode and server mode have different entry points:
> ** Server mode entry point: public Map<String, Object> run(Map<String, Object> args)
> ** Standalone mode entry points:
> *** public int run(String[] args)
> *** public void index(String batchId)
> # The standalone mode path does extra work that the server mode path does not:
> ** public void index(String batchId) first calls the server mode path,
> public Map<String, Object> run(Map<String, Object> args)
> ** It then performs these extra steps:
> *** Gets the IndexWriters
> *** Calls describe on the IndexWriters
> *** Commits through the IndexWriters if COMMIT_INDEX=true is specified in the
> configuration
> ** Server mode performs none of these extra steps
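The asymmetry described above can be modelled in a small, self-contained Java sketch. It is a simplified model and not Nutch code. IndexWritersStub, RecordingIndexWriters, CurrentIndexingJob and the commitIndex flag (which stands in for the COMMIT_INDEX setting) are hypothetical names introduced for illustration. The real run(Map<String, Object> args) also does the actual indexing work, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's IndexWriters; NOT the real Nutch API.
interface IndexWritersStub {
    String describe();
    void commit();
}

// Records whether commit() was reached, so the two paths can be compared.
class RecordingIndexWriters implements IndexWritersStub {
    boolean committed = false;

    public String describe() { return "RecordingIndexWriters"; }

    public void commit() { committed = true; }
}

// Simplified model of the CURRENT control flow in IndexingJob.
class CurrentIndexingJob {
    private final IndexWritersStub writers;
    private final boolean commitIndex; // hypothetical stand-in for COMMIT_INDEX

    CurrentIndexingJob(IndexWritersStub writers, boolean commitIndex) {
        this.writers = writers;
        this.commitIndex = commitIndex;
    }

    // Server mode entry point: runs the job only (no describe, no commit).
    Map<String, Object> run(Map<String, Object> args) {
        Map<String, Object> results = new HashMap<>();
        results.put("batchId", args.get("batchId"));
        return results;
    }

    // Standalone path: reuses run(Map), then performs the extra steps.
    void index(String batchId) {
        Map<String, Object> args = new HashMap<>();
        args.put("batchId", batchId);
        run(args);
        System.out.println(writers.describe());
        if (commitIndex) {
            writers.commit();
        }
    }

    // Standalone (command-line) entry point.
    int run(String[] args) {
        index(args[0]);
        return 0;
    }
}
```

In this model, calling run(Map) leaves committed false, while calling run(String[]) with the same setting sets it to true. This mirrors the behaviour reported above, where documents indexed through the server are not visible until Solr is restarted.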
> I think the execution paths for both modes should be the same, and therefore
> propose to:
> # Move the extra IndexWriters steps from public void index(String batchId) to
> the end of the server mode execution path, i.e., the public Map<String, Object>
> run(Map<String, Object> args) function
> # Call public Map<String, Object> run(Map<String, Object> args) directly from
> the standalone mode entry point, public int run(String[] args)
> # public void index(String batchId) then becomes redundant and can be safely removed.
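The proposed unified flow can be sketched the same way. This is a minimal, hypothetical model and not the attached patch. IndexWritersHandle, TrackingIndexWriters, ProposedIndexingJob and the commitIndex flag (standing in for COMMIT_INDEX) are illustrative names. The point it shows is that the describe and commit steps move to the end of run(Map<String, Object> args), and run(String[] args) calls run(Map) directly. Because of that, no index(String batchId) method is needed in the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's IndexWriters; NOT the real Nutch API.
interface IndexWritersHandle {
    String describe();
    void commit();
}

// Counts commits so both entry points can be checked.
class TrackingIndexWriters implements IndexWritersHandle {
    int commits = 0;

    public String describe() { return "TrackingIndexWriters"; }

    public void commit() { commits++; }
}

// Simplified model of the PROPOSED control flow.
class ProposedIndexingJob {
    private final IndexWritersHandle writers;
    private final boolean commitIndex; // hypothetical stand-in for COMMIT_INDEX

    ProposedIndexingJob(IndexWritersHandle writers, boolean commitIndex) {
        this.writers = writers;
        this.commitIndex = commitIndex;
    }

    // Single shared path, used by server mode and by the command line.
    Map<String, Object> run(Map<String, Object> args) {
        Map<String, Object> results = new HashMap<>();
        results.put("batchId", args.get("batchId"));
        // Steps moved here from the old standalone-only path:
        System.out.println(writers.describe());
        if (commitIndex) {
            writers.commit();
        }
        return results;
    }

    // Command-line entry point now delegates straight to run(Map).
    int run(String[] args) {
        Map<String, Object> argMap = new HashMap<>();
        argMap.put("batchId", args[0]);
        run(argMap);
        return 0;
    }
}
```

With this structure, both entry points commit exactly once when commitIndex is true and never commit when it is false. The two modes can then no longer drift apart.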
> I have attached the proposed patch to this issue. Please review it and approve
> it if it looks correct.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)