[ 
https://issues.apache.org/jira/browse/NUTCH-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ninaad Joshi updated NUTCH-2469:
--------------------------------
    Attachment: NinaadJoshi.IndexingJob.java.patch

> Documents not committed to Solr in server mode
> ----------------------------------------------
>
>                 Key: NUTCH-2469
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2469
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 2.3.1
>            Reporter: Ninaad Joshi
>            Priority: Blocker
>         Attachments: NinaadJoshi.IndexingJob.java.patch
>
>
> I found a discrepancy between the execution paths Nutch takes in local 
> standalone mode and in server mode. 
> In local standalone mode, once indexing finishes, the document and its 
> fields are indexed and committed in Solr, and the document is returned if 
> queried immediately. In server mode, the document is indexed but not 
> committed in Solr, so it is not returned if queried immediately. After Solr 
> is restarted, the indexed document is returned if queried.
> I browsed through the IndexingJob.java file to understand the cause for this. 
> I found out:
> # There are two different entry paths for the local standalone mode and the 
> server mode
> ** Server mode entry point: public Map<String, Object> run(Map<String, 
> Object> args)
> ** Standalone mode entry point: 
> *** public int run(String[] args)
> *** public void index(String batchId)
> # The local standalone mode path does more work than the server mode path
> ** The public void index(String batchId) function first calls the server 
> mode path: public Map<String, Object> run(Map<String, Object> args)
> ** It then performs these additional steps:
> *** Gets IndexWriters
> *** Describes the writers using IndexWriters
> *** Commits using IndexWriters if COMMIT_INDEX=true is specified in the 
> configuration
> ** These additional steps are not performed in server mode
> I feel the execution paths for both modes should be the same, and hence 
> propose to:
> # Move the additional IndexWriters steps from public void index(String 
> batchId) to the end of the server mode execution path, i.e. the public 
> Map<String, Object> run(Map<String, Object> args) function 
> # Call the public Map<String, Object> run(Map<String, Object> args) function 
> directly from the standalone mode entry point: public int run(String[] args)
> # public void index(String batchId) then becomes redundant and can be safely 
> removed.
> I have attached the proposed patch to this issue. Please review and 
> approve it.
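
The control flow proposed above can be sketched roughly as follows. This is an illustrative outline only, not the attached patch: StubIndexWriters and IndexingJobSketch are hypothetical stand-ins for Nutch's IndexWriters and IndexingJob, the "batchId" argument key and the false default for COMMIT_INDEX are assumptions, and the real indexing job is reduced to a placeholder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Nutch's IndexWriters; it only records calls. */
class StubIndexWriters {
    final List<String> calls = new ArrayList<>();

    String describe() {
        calls.add("describe");
        return "StubIndexWriters";
    }

    void commit() {
        calls.add("commit");
    }
}

/** Sketch of the proposed flow; not the actual Nutch IndexingJob class. */
class IndexingJobSketch {
    // Assumed key name; the issue only says "COMMIT_INDEX=true".
    static final String COMMIT_INDEX = "COMMIT_INDEX";

    private final Map<String, String> conf;
    final StubIndexWriters writers = new StubIndexWriters();

    IndexingJobSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Server mode entry point: now also describes and commits at the end. */
    Map<String, Object> run(Map<String, Object> args) {
        Map<String, Object> results = new HashMap<>();
        // Placeholder for the real indexing job.
        results.put("batchId", args.get("batchId"));

        // Steps moved here from index(String batchId).
        writers.describe();
        if (Boolean.parseBoolean(conf.getOrDefault(COMMIT_INDEX, "false"))) {
            writers.commit();
        }
        return results;
    }

    /** Standalone entry point: delegates directly to the server mode path. */
    int run(String[] args) {
        if (args.length < 1) {
            return -1; // no batch id supplied
        }
        Map<String, Object> argMap = new HashMap<>();
        argMap.put("batchId", args[0]);
        run(argMap);
        return 0;
    }
}
```

With this shape, both entry points reach the same commit logic, so a document indexed through either path is committed whenever COMMIT_INDEX is set to true.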



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
