[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924838#action_12924838
 ] 

Lance Norskog edited comment on SOLR-2186 at 10/26/10 1:20 AM:
---------------------------------------------------------------

This patch file fixes up the DataImportHandler so that the TikaEntityProcessor 
works under threads.

The technique is to pass in a resolver when creating a ThreadedContext 
(wrapper). This allows TikaEP.firstInit() to work. However, TikaEP.nextRow() is 
called with a context that has no functioning resolver, so TikaEP caches the 
resolver given in firstInit() and uses it during nextRow() instead of the 
one it should use. Even so, the parsed text is spewed to the logger in addition 
to being indexed.
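
To make the caching workaround concrete, here is a minimal sketch of the idea. 
It is not the patch itself and does not use the real DataImportHandler classes: 
Resolver, Ctx, CachingProcessor and the "${f.path}" template are all 
hypothetical stand-ins, assumed only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a variable resolver; not the Solr class. */
interface Resolver {
    String resolve(String template);
}

/** Hypothetical stand-in for a per-call context; a null resolver models
 *  "a context without a functioning resolver". */
class Ctx {
    private final Resolver resolver;

    Ctx(Resolver resolver) {
        this.resolver = resolver;
    }

    Resolver getResolver() {
        return resolver;
    }
}

/** Hypothetical processor that remembers the resolver it saw in firstInit(). */
class CachingProcessor {
    private Resolver cached;

    /** Called once with a context that has a working resolver. */
    void firstInit(Ctx ctx) {
        cached = ctx.getResolver();
    }

    /** Called per row; ignores the context's own resolver and uses the
     *  cached one, which is the workaround described above. */
    Map<String, Object> nextRow(Ctx ctx, String urlTemplate) {
        if (cached == null) {
            throw new IllegalStateException("firstInit() was not called");
        }
        Map<String, Object> row = new HashMap<>();
        row.put("url", cached.resolve(urlTemplate));
        return row;
    }
}

class ResolverCacheDemo {
    public static void main(String[] args) {
        // Assumed template syntax, for illustration only.
        Resolver r = t -> t.replace("${f.path}", "/tmp/a.pdf");
        CachingProcessor p = new CachingProcessor();
        p.firstInit(new Ctx(r));
        // nextRow() receives a context with no resolver, yet still resolves.
        System.out.println(p.nextRow(new Ctx(null), "${f.path}").get("url"));
    }
}
```

The sketch also shows why this is only a demonstration: nextRow() never 
consults the context it is handed, so any per-thread state that context was 
meant to carry is ignored.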

This is not intended as a fix patch; it merely demonstrates the problem.

The patch is made with 'git diff' and I still haven't mastered it; some 'patch' 
programs may not like it.





      was (Author: lancenorskog):
    This patch file fixes up the DataImportHandler so that the 
TikaEntityProcessor works under threads.

The technique is to pass in a resolver when creating a ThreadedContext 
(wrapper). This allows TikaEP.firstInit() to work. However, TikaEP.nextRow() is 
called with a context that has no functioning resolver, so TikaEP caches the 
resolver given in firstInit() and uses it during nextRow() instead of the 
one it should use.

This is not intended as a fix patch; it merely demonstrates the problem.

The patch is made with 'git diff' and I still haven't mastered it; some 'patch' 
programs may not like it.




  
> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
>                 Key: SOLR-2186
>                 URL: https://issues.apache.org/jira/browse/SOLR-2186
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Lance Norskog
>         Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*


