[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065833#comment-13065833 ]

Shalin Shekhar Mangar commented on SOLR-2186:
---------------------------------------------

Frank, I've opened SOLR-2655 for related issues. I may not have time to go into these soon so I'd advise people not to use multi threaded mode for the time being.

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Assignee: Shalin Shekhar Mangar
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2186.patch, SOLR-2186.patch, SOLR-2186.patch,
> Solr-2186.patch, TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch,
> TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065363#comment-13065363 ]

Frank Wesemann commented on SOLR-2186:
--------------------------------------

Thanks for taking this issue, Shalin. You might close SOLR-2544 along with this.

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-2186.patch, SOLR-2186.patch, SOLR-2186.patch,
> Solr-2186.patch, TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch,
> TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*
[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041855#comment-13041855 ]

Robert Muir commented on SOLR-2186:
-----------------------------------

Hi Lance/Frank,

Thanks for working on this issue. Any ideas on how we could make a junit test to show the problem? This would make it easier to evaluate the patch and possible to prevent regressions.

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Assignee: Grant Ingersoll
> Attachments: SOLR-2186.patch, Solr-2186.patch, TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*
[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041652#comment-13041652 ]

Frank Wesemann commented on SOLR-2186:
--------------------------------------

The added patch addresses the problem that EntityProcessors do not have a usable VariableResolver in their {{init()}} method. This is done in the EntityRunner's {{runAThread()}} method by first initing the EntityProcessorWrapper and after that initing the EntityProcessor. By changing the order as described, the corresponding namespaces are created on the VariableResolver before it can be used by the EntityProcessor.

Additionally, I changed the log level for the "adding a row" messages to "debug".

This patch does not solve the problem described in SOLR-2544.

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Assignee: Grant Ingersoll
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*
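The ordering argument in the comment above can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: `ToyResolver`, `ToyWrapper`, `ToyProcessor` and `InitOrderDemo` are toy stand-ins invented for illustration, not the DataImportHandler classes, and the sketch only assumes (as the comment says) that the wrapper's initialization is what creates the namespaces the processor later reads.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a variable resolver: maps namespace names to value maps. */
final class ToyResolver {
    private final Map<String, Map<String, Object>> namespaces = new HashMap<>();

    void addNamespace(String name, Map<String, Object> values) {
        namespaces.put(name, values);
    }

    /** Resolves "namespace.key"; fails if the namespace has not been created yet. */
    Object resolve(String expression) {
        int dot = expression.indexOf('.');
        String ns = expression.substring(0, dot);
        Map<String, Object> values = namespaces.get(ns);
        if (values == null) {
            throw new IllegalStateException("namespace not initialised: " + ns);
        }
        return values.get(expression.substring(dot + 1));
    }
}

/** Toy wrapper: its init() is the step that registers the entity's namespace. */
final class ToyWrapper {
    void init(ToyResolver resolver, String entityName, Map<String, Object> row) {
        resolver.addNamespace(entityName, row);
    }
}

/** Toy processor: its init() already needs to resolve a variable. */
final class ToyProcessor {
    Object url;

    void init(ToyResolver resolver) {
        url = resolver.resolve("files.fileAbsolutePath");
    }
}

final class InitOrderDemo {
    /** Wrapper first, then processor: the order the comment describes. */
    static Object wrapperFirst() {
        ToyResolver resolver = new ToyResolver();
        Map<String, Object> row = new HashMap<>();
        row.put("fileAbsolutePath", "/tmp/example.pdf");
        new ToyWrapper().init(resolver, "files", row);
        ToyProcessor processor = new ToyProcessor();
        processor.init(resolver);
        return processor.url;
    }

    /** Processor first: the namespace does not exist yet, so init() fails. */
    static boolean processorFirstFails() {
        try {
            new ToyProcessor().init(new ToyResolver());
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapperFirst());
        System.out.println(processorFirstFails());
    }
}
```

The point of the sketch is only the dependency direction: whichever component creates the namespaces has to be initialized before any component that resolves variables from them.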
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997656#comment-12997656 ]

Lance Norskog commented on SOLR-2186:
-------------------------------------

bq. Lance, can you update this patch and add a unit test?

Sorry Grant, this wasn't on my watch list. This patch is not a patch to fix it, it is a patch to demonstrate the problem. I don't know the right way to solve this.
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968890#action_12968890 ]

Fuad Efendi commented on SOLR-2186:
-----------------------------------

I resolved this issue for SQL in SOLR-2233; it was related to 'thread A closes connection needed by thread B'.
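The 'thread A closes connection needed by thread B' failure mode can be shown with a short Java sketch. It is not taken from SOLR-2233 and makes no claim about how that issue was fixed; `ToyConnection` and `ConnectionSharingDemo` are hypothetical names, and a per-thread `ThreadLocal` is just one common way to keep one thread's close() from affecting another.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy connection that only remembers whether it was closed. */
final class ToyConnection {
    private volatile boolean closed;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class ConnectionSharingDemo {
    /** One connection per thread, created lazily on first use in that thread. */
    private static final ThreadLocal<ToyConnection> PER_THREAD =
        ThreadLocal.withInitial(ToyConnection::new);

    /** Runs the task on a new thread and waits for it to finish. */
    private static void runAndWait(Runnable task) {
        Thread thread = new Thread(task);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Shared connection: after thread A closes it, thread B finds it closed. */
    static boolean sharedConnectionIsClosedForB() {
        ToyConnection shared = new ToyConnection();
        AtomicBoolean closedForB = new AtomicBoolean();
        runAndWait(shared::close);                               // thread A
        runAndWait(() -> closedForB.set(shared.isClosed()));     // thread B
        return closedForB.get();
    }

    /** Per-thread connections: thread A's close() does not affect thread B. */
    static boolean perThreadConnectionIsOpenForB() {
        AtomicBoolean openForB = new AtomicBoolean();
        runAndWait(() -> PER_THREAD.get().close());                       // thread A
        runAndWait(() -> openForB.set(!PER_THREAD.get().isClosed()));     // thread B
        return openForB.get();
    }

    public static void main(String[] args) {
        System.out.println(sharedConnectionIsClosedForB());
        System.out.println(perThreadConnectionIsOpenForB());
    }
}
```

The two threads are run one after the other so the outcome is deterministic; in a real import the interleaving would be arbitrary, which is what makes the shared-connection variant fail intermittently.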
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968439#action_12968439 ]

Grant Ingersoll commented on SOLR-2186:
---------------------------------------

Lance, can you update this patch and add a unit test?
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931236#action_12931236 ]

Lance Norskog commented on SOLR-2186:
-------------------------------------

bq. Two answers:
1) Try it and see. You'll find the usage soon enough :)
2) TikaEntityProcessor, branch 3.x, line 96:

{code}
public Map nextRow() {
  if(done) return null;
  Map row = new HashMap();
  DataSource dataSource = context.getDataSource();
  InputStream is = dataSource.getData(context.getResolvedEntityAttribute(URL));   <-
  ContentHandler contentHandler = null;
  Metadata metadata = new Metadata();
{code}

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931043#action_12931043 ]

Fuad Efendi commented on SOLR-2186:
-----------------------------------

I can't find any usage of resolver in TikaEP.nextRow(); am I missing something?

Thanks
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924284#action_12924284 ]

Lance Norskog commented on SOLR-2186:
-------------------------------------

I've tracked it down. The ThreadedContext object is built without a resolver. There is a notation that the resolver will be set dynamically, but it is not. The ThreadedContext resolver is called in the "firstInit" methods of TikaEntityProcessor, LineEntityProcessor, and XPathEntityProcessor. TikaEntityProcessor also calls it in nextRow.

    public class ThreadedContext extends ContextImpl{
      private DocBuilder.EntityRunner entityRunner;
      private boolean limitedContext = false;

      public ThreadedContext(DocBuilder.EntityRunner entityRunner, DocBuilder docBuilder) {
        super(entityRunner.entity,
                null,//to be fethed realtime
                null,
                null,
                docBuilder.session,
                null,
                docBuilder);
        this.entityRunner = entityRunner;
      }
      // ... (rest of the class not quoted)
    }

I hacked DocBuilder.java to throw in a resolver, and that allowed the TikaEP to function during firstInit. Then the entity attribute resolver failed in the nextRow method. TikaEP is the only class that calls the entity attribute resolver outside of the firstInit() call.

Is it possible to change TikaEP to only use the resolver in firstInit?

> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Reporter: Lance Norskog
>
> The multi-threaded option for the DataImportHandler throws an exception and
> the entire operation fails. This is true even if only 1 thread is configured
> via *threads='1'*
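The diagnosis above (a context constructed with a null resolver, which is then dereferenced when an entity attribute is resolved) can be reproduced in miniature with the following Java sketch. All class names are hypothetical toys, not the DataImportHandler code. The `Supplier`-based lookup is only one assumed reading of what "fetched in real time" could look like; it is not the patch attached to this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a variable resolver; not the DIH class. */
interface ToyVariableResolver {
    String replaceTokens(String template);
}

/** Toy context that resolves entity attributes through its resolver. */
class ToyContextImpl {
    private final Map<String, String> entityAttributes;
    private final ToyVariableResolver resolver;

    ToyContextImpl(Map<String, String> entityAttributes, ToyVariableResolver resolver) {
        this.entityAttributes = entityAttributes;
        this.resolver = resolver;
    }

    ToyVariableResolver currentResolver() {
        return resolver;
    }

    String getResolvedEntityAttribute(String name) {
        // Throws NullPointerException when currentResolver() returns null.
        return currentResolver().replaceTokens(entityAttributes.get(name));
    }
}

/** Toy threaded context: passes null upward, then looks the resolver up on demand. */
final class ToyThreadedContext extends ToyContextImpl {
    private final Supplier<ToyVariableResolver> liveResolver;

    ToyThreadedContext(Map<String, String> attrs, Supplier<ToyVariableResolver> liveResolver) {
        super(attrs, null); // mirrors the null passed in the quoted constructor
        this.liveResolver = liveResolver;
    }

    @Override
    ToyVariableResolver currentResolver() {
        return liveResolver.get();
    }
}

final class NullResolverDemo {
    static Map<String, String> attributes() {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("url", "${files.fileAbsolutePath}");
        return attrs;
    }

    /** A context built with a null resolver fails on first attribute lookup. */
    static boolean nullResolverThrows() {
        try {
            new ToyContextImpl(attributes(), null).getResolvedEntityAttribute("url");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Looking the resolver up at call time avoids the null dereference. */
    static String lazyResolverWorks() {
        ToyVariableResolver resolver =
            template -> template.replace("${files.fileAbsolutePath}", "/tmp/example.pdf");
        return new ToyThreadedContext(attributes(), () -> resolver)
            .getResolvedEntityAttribute("url");
    }

    public static void main(String[] args) {
        System.out.println(nullResolverThrows());
        System.out.println(lazyResolverWorks());
    }
}
```

In this toy version the null only becomes visible at the first attribute lookup, which matches the shape of the reported stack trace: the exception surfaces in getResolvedEntityAttribute rather than in the constructor.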
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923765#action_12923765 ]

Lance Norskog commented on SOLR-2186:
-------------------------------------

This is the dataConfig.xml. It is very simple: it walks a directory and indexes every PDF file it finds.

If you change threads='4' to threads='1', it will still fail. If you remove the threads directive, it runs.
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923764#action_12923764 ]

Lance Norskog commented on SOLR-2186:
-------------------------------------

This is the stack trace. The operation configures 4 threads and then does a full-import:

Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder doFullDump
INFO: running multithreaded full-import
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=18837, fileLastModified=Wed Nov 21 08:15:23 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.1.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.1.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=289898, fileLastModified=Wed Nov 21 08:15:25 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.10.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.10.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=121847, fileLastModified=Wed Nov 21 08:15:43 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.100.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.100.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=59844, fileLastModified=Wed Nov 21 08:18:49 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.1000.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.1000.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder doFullDump
SEVERE: error in import
java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.ContextImpl.getResolvedEntityAttribute(ContextImpl.java:79)
    at org.apache.solr.handler.dataimport.ThreadedContext.getResolvedEntityAttribute(ThreadedContext.java:78)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:67)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:56)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.initEntity(DocBuilder.java:507)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:425)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:386)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:453)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:340)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:393)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Oct 21, 2010 10:21:16 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)