[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-07-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065833#comment-13065833
 ] 

Shalin Shekhar Mangar commented on SOLR-2186:
-

Frank, I've opened SOLR-2655 for related issues. I may not have time to look 
into these soon, so I'd advise people not to use multi-threaded mode for the 
time being.

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Shalin Shekhar Mangar
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2186.patch, SOLR-2186.patch, SOLR-2186.patch, 
> Solr-2186.patch, TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch, 
> TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-07-14 Thread Frank Wesemann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065363#comment-13065363
 ] 

Frank Wesemann commented on SOLR-2186:
--

Thanks for taking this issue, Shalin.
You might close SOLR-2544 along with this.

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-2186.patch, SOLR-2186.patch, SOLR-2186.patch, 
> Solr-2186.patch, TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch, 
> TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041855#comment-13041855
 ] 

Robert Muir commented on SOLR-2186:
---

Hi Lance/Frank,

Thanks for working on this issue.

Any ideas on how we could make a junit test to show the problem?
This would make it easier to evaluate the patch and possible to prevent 
regressions.

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: SOLR-2186.patch, Solr-2186.patch, TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] [Commented] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-05-31 Thread Frank Wesemann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041652#comment-13041652
 ] 

Frank Wesemann commented on SOLR-2186:
--

The added patch addresses the problem that EntityProcessors do not have a 
usable VariableResolver in their {{init()}} method.
It does this in the EntityRunner's {{runAThread()}} method by initializing the 
EntityProcessorWrapper first and the EntityProcessor after that.
With the order changed as described, the corresponding namespaces are created 
on the VariableResolver before the EntityProcessor can use it.

Additionally I changed the loglevel for the "adding a row" messages to "debug".

This patch does not solve the problem described in SOLR-2544.
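
To illustrate why the initialization order matters, here is a minimal sketch. It does not use the real DataImportHandler classes: ToyResolver, ToyWrapper, ToyProcessor, the "tika" namespace and the "url" attribute are all hypothetical stand-ins, and the toy resolver throws IllegalStateException where the real code failed with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins; none of these are the real DataImportHandler types. */
public class InitOrderSketch {

    /** Toy resolver: a namespace must be registered before it can be resolved. */
    static class ToyResolver {
        private final Map<String, Map<String, Object>> namespaces = new HashMap<>();

        void addNamespace(String name, Map<String, Object> values) {
            namespaces.put(name, values);
        }

        Object resolve(String namespace, String key) {
            Map<String, Object> ns = namespaces.get(namespace);
            if (ns == null) {
                throw new IllegalStateException("namespace not registered: " + namespace);
            }
            return ns.get(key);
        }
    }

    /** Toy wrapper: its init() is what registers the entity namespace. */
    static class ToyWrapper {
        void init(ToyResolver resolver, String entityName, Map<String, Object> attrs) {
            resolver.addNamespace(entityName, attrs);
        }
    }

    /** Toy processor: its init() reads an attribute through the resolver. */
    static class ToyProcessor {
        Object url;

        void init(ToyResolver resolver, String entityName) {
            url = resolver.resolve(entityName, "url");
        }
    }

    /** Wrapper first, processor second: the order the comment describes. */
    static Object initInPatchedOrder() {
        ToyResolver resolver = new ToyResolver();
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("url", "/tmp/example.pdf");
        new ToyWrapper().init(resolver, "tika", attrs);
        ToyProcessor processor = new ToyProcessor();
        processor.init(resolver, "tika");
        return processor.url;
    }

    /** Processor first: fails because the namespace does not exist yet. */
    static boolean initInOldOrderFails() {
        ToyResolver resolver = new ToyResolver();
        try {
            new ToyProcessor().init(resolver, "tika");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("patched order resolves url: " + initInPatchedOrder());
        System.out.println("old order fails: " + initInOldOrderFails());
    }
}
```

The only point of the sketch is the ordering: whatever registers the namespaces has to run before anything that resolves against them.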




> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-02-21 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997656#comment-12997656
 ] 

Lance Norskog commented on SOLR-2186:
-

bq. Lance, can you update this patch and add a unit test?
Sorry Grant, this wasn't on my watch list. This patch is not a patch to fix it, 
it is a patch to demonstrate the problem. I don't know the right way to solve 
this. 

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-12-07 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968890#action_12968890
 ] 

Fuad Efendi commented on SOLR-2186:
---

I resolved this issue for SQL, SOLR-2233; it was related to 'thread A closes 
connection needed by thread B'
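
The 'thread A closes connection needed by thread B' failure mode can be sketched as below. This is not necessarily what SOLR-2233 changed: ToyConnection and both helper methods are hypothetical, and giving each worker its own connection through a ThreadLocal is just one common way to avoid the problem.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration; ToyConnection is not a JDBC or Solr class. */
public class PerThreadConnectionSketch {

    static class ToyConnection {
        private volatile boolean closed;

        String query(String sql) {
            if (closed) {
                throw new IllegalStateException("connection already closed");
            }
            return "rows for: " + sql;
        }

        void close() {
            closed = true;
        }
    }

    /** Shared connection: once "thread A" closes it, "thread B" cannot use it. */
    static boolean sharedConnectionFails() {
        ToyConnection shared = new ToyConnection();
        shared.query("select 1");     // work done on behalf of thread A
        shared.close();               // thread A finishes and closes
        try {
            shared.query("select 2"); // thread B still needs the connection
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Per-thread connections: each worker opens, uses, and closes its own. */
    static int perThreadConnectionsSucceed(int workers) {
        ThreadLocal<ToyConnection> local = ThreadLocal.withInitial(ToyConnection::new);
        AtomicInteger succeeded = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                ToyConnection connection = local.get();
                connection.query("select 1");
                connection.close();   // closes only this thread's connection
                local.remove();
                succeeded.incrementAndGet();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        System.out.println("shared connection fails: " + sharedConnectionFails());
        System.out.println("per-thread successes: " + perThreadConnectionsSucceed(4));
    }
}
```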

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968439#action_12968439
 ] 

Grant Ingersoll commented on SOLR-2186:
---

Lance, can you update this patch and add a unit test?

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-11-11 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931236#action_12931236
 ] 

Lance Norskog commented on SOLR-2186:
-

bq. 

Two answers:
1) try it and see. you'll find the usage soon enough :)
2) TikaEntityProcessor, branch 3.x, line 96:

{code}
  public Map<String, Object> nextRow() {
    if (done) return null;
    Map<String, Object> row = new HashMap<String, Object>();
    DataSource<InputStream> dataSource = context.getDataSource();
    InputStream is = dataSource.getData(context.getResolvedEntityAttribute(URL));   // <- resolver is used here
    ContentHandler contentHandler = null;
    Metadata metadata = new Metadata();
{code}

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-11-11 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931043#action_12931043
 ] 

Fuad Efendi commented on SOLR-2186:
---

I can't find any usage of resolver in TikaEP.nextRow(); am I missing something?
Thanks

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-10-23 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924284#action_12924284
 ] 

Lance Norskog commented on SOLR-2186:
-

I've tracked it down. The ThreadedContext object is built without a resolver. 
There is a notation that the resolver will be set dynamically, but it is not.

The ThreadedContext resolver is called in the "firstInit" methods of 
TikaEntityProcessor, LineEntityProcessor, and XPathEntityProcessor. 
TikaEntityProcessor also calls it in nextRow.

public class ThreadedContext extends ContextImpl {
  private DocBuilder.EntityRunner entityRunner;
  private boolean limitedContext = false;

  public ThreadedContext(DocBuilder.EntityRunner entityRunner, DocBuilder docBuilder) {
    super(entityRunner.entity,
        null, // to be fetched realtime
        null,
        null,
        docBuilder.session,
        null,
        docBuilder);
    this.entityRunner = entityRunner;
  }

I hacked DocBuilder.java to throw in a resolver and that allowed the TikaEP to 
function during firstInit. Then, the entity attribute resolver failed in the 
nextRow method.

TikaEP is the only class that calls the entity attribute resolver outside of 
the firstInit() call. Is it possible to change TikaEP to only use the resolver 
in firstInit?
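
The failure mode described above, and one possible way around it, can be sketched with toy classes. ToyContext and ToyThreadedContext are hypothetical and only imitate the shape of ContextImpl and ThreadedContext. The "${f.fileAbsolutePath}" placeholder and the lazy lookup in currentResolver() are assumptions made for illustration, not the approach of any attached patch.

```java
import java.util.function.UnaryOperator;

/** Hypothetical illustration; these are not the real DataImportHandler classes. */
public class NullResolverSketch {

    /** Toy base context: the resolver is handed in at construction time. */
    static class ToyContext {
        private final UnaryOperator<String> resolver; // may be null, as in the excerpt above
        private final String urlAttribute;

        ToyContext(UnaryOperator<String> resolver, String urlAttribute) {
            this.resolver = resolver;
            this.urlAttribute = urlAttribute;
        }

        /** Subclasses can override this to supply a resolver at call time. */
        UnaryOperator<String> currentResolver() {
            return resolver;
        }

        String getResolvedUrl() {
            // Throws NullPointerException when no resolver is available.
            return currentResolver().apply(urlAttribute);
        }
    }

    /** Toy threaded context: built with a null resolver, looks one up on demand. */
    static class ToyThreadedContext extends ToyContext {
        private final UnaryOperator<String> runnerResolver;

        ToyThreadedContext(UnaryOperator<String> runnerResolver, String urlAttribute) {
            super(null, urlAttribute);
            this.runnerResolver = runnerResolver;
        }

        @Override
        UnaryOperator<String> currentResolver() {
            return runnerResolver;
        }
    }

    /** A context constructed with a null resolver fails on first use. */
    static boolean nullResolverThrows() {
        try {
            new ToyContext(null, "${f.fileAbsolutePath}").getResolvedUrl();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** The overriding context resolves the attribute at call time. */
    static String lazilyResolved() {
        UnaryOperator<String> resolver =
                s -> s.replace("${f.fileAbsolutePath}", "/data/a.pdf");
        return new ToyThreadedContext(resolver, "${f.fileAbsolutePath}").getResolvedUrl();
    }

    public static void main(String[] args) {
        System.out.println("null resolver throws: " + nullResolverThrows());
        System.out.println("lazily resolved url: " + lazilyResolved());
    }
}
```

A check like nullResolverThrows() could also serve as the starting point for a regression test, since it fails for the same reason whether one thread or several are configured.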


> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-10-21 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923765#action_12923765
 ] 

Lance Norskog commented on SOLR-2186:
-

This is the dataConfig.xml. It is very simple: it walks a directory and indexes 
every PDF file it finds.
If you change threads='4' to threads='1', it will still fail. If you remove the 
threads directive, it runs.


   
   
 








  




> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*




[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-10-21 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923764#action_12923764
 ] 

Lance Norskog commented on SOLR-2186:
-

This is the stack trace. The operation configures 4 threads and then does a 
full-import:

Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder doFullDump
INFO: running multithreaded full-import
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=18837, fileLastModified=Wed Nov 21 08:15:23 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.1.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.1.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=289898, fileLastModified=Wed Nov 21 08:15:25 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.10.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.10.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=121847, fileLastModified=Wed Nov 21 08:15:43 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.100.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.100.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
INFO: arow : {fileSize=59844, fileLastModified=Wed Nov 21 08:18:49 PST 2007, fileAbsolutePath=/lucid/private_pdfs/10.pdfs/10.1.1.10.1000.pdf, fileDir=/lucid/private_pdfs/10.pdfs, file=10.1.1.10.1000.pdf}
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder doFullDump
SEVERE: error in import
java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.ContextImpl.getResolvedEntityAttribute(ContextImpl.java:79)
    at org.apache.solr.handler.dataimport.ThreadedContext.getResolvedEntityAttribute(ThreadedContext.java:78)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:67)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:56)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.initEntity(DocBuilder.java:507)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:425)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:386)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:453)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:340)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:393)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Oct 21, 2010 10:21:16 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Oct 21, 2010 10:21:16 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)


> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*
