[
https://issues.apache.org/jira/browse/MAHOUT-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831042#comment-13831042
]
Frank Scholten commented on MAHOUT-1345:
----------------------------------------
Aha, good to know. What happens in the 'invalid query' test is that the
initialize method of the LuceneSegmentRecordReader throws an
IllegalArgumentException when creating the scorer. By that time the two map
tasks have already been created. Suneel showed that waiting for them does not
work. Is there something in the Hadoop API we can use to influence these
threads somehow?
{code}
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.Task initialize
INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1347dad
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.MapTask runNewMapper
INFO: Processing split: org.apache.mahout.text.LuceneSegmentInputSplit@1649784
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run
INFO: Starting task: attempt_local1401304847_0001_m_000001_0
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.Task initialize
INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8a6ff9
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.MapTask runNewMapper
INFO: Processing split: org.apache.mahout.text.LuceneSegmentInputSplit@b413de
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
INFO: Map task executor complete.
Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local1401304847_0001
java.lang.Exception: java.io.IOException: Could not create query scorer for query: invalid:query
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.io.IOException: Could not create query scorer for query: invalid:query
    at org.apache.mahout.text.LuceneSegmentRecordReader.initialize(LuceneSegmentRecordReader.java:72)
    at org.apache.mahout.text.LuceneSegmentInputFormat.createRecordReader(LuceneSegmentInputFormat.java:76)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:488)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
{code}
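To make the failure mode above concrete, here is a minimal sketch in plain java.util.concurrent. It is not Hadoop code and does not answer the question of what the Hadoop API offers. The class name LingeringThreadSketch, the task() helper and the 10 second timeout are hypothetical, chosen only for illustration. Two tasks are submitted to a two-thread pool, both fail the way the record reader does during initialization, and the caller shuts the pool down and waits for it before returning, so no worker thread outlives the call. Whether LocalJobRunner lets a test reach its executor in this way is an open assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LingeringThreadSketch {

    /** Hypothetical stand-in for a map task whose record reader fails while initializing. */
    static Runnable task(final boolean failOnInit) {
        return new Runnable() {
            @Override
            public void run() {
                if (failOnInit) {
                    throw new IllegalArgumentException("Could not create query scorer");
                }
            }
        };
    }

    /**
     * Runs two tasks on a two-thread pool and returns how many of them failed.
     * The pool is shut down and awaited before returning, so its worker
     * threads cannot linger after this method completes normally.
     */
    static int runTasks(boolean firstFails, boolean secondFails) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<?>> futures = new ArrayList<Future<?>>();
        futures.add(pool.submit(task(firstFails)));
        futures.add(pool.submit(task(secondFails)));

        int failures = 0;
        try {
            for (Future<?> future : futures) {
                try {
                    future.get(); // rethrows the task's exception, wrapped
                } catch (ExecutionException e) {
                    failures++;
                }
            }
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads still running");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow(); // no-op if the pool has already terminated
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed tasks: " + runTasks(true, true));
    }
}
```

The point of the sketch is the ordering: the failures are collected first, and the pool is terminated before control returns to the test. In the log above the job reports the exception while the threads were started by the runner itself, which is why the equivalent hook would have to come from Hadoop.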
> Enable randomised testing for all Mahout modules
> ------------------------------------------------
>
> Key: MAHOUT-1345
> URL: https://issues.apache.org/jira/browse/MAHOUT-1345
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Isabel Drost-Fromm
> Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1345.diff, MAHOUT-1345.patch
>
>
> When enabling randomised testing for all modules I found a few tests became
> unstable or even failed deterministically due to lingering threads. The
> attached patch:
> * defines the randomised testing dependency in our parent pom
> * re-uses said dependencies in all dependent modules (makes upgrading easier
> as the version number needs to be changed in just one place)
> * adds several code changes that fixed the failures due to lingering threads
> for me on my machine. I'd greatly appreciate input a) from those who wrote
> the respective code and b) others who ran the tests with these changes to
> make sure there are no other tests that suffer from the same issues.
> Warning: I touched quite a few bits and pieces I'm not intimately familiar
> with over the last few weeks (whenever I had a few spare minutes) - second
> pair of eyes needed.