In working through what I _think_ will be the primary viable way to make this 
stuff faster (parallel execution, fork once), it appears to me that the primary 
concurrency issue is how we initialize the test seed: we loop over all 
RandomWrapper objects and reset them.  So it's likely that, midstream in some 
of the tests, the RNG is getting reset by other calls to the static 
useTestSeed() method.  
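
To make the failure mode concrete, here is a minimal single-threaded sketch of 
what I suspect is happening.  The class and its methods are made up for 
illustration (this is not the actual RandomWrapper code), and the "other test" 
is simulated by a reset in the middle of the loop rather than by a real second 
thread, so that the result is repeatable:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for a globally shared, resettable RNG.
// Not the real Mahout classes; names are assumptions for illustration.
final class SharedSeedDemo {
    static final Random SHARED = new Random();

    // Plays the role of a static "reset everything to the test seed" call.
    static void useTestSeed() {
        SHARED.setSeed(42L);
    }

    // Draws n values the way a test might. If resetMidway is true, another
    // test's setup is simulated by re-seeding halfway through the sequence.
    static long[] draw(int n, boolean resetMidway) {
        useTestSeed();
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            if (resetMidway && i == n / 2) {
                useTestSeed(); // interference from "another test"
            }
            out[i] = SHARED.nextLong();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] clean = draw(4, false);
        long[] disturbed = draw(4, true);
        System.out.println("clean:     " + Arrays.toString(clean));
        System.out.println("disturbed: " + Arrays.toString(disturbed));
        // After the reset, the disturbed run starts the sequence over again.
        System.out.println("restarted: " + (disturbed[2] == clean[0]));
    }
}
```

With real threads the reset could land at any point, which would fit with each 
run producing different failures.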

Of course, there might be other concurrency issues beyond that, but this seems 
like the most likely one to start with.  Thus, the question is how to fix it.  
The obvious fix, I suppose, is to not use statics for this stuff.  Another is, 
perhaps, to use a system property (-DuseTestSeed=true and/or -DuseSeed=<SEED>, 
the latter being useful for debugging other things) that is set upon invocation 
in the test plugin.  That has the downside that it would also need to be set 
when running from an IDE.
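
As a rough sketch of how the two ideas could combine: read the seed from a 
system property and hand each caller its own Random, so there is no shared 
instance to reset.  The class name, the default seed, and the exact property 
handling are all assumptions on my part; only the -DuseSeed name comes from the 
idea above:

```java
import java.util.Random;

// Hypothetical helper; nothing here exists in the code base today.
final class TestSeeds {
    // Property name taken from the -DuseSeed=<SEED> idea; still an assumption.
    static final String SEED_PROPERTY = "useSeed";

    // Arbitrary fixed fallback used when the property is not set.
    static final long DEFAULT_SEED = 0xCAFEBABEL;

    // Long.getLong returns the default when the property is missing or
    // is not a parseable number.
    static long resolveSeed() {
        return Long.getLong(SEED_PROPERTY, DEFAULT_SEED);
    }

    // Every call returns a fresh, independently seeded instance, so one
    // test cannot reset the generator another test is drawing from.
    static Random newRandom() {
        return new Random(resolveSeed());
    }

    public static void main(String[] args) {
        Random a = newRandom();
        Random b = newRandom();
        a.nextLong(); // advancing a has no effect on b
        System.out.println("seed = " + resolveSeed());
        System.out.println("b unaffected: "
            + (b.nextLong() == new Random(resolveSeed()).nextLong()));
    }
}
```

Falling back to a fixed default when the property is absent might sidestep the 
IDE problem, since tests would still be deterministic without any extra flags, 
but I have not tried it.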

And, to Sean's point below, it seems that we may have some test dependencies on 
the specific set of random numbers and the outcomes they produce.

Thoughts?  Other ideas?


On Dec 8, 2011, at 1:05 PM, Grant Ingersoll wrote:

> Progress!  I had configured the surefire plugin in the wrong place
> 
> 
> On Dec 8, 2011, at 2:55 PM, Sean Owen wrote:
> 
>> This could well be it. While every Random everywhere gets initialized to a
>> known initial state, at the start of every @Test method, you could get
>> different sequences if other tests are in progress in parallel in the same
>> JVM.
>> 
>> Ideally tests aren't that sensitive to the sequence of random numbers -- if
>> that's the case. And here it may well be the case.
>> 
>> Can this be set to fork a JVM per test class? That would probably work.
>> 
>> On Thu, Dec 8, 2011 at 7:43 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> 
>>> 
>>> On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>>> 
>>>> 
>>>> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>>>> 
>>>>> If I add parallel, fork always to the main surefire config, I get
>>>>> failures all over the place for things like:
>>>>> Failed tests:
>>>>>   testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver):
>>>>>     Error: {0.06146049974880152 too high! (for eigen 3)
>>>>>   consistency(org.apache.mahout.math.jet.random.NormalTest):
>>>>>     offset=0.000 scale=1.000 Z = 8.2
>>>>>   consistency(org.apache.mahout.math.jet.random.ExponentialTest):
>>>>>     offset=0.000 scale=100.000 Z = 8.7
>>>>> 
>>>> 
>>>> Check that, it seems each run can produce different failures, which
>>>> leads me to believe we have some shared values in our tests
>>> 
>>> Random.getRandom() the culprit, perhaps?
>>> 
>>>> 
>>>> 
>>>>> All of these pass individually and when not in parallel for me.
>>>>> 
>>>>> Here's my config:
>>>>> <plugin>
>>>>>        <groupId>org.apache.maven.plugins</groupId>
>>>>>        <artifactId>maven-surefire-plugin</artifactId>
>>>>>        <version>2.11</version>
>>>>>        <configuration>
>>>>>          <parallel>classes</parallel>
>>>>>          <forkMode>always</forkMode>
>>>>>          <perCoreThreadCount>true</perCoreThreadCount>
>>>>>        </configuration>
>>>>>      </plugin>
>>>>> 
>>>>> Anyone else seeing that?
>>>>> 
>>>>> 
>>>>> On Dec 8, 2011, at 1:53 PM, Dmitriy Lyubimov wrote:
>>>>> 
>>>>>> SSVD actually runs a rather small test, but it is an MR job in local
>>>>>> mode, so there's nothing to cut down there in terms of size (not much
>>>>>> anyway). It's just what it takes to initialize and run all the jobs. And
>>>>>> since it is local, it is also single-threaded, so it actually runs the V
>>>>>> and U jobs sequentially instead of in parallel, which makes it even
>>>>>> longer (4 jobs strung together all in all).
>>>>>> 
>>>>>> But I will take a look, although even if I reduce the solution size, it
>>>>>> will still likely not reduce running time by more than 20%.
>>>>>> 
>>>>>> On Thu, Dec 8, 2011 at 5:42 AM, David Murgatroyd <dmu...@gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Dec 8, 2011, at 8:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>> 
>>>>>>>> MAHOUT-916 and 917 are attempts to address the running time of our
>>>>>>>> tests.  As Sean rightly pointed out, there are probably opportunities to
>>>>>>>> simply cut down the sizes of some of these tests without affecting their
>>>>>>>> correctness.  To that end, if people can take a look at:
>>>>>>>> https://builds.apache.org/job/Mahout-Quality/1237/testReport/junit/
>>>>>>>> 
>>>>>>>> You can get a sense as to which tests are taking a long time.  The
>>>>>>>> main culprits are:
>>>>>>>> 1. Vectorizer
>>>>>>>> 2. SSVD
>>>>>>>> 3. K-Means
>>>>>>>> 4. taste.hadoop.item
>>>>>>>> 5. taste.hadoop.als
>>>>>>>> 6. PFPGrowth
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Grant
>>>>>>>> 
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


