Which brings me to the thought: we could probably employ some
heuristics to deduce that parameter by the time the ABt job runs, or
just set it sufficiently high there for the ABt case only. The danger
is that if we set it much larger than s, we may overallocate memory
that we will never use, but that probably will not be very
significant. I can take a look.

On Fri, Dec 9, 2011 at 11:30 AM, Dmitriy Lyubimov <[email protected]> wrote:
> I don't think it would make sense for SSVD to remove MR. I mean, sure,
> we can test something like Givens solver independently, but it would
> not be testing much really.
>
> I will reduce the dimensionality there.
>
> Also, there are a lot of tests (sparse/dense, sparse with and without
> power iteration, dense with and without power iteration...), so we
> could probably just nix some of those and assume they "work", relying
> on a committer to enable and verify them manually.
>
>
>
> On Fri, Dec 9, 2011 at 9:00 AM, Grant Ingersoll <[email protected]> wrote:
>>
>> On Dec 8, 2011, at 12:55 PM, Sean Owen wrote:
>>
>>> This could well be it. While every Random everywhere gets initialized to a
>>> known initial state, at the start of every @Test method, you could get
>>> different sequences if other tests are in progress in parallel in the same
>>> JVM.
>>>
>>> Ideally tests aren't that sensitive to the sequence of random numbers -- if
>>> that's the case. And here it may well be the case.
>>>
>>> Can this be set to fork a JVM per test class? that would probably work.
>>
>> I'm no maven expert, but based on my reading of the docs and the things I've 
>> tried, it seems like "always" forking isn't going to get the parallelism we 
>> want.  On the other hand, we can't seem to run in parallel w/ fork once due 
>> to some threading issues.  What do others think?
>>
>> At the end of the day, I believe most of our performance issues are due to
>> running full M/R jobs.  So, we either rework them to just test mappers and
>> reducers independently and move the long-running full tests to
>> nightly/weekly tests, or we go off and improve local mode in Hadoop to give
>> better performance.
>>
>> I'd vote for the former since it is the only one we are likely to get done 
>> reasonably soon.
>>
>>>
>>> On Thu, Dec 8, 2011 at 7:43 PM, Grant Ingersoll <[email protected]> wrote:
>>>
>>>>
>>>> On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>>>>
>>>>>
>>>>> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>>>>>
>>>>>> If I add parallel, fork always to the main surefire config, I get
>>>>>> failures all over the place for things like:
>>>>>> Failed tests:
>>>>>> testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver):
>>>>>> Error: {0.06146049974880152 too high! (for eigen 3)
>>>>>> consistency(org.apache.mahout.math.jet.random.NormalTest):
>>>>>> offset=0.000 scale=1.000 Z = 8.2
>>>>>> consistency(org.apache.mahout.math.jet.random.ExponentialTest):
>>>>>> offset=0.000 scale=100.000 Z = 8.7
>>>>>>
>>>>>
>>>>> Check that, it seems each run can produce different failures, which
>>>>> leads me to believe we have some shared values in our tests.
>>>>
>>>> Random.getRandom() the culprit, perhaps?
>>>>
>>>>>
>>>>>
>>>>>> All of these pass individually and when not in parallel for me.
>>>>>>
>>>>>> Here's my config:
>>>>>> <plugin>
>>>>>>         <groupId>org.apache.maven.plugins</groupId>
>>>>>>         <artifactId>maven-surefire-plugin</artifactId>
>>>>>>         <version>2.11</version>
>>>>>>         <configuration>
>>>>>>           <parallel>classes</parallel>
>>>>>>           <forkMode>always</forkMode>
>>>>>>           <perCoreThreadCount>true</perCoreThreadCount>
>>>>>>         </configuration>
>>>>>>       </plugin>
>>>>>>
>>>>>> Anyone else seeing that?
>>>>>>
>>>>>>
>>>>>> On Dec 8, 2011, at 1:53 PM, Dmitriy Lyubimov wrote:
>>>>>>
>>>>>>> SSVD actually runs a rather small test, but it is an MR job in local
>>>>>>> mode, so there's nothing to cut down there in terms of size (not much
>>>>>>> anyway). It's just what it takes to initialize and run all the jobs. And
>>>>>>> since it is local, it is also single-threaded, so it runs the V
>>>>>>> and U jobs sequentially instead of in parallel, which makes it even
>>>>>>> longer (4 jobs strung together all in all).
>>>>>>>
>>>>>>> But I will take a look, although even if I reduce the solution size, it
>>>>>>> will still likely not reduce the running time by more than 20%.
>>>>>>>
>>>>>>> On Thu, Dec 8, 2011 at 5:42 AM, David Murgatroyd <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 8, 2011, at 8:36 AM, Grant Ingersoll <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> MAHOUT-916 and 917 are attempts to address the running time of our
>>>>>>>>> tests.  As Sean rightfully pointed out, there are probably opportunities to
>>>>>>>>> simply cut down the sizes of some of these tests without affecting their
>>>>>>>>> correctness.  To that end, if people can take a look at:
>>>>>>>>> https://builds.apache.org/job/Mahout-Quality/1237/testReport/junit/
>>>>>>>>>
>>>>>>>>> You can get a sense as to which tests are taking a long time.  The
>>>>>>>>> main culprits are:
>>>>>>>>> 1. Vectorizer
>>>>>>>>> 2. SSVD
>>>>>>>>> 3. K-Means
>>>>>>>>> 4. taste.hadoop.item
>>>>>>>>> 5. taste.hadoop.als
>>>>>>>>> 6. PFPGrowth
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Grant
>>>>>>>>>
>>>>>>>>> --------------------------------------------
>>>>>>>>> Grant Ingersoll
>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --------------------------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --------------------------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>
>>>> --------------------------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>>
>>
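
The shared-generator effect Sean describes earlier in the thread can be sketched in a few lines of Java. This is only an illustration under stated assumptions: a plain java.util.Random stands in for whatever generator the tests share, and the class and method names (SharedRandomDemo, draw, drawInterleaved) are hypothetical, not Mahout code. The interleaving is simulated in a single thread so the result is repeatable; with real parallel tests the interleaving would vary from run to run, which would match the "different failures on each run" symptom.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class; not part of Mahout.
final class SharedRandomDemo {

    // Stand-in for a test that draws n values from its own generator.
    static long[] draw(Random rng, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextLong();
        }
        return out;
    }

    // Same test, but another consumer takes one value from the
    // shared generator between each of our draws.
    static long[] drawInterleaved(Random shared, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = shared.nextLong();
            shared.nextLong(); // value consumed by the "other test"
        }
        return out;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary seed chosen for the demo
        long[] alone = draw(new Random(seed), 5);
        long[] again = draw(new Random(seed), 5);
        long[] interleaved = drawInterleaved(new Random(seed), 5);

        // Same seed, private generator: the sequence repeats.
        System.out.println("repeatable alone: " + Arrays.equals(alone, again));
        // Same seed, shared generator: the test sees a different sequence.
        System.out.println("same when shared: " + Arrays.equals(alone, interleaved));
    }
}
```

If this is indeed what is happening, either a generator private to each test or a JVM forked per test class would restore repeatable sequences, at the cost Grant mentions for forking.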
