Re: Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Jake Mannix
I would love to see mahout-math be fully 1.5-compatible.  I thought it might
be possible when I pulled out Writable and the hadoop stuff, but I didn't
realize it was so close.  Yay!

  -jake

On Wed, Jan 20, 2010 at 10:57 AM, Dawid Weiss  wrote:

> I must have compiled to 1.5-bytecode, but using 1.6 standard library.
> There are calls to Arrays#copyOf and, as far as I can tell, it's the
> only thing there that is 1.6-specific. Will file a patch for this.
>
> Dawid
>
> On Wed, Jan 20, 2010 at 7:14 PM, Dawid Weiss wrote:
> >> Gee, I was sure something in there was using a 1.6 feature, perhaps of Arrays.
> >
> > Really? I recompiled it under 1.5... or so I thought... Might have
> > been 1.6 JRE with 1.5 compatibility switch for the produced
> > bytecode... Will look into it.
> >
> > Dawid
> >
>


Re: Build issue with last dirichlet change

2010-01-20 Thread Jeff Eastman
Whew! Doing a clean build resolved the phantom test class. All tests 
pass, even mine :)



Jeff Eastman wrote:
The build compiles but org.apache.mahout.math.TestVectorWritable fails 
for some reason and it does not get to my test.



Jeff Eastman wrote:

I will run the build before I commit.
I will run the build before I commit.
...
I will run the build before I commit.

my bad


Ted Dunning wrote:

Our modules aren't working out as well as expected.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:

Sean Owen wrote:

That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean

Oh darn, my new unit test does depend upon utils for the TFIDF stuff.
Where should I move it to?

Re: Build issue with last dirichlet change

2010-01-20 Thread Jake Mannix
TestVectorWritable fails?  What is it saying in the surefire report?

   -jake

On Wed, Jan 20, 2010 at 5:42 PM, Jeff Eastman wrote:

> The build compiles but org.apache.mahout.math.TestVectorWritable fails for
> some reason and it does not get to my test.
>
>
>
> Jeff Eastman wrote:
>
>> I will run the build before I commit.
>> I will run the build before I commit.
>> ...
>> I will run the build before I commit.
>>
>> my bad
>>
>>
>> Ted Dunning wrote:
>>
>>> Our modules aren't working out as well as expected.
>>>
>>> On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>>
>>>> Sean Owen wrote:
>>>>
>>>>> That last commit concerning the dirichlet code and models seems to
>>>>> cause the build to fail -- or else I'm the victim of another
>>>>> environment-specific issue.
>>>>>
>>>>> I note it only because the fix raises a question. It causes core/ to
>>>>> depend on utils/, and I had thought that was not the idea?
>>>>>
>>>>> Sean
>>>>
>>>> Oh darn, my new unit test does depend upon utils for the TFIDF stuff.
>>>> Where should I move it to?


Re: Build issue with last dirichlet change

2010-01-20 Thread Jeff Eastman
The build compiles but org.apache.mahout.math.TestVectorWritable fails 
for some reason and it does not get to my test.



Jeff Eastman wrote:

I will run the build before I commit.
I will run the build before I commit.
...
I will run the build before I commit.

my bad


Ted Dunning wrote:

Our modules aren't working out as well as expected.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:

Sean Owen wrote:

That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean

Oh darn, my new unit test does depend upon utils for the TFIDF stuff.
Where should I move it to?


Re: Build issue with last dirichlet change

2010-01-20 Thread Jeff Eastman

I will run the build before I commit.
I will run the build before I commit.
...
I will run the build before I commit.

my bad


Ted Dunning wrote:

Our modules aren't working out as well as expected.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:

Sean Owen wrote:

That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean

Oh darn, my new unit test does depend upon utils for the TFIDF stuff. Where
should I move it to?

Re: Build issue with last dirichlet change

2010-01-20 Thread Jeff Eastman
I can easily move it into the /utils/test subtree. Tried that and it now 
builds ok. I actually like it fine there with the Lucene/text stuff.



Sean Owen wrote:

Actually, I suppose it's not a big deal if core *tests* depend on
utils. I can make that adjustment if that sounds right.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:

Sean Owen wrote:

That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean

Oh darn, my new unit test does depend upon utils for the TFIDF stuff. Where
should I move it to?

Re: Build issue with last dirichlet change

2010-01-20 Thread Sean Owen
I don't think there's any practical or conceptual problem with core
*tests* depending on utils.

On Wed, Jan 20, 2010 at 5:13 PM, Benson Margulies  wrote:
> You may have hit a classic maven annoyance which is shared test code.
> Generally, people seem to end up with more modules than they expected
> to.
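
For what it's worth, Sean's "core *tests* depend on utils" would amount to a
test-scoped dependency in core/pom.xml. A sketch for illustration only -- the
artifactId is an assumption, not taken from Mahout's actual POM:

{code}
<!-- hypothetical entry in core/pom.xml; artifactId is an assumption -->
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-utils</artifactId>
  <version>${project.version}</version>
  <scope>test</scope>
</dependency>
{code}

A test-scoped dependency never leaks into core's compile classpath, which is
why it poses no conceptual problem, though Maven still forbids a cycle between
the two modules' main artifacts.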


Re: Build issue with last dirichlet change

2010-01-20 Thread Benson Margulies
You may have hit a classic maven annoyance which is shared test code.
Generally, people seem to end up with more modules than they expected
to.

On Wed, Jan 20, 2010 at 8:06 PM, Ted Dunning  wrote:
> Our modules aren't working out as well as expected.
>
> On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:
>
>> Sean Owen wrote:
>>
>>> That last commit concerning the dirichlet code and models seems to
>>> cause the build to fail -- or else I'm the victim of another
>>> environment-specific issue.
>>>
>>> I note it only because the fix raises a question. It causes core/ to
>>> depend on utils/, and I had thought that was not the idea?
>>>
>>> Sean
>>>
>>>
>>>
>> Oh darn, my new unit test does depend upon utils for the TFIDF stuff. Where
>> should I move it to?
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
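
The classic remedy Benson alludes to for shared test code is to publish a
module's test classes as a test-jar. A sketch with assumed coordinates, not
Mahout's actual POM:

{code}
<!-- producing module: attach a jar of its test classes -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <executions>
    <execution>
      <goals><goal>test-jar</goal></goals>
    </execution>
  </executions>
</plugin>

<!-- consuming module: depend on those test classes, test scope only -->
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
{code}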


Re: Build issue with last dirichlet change

2010-01-20 Thread Ted Dunning
Our modules aren't working out as well as expected.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:

> Sean Owen wrote:
>
>> That last commit concerning the dirichlet code and models seems to
>> cause the build to fail -- or else I'm the victim of another
>> environment-specific issue.
>>
>> I note it only because the fix raises a question. It causes core/ to
>> depend on utils/, and I had thought that was not the idea?
>>
>> Sean
>>
>>
>>
> Oh darn, my new unit test does depend upon utils for the TFIDF stuff. Where
> should I move it to?
>



-- 
Ted Dunning, CTO
DeepDyve


Re: Build issue with last dirichlet change

2010-01-20 Thread Sean Owen
Actually, I suppose it's not a big deal if core *tests* depend on
utils. I can make that adjustment if that sounds right.

On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman wrote:
> Sean Owen wrote:
>>
>> That last commit concerning the dirichlet code and models seems to
>> cause the build to fail -- or else I'm the victim of another
>> environment-specific issue.
>>
>> I note it only because the fix raises a question. It causes core/ to
>> depend on utils/, and I had thought that was not the idea?
>>
>> Sean
>>
>>
>
> Oh darn, my new unit test does depend upon utils for the TFIDF stuff. Where
> should I move it to?
>


Re: Build issue with last dirichlet change

2010-01-20 Thread Jeff Eastman

Sean Owen wrote:

That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean

Oh darn, my new unit test does depend upon utils for the TFIDF stuff.
Where should I move it to?


Build issue with last dirichlet change

2010-01-20 Thread Sean Owen
That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.

I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the idea?

Sean


Re: [math] collections cooked?

2010-01-20 Thread Dawid Weiss
I think your suggestion makes a lot of sense.

D.

On Wed, Jan 20, 2010 at 8:54 PM, Benson Margulies  wrote:
> On Wed, Jan 20, 2010 at 2:04 PM, Ted Dunning  wrote:
>> I think that is the brave and wise choice.
>
> If no one objects in the next day or so, I'll set up a patch to do the
> splitting.
>
>>
>> Ultimately, we may want to add a few more math-y things for the use by
>> Mahout.  Notable among these in my mind are probability distributions.
>>
>> On Wed, Jan 20, 2010 at 9:53 AM, Benson Margulies wrote:
>>
>>> > Benson (and others?), what's our long-term vision
>>> > of moving this stuff under Mahout -- are we to replace Colt's
>>> > collections, make them live side by side, yet another thing?
>>>
>>> My Own Opinion:
>>>
>>> a- Split the math module into three:
>>>
>>> 1) scalar math
>>> 2) collections
>>> 3) vectors
>>>
>>> b- replace collections with your stuff, and discard the Colt collections.
>>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>


Re: Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Benson Margulies
Thus animal-sniffer.

On Wed, Jan 20, 2010 at 1:57 PM, Dawid Weiss  wrote:
> I must have compiled to 1.5-bytecode, but using 1.6 standard library.
> There are calls to Arrays#copyOf and, as far as I can tell, it's the
> only thing there that is 1.6-specific. Will file a patch for this.
>
> Dawid
>
> On Wed, Jan 20, 2010 at 7:14 PM, Dawid Weiss  wrote:
>>> Gee, I was sure something in there was using a 1.6 feature, perhaps of 
>>> Arrays.
>>
>> Really? I recompiled it under 1.5... or so I thought... Might have
>> been 1.6 JRE with 1.5 compatibility switch for the produced
>> bytecode... Will look into it.
>>
>> Dawid
>>
>


[jira] Commented: (MAHOUT-263) Matrix interface should extend Iterable for better integration with distributed storage

2010-01-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802960#action_12802960
 ] 

Ted Dunning commented on MAHOUT-263:


The idea of iterating through inputs sequentially is absolutely key to getting 
good performance out of sequential algorithms without giving up abstraction.

Some algorithms need their inputs permuted to some degree, but that is easily 
handled by buffering some number of rows and presenting them in randomized 
order.
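
For illustration, a minimal sketch of that buffering -- not Mahout API; the
class name and buffer policy are invented -- which wraps any sequential row
iterator and hands rows back in shuffled order, one bounded buffer at a time:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Reads elements sequentially but returns them in randomized order
// within a bounded window, as described above.
public final class ShufflingIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final int bufferSize;
  private final List<T> buffer = new ArrayList<T>();
  private int next = 0;

  public ShufflingIterator(Iterator<T> delegate, int bufferSize) {
    this.delegate = delegate;
    this.bufferSize = bufferSize;
  }

  public boolean hasNext() {
    if (next < buffer.size()) {
      return true;
    }
    // Refill: read up to bufferSize elements sequentially, then shuffle.
    buffer.clear();
    next = 0;
    while (buffer.size() < bufferSize && delegate.hasNext()) {
      buffer.add(delegate.next());
    }
    Collections.shuffle(buffer);
    return !buffer.isEmpty();
  }

  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.get(next++);
  }

  public void remove() {
    throw new UnsupportedOperationException();
  }
}
{code}

Rows never move outside their buffer window, so the permutation stays moderate
and the underlying reads stay strictly sequential.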

{quote}
Question is: should this go for all Matrices, or just SparseRowMatrix?  It's 
really tricky to have a matrix which is iterable both as sparse rows *and* 
sparse columns.  I guess the point would be that by default, it iterates over 
rows, unless it's SparseColumnMatrix, which obviously iterates over columns.

Thoughts?  Having to rely on random-access to a distributed-backed matrix is 
making me jump through silly extra hoops on some of the stuff I'm working on 
patches for.
{quote}
My feeling is that we don't need to support iteration both ways.  It would be 
nice, but the performance hit is so prodigious that it just isn't worth it.  In 
the past, where I needed to support both column and row access, I generally 
stored two copies, each optimized for a different kind of access.  This is very 
rarely needed, since most algorithms are strongly row- or column-centric and the 
data can be transposed (and thus permuted to match the desired access pattern) 
ahead of time to accommodate the need.  In many cases the loop nesting can be 
rearranged as well, to allow sequential row access to serve as well as column 
access, especially if map-reduce can be used to rearrange intermediate products.

> Matrix interface should extend Iterable for better integration with 
> distributed storage
> ---
>
> Key: MAHOUT-263
> URL: https://issues.apache.org/jira/browse/MAHOUT-263
> Project: Mahout
>  Issue Type: Improvement
>  Components: Math
>Affects Versions: 0.2
> Environment: all
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Fix For: 0.3
>
>
> Many sparse algorithms for dealing with Matrices just make sequential passes 
> over the data, but don't need to see the entire matrix at once.  The way they 
> would be implemented currently is:
> {code}
> Matrix m = getInputCorpus();
> for(int i=0; i < m.numRows(); i++) {
>   Vector v = m.getRow(i);
>   doStuffWithRow(v); 
> }
> {code}
> When the Matrix is backed essentially by a SequenceFile, 
> this algorithm outline doesn't make sense, because it requires lots of 
> random-access reads.  What makes more sense, and works for 
> in-memory matrices too, is something like the following:
> {code}
> public interface Matrix extends Iterable<Vector> { 
> {code}
> which allows algorithms which only need iterators over Vectors to use 
> them as such:
> {code}
> Matrix m = getInputCorpus();
> Iterator<Vector> it = m.iterator();
> Vector v;
> while(it.hasNext() && (v = it.next()) != null) {
>   doStuffWithRow(v); 
> }
> {code}
> The Iterator interface could be easily implemented in the AbstractMatrix base 
> class, so implementing this idea would be transparent to all current Mahout 
> code.  Additionally, pulling out two layers of AbstractMatrix - one which 
> only knows how to do the things which can be done using iterators (like 
> times(Vector), timesSquared(Vector), plus(Matrix), assignRow(), etc...), 
> which would be the direct base class for DistributedMatrix (or HDFSMatrix), 
> but all the random-access matrix methods currently in AbstractMatrix would go 
> in another abstract base class of the first one (which could be called 
> AbstractVectorIterable, say).
> I think Iterable<Vector> could be made more flexible by extending that to a 
> new interface VectorIterable, which provided iterateAll() and 
> iterateNonEmpty(), in case document Ids were sparse, and could also allow for 
> the possibility of adding other methods (things like skipTo(int rowNum), 
> perhaps).  
> Question is: should this go for all Matrices, or just SparseRowMatrix?  It's 
> really tricky to have a matrix which is iterable both as sparse rows *and* 
> sparse columns.  I guess the point would be that by default, it iterates over 
> rows, unless it's SparseColumnMatrix, which obviously iterates over columns.
> Thoughts?  Having to rely on random-access to a distributed-backed matrix is 
> making me jump through silly extra hoops on some of the stuff I'm working on 
> patches for.
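
Sketched as code, the VectorIterable idea might look roughly like this; the
method names come from the description above, but the signatures are guesses,
not the eventual patch:

{code}
import java.util.Iterator;
import org.apache.mahout.math.Vector;

// Hypothetical shape of the proposed interface; signatures are guesses.
public interface VectorIterable extends Iterable<Vector> {
  Iterator<Vector> iterateAll();       // every row, whether empty or not
  Iterator<Vector> iterateNonEmpty();  // skip empty rows (sparse document ids)
  // perhaps also: Iterator<Vector> skipTo(int rowNum);
}
{code}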

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-265) Error with creating MVC from Lucene Index or Arff

2010-01-20 Thread Jerry Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Ye updated MAHOUT-265:


Summary: Error with creating MVC from Lucene Index or Arff  (was: 
org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73))

> Error with creating MVC from Lucene Index or Arff
> -
>
> Key: MAHOUT-265
> URL: https://issues.apache.org/jira/browse/MAHOUT-265
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.3
> Environment: RHEL 2.6, Mac OS 10.6.2
>Reporter: Jerry Ye
>
> I'm getting the following error when trying to create vectors from a Solr 
> index.  I've also tried using the arff to mvc utility and I'm getting the 
> exact same error.
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
> at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
> at 
> org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
> I'm getting this error with revision 901336 but not with revision 897299

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [math] collections cooked?

2010-01-20 Thread Benson Margulies
On Wed, Jan 20, 2010 at 2:04 PM, Ted Dunning  wrote:
> I think that is the brave and wise choice.

If no one objects in the next day or so, I'll set up a patch to do the
splitting.

>
> Ultimately, we may want to add a few more math-y things for the use by
> Mahout.  Notable among these in my mind are probability distributions.
>
> On Wed, Jan 20, 2010 at 9:53 AM, Benson Margulies wrote:
>
>> > Benson (and others?), what's our long-term vision
>> > of moving this stuff under Mahout -- are we to replace Colt's
>> > collections, make them live side by side, yet another thing?
>>
>> My Own Opinion:
>>
>> a- Split the math module into three:
>>
>> 1) scalar math
>> 2) collections
>> 3) vectors
>>
>> b- replace collections with your stuff, and discard the Colt collections.
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


[jira] Created: (MAHOUT-265) org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)

2010-01-20 Thread Jerry Ye (JIRA)
org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)


 Key: MAHOUT-265
 URL: https://issues.apache.org/jira/browse/MAHOUT-265
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.3
 Environment: RHEL 2.6, Mac OS 10.6.2
Reporter: Jerry Ye


I'm getting the following error when trying to create vectors from a Solr 
index.  I've also tried using the arff to mvc utility and I'm getting the exact 
same error.

Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
at 
org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
at 
org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)

I'm getting this error with revision 901336 but not with revision 897299
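
For context, and only as an educated guess at where to look (not a confirmed
diagnosis of this issue): SerializationFactory.getSerializer() returns null
when no registered serialization accepts the key/value class, and the
SequenceFile.Writer then fails with exactly this NPE. A quick standalone check:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.serializer.SerializationFactory;
import org.apache.hadoop.io.serializer.WritableSerialization;

public class SerializerCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The factory consults the io.serializations key; if Writable
    // serialization is missing there, or the key/value classes do not
    // implement Writable, getSerializer() returns null and the Writer NPEs.
    conf.setStrings("io.serializations", WritableSerialization.class.getName());
    SerializationFactory factory = new SerializationFactory(conf);
    System.out.println(factory.getSerializer(Text.class)); // null means trouble
  }
}
{code}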


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [math] collections cooked?

2010-01-20 Thread Ted Dunning
I think that is the brave and wise choice.

Ultimately, we may want to add a few more math-y things for the use by
Mahout.  Notable among these in my mind are probability distributions.

On Wed, Jan 20, 2010 at 9:53 AM, Benson Margulies wrote:

> > Benson (and others?), what's our long-term vision
> > of moving this stuff under Mahout -- are we to replace Colt's
> > collections, make them live side by side, yet another thing?
>
> My Own Opinion:
>
> a- Split the math module into three:
>
> 1) scalar math
> 2) collections
> 3) vectors
>
> b- replace collections with your stuff, and discard the Colt collections.
>



-- 
Ted Dunning, CTO
DeepDyve


[jira] Updated: (MAHOUT-264) Make mahout-math compatible with Java 1.5 (bytecode and standard library).

2010-01-20 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated MAHOUT-264:
---

Attachment: MAHOUT-264.patch

As far as I can tell, this patch solves the issue. You still _must_ compile 
with a Java 1.6 compiler (@Override annotations on interface methods), but the 
resulting bytecode is 1.5-compatible and does not use any 1.6-specific API.
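
(For anyone puzzled by that requirement -- this is my reading of the javac
quirk, not something stated in the patch: JDK 5's javac rejects @Override on a
method that merely implements an interface, while JDK 6's javac accepts it
even when emitting 1.5 bytecode.)

{code}
// JDK 5's javac rejects @Override on an interface-implementing method;
// JDK 6's javac accepts it, even with -source 1.5 -target 1.5.
public class Probe implements Runnable {
  @Override
  public void run() {
    // no superclass method is overridden here, only an interface implemented
  }
}
{code}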

> Make mahout-math compatible with Java 1.5 (bytecode and standard library).
> --
>
> Key: MAHOUT-264
> URL: https://issues.apache.org/jira/browse/MAHOUT-264
> Project: Mahout
>  Issue Type: Wish
>  Components: Math
>Reporter: Dawid Weiss
>Assignee: Benson Margulies
>Priority: Minor
> Attachments: MAHOUT-264.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAHOUT-264) Make mahout-math compatible with Java 1.5 (bytecode and standard library).

2010-01-20 Thread Dawid Weiss (JIRA)
Make mahout-math compatible with Java 1.5 (bytecode and standard library).
--

 Key: MAHOUT-264
 URL: https://issues.apache.org/jira/browse/MAHOUT-264
 Project: Mahout
  Issue Type: Wish
  Components: Math
Reporter: Dawid Weiss
Assignee: Benson Margulies
Priority: Minor




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Dawid Weiss
I must have compiled to 1.5-bytecode, but using 1.6 standard library.
There are calls to Arrays#copyOf and, as far as I can tell, it's the
only thing there that is 1.6-specific. Will file a patch for this.

Dawid

On Wed, Jan 20, 2010 at 7:14 PM, Dawid Weiss  wrote:
>> Gee, I was sure something in there was using a 1.6 feature, perhaps of 
>> Arrays.
>
> Really? I recompiled it under 1.5... or so I thought... Might have
> been 1.6 JRE with 1.5 compatibility switch for the produced
> bytecode... Will look into it.
>
> Dawid
>
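
Arrays#copyOf did indeed only appear in Java 6, so the fix presumably falls
back to System.arraycopy. A minimal 1.5-compatible stand-in, as a sketch
rather than the actual patch:

{code}
public final class Java5Arrays {
  // 1.5-compatible stand-in for java.util.Arrays.copyOf(double[], int),
  // which was only added in Java 6.
  public static double[] copyOf(double[] original, int newLength) {
    double[] copy = new double[newLength];
    System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
    return copy;
  }
}
{code}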


Re: Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Dawid Weiss
> Gee, I was sure something in there was using a 1.6 feature, perhaps of Arrays.

Really? I recompiled it under 1.5... or so I thought... Might have
been 1.6 JRE with 1.5 compatibility switch for the produced
bytecode... Will look into it.

Dawid


Re: [math] collections cooked?

2010-01-20 Thread Benson Margulies
> Benson (and others?), what's our long-term vision
> of moving this stuff under Mahout -- are we to replace Colt's
> collections, make them live side by side, yet another thing?

My Own Opinion:

a- Split the math module into three:

1) scalar math
2) collections
3) vectors

b- replace collections with your stuff, and discard the Colt collections.
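
In POM terms the split would just be three modules under a common parent; a
sketch with invented module names (the real naming would come with the patch):

{code}
<!-- hypothetical parent POM fragment; module names are invented -->
<modules>
  <module>mahout-math-scalar</module>      <!-- 1) scalar math -->
  <module>mahout-math-collections</module> <!-- 2) collections -->
  <module>mahout-math-vectors</module>     <!-- 3) vectors -->
</modules>
{code}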


Re: Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Benson Margulies
Gee, I was sure something in there was using a 1.6 feature, perhaps of Arrays.

I tried to make animal-sniffer work recently and failed; care to take
a shot at that?

On Wed, Jan 20, 2010 at 11:30 AM, Dawid Weiss  wrote:
> Hi. Is it possible to compile mahout-math in 1.5-compatibility mode?
> This would require adding compiler plugin rules to POM. Mahout-math
> does not use any of the Java 1.6-specific API, I checked.
>
> Dawid
>


Compiling mahout-math in 1.5-compatibility mode.

2010-01-20 Thread Dawid Weiss
Hi. Is it possible to compile mahout-math in 1.5-compatibility mode?
This would require adding compiler plugin rules to the POM. Mahout-math
does not use any of the Java 1.6-specific API; I checked.

Dawid
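
A sketch of the POM rules in question -- plugin versions omitted and the
wiring assumed, not taken from Mahout's actual build; the animal-sniffer check
is the one Benson mentions elsewhere in this thread:

{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- 1.5 source/target; a JDK 6 javac is still needed because of the
         @Override-on-interface-methods quirk noted in MAHOUT-264 -->
    <source>1.5</source>
    <target>1.5</target>
  </configuration>
</plugin>
<plugin>
  <!-- animal-sniffer fails the build if 1.6-only API (e.g. Arrays#copyOf)
       sneaks in, which -target 1.5 compilation alone does not catch -->
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>animal-sniffer-maven-plugin</artifactId>
  <configuration>
    <signature>
      <groupId>org.codehaus.mojo.signature</groupId>
      <artifactId>java15</artifactId>
      <version>1.0</version>
    </signature>
  </configuration>
  <executions>
    <execution>
      <goals><goal>check</goal></goals>
    </execution>
  </executions>
</plugin>
{code}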


Re: [math] collections cooked?

2010-01-20 Thread Dawid Weiss
I have integrated HPPC collections with our open source and commercial
stuff, replacing PCJ. All tests pass, which is a good sign in addition
to the tests already included in HPPC.

The code is temporarily released in Carrot2 SVN at:
https://carrot2.svn.sourceforge.net/svnroot/carrot2/labs/hppc/hppc-core

This is still early-stage stuff -- there are many things I would like
to add, but either haven't found a nice way to do so or don't know how
to do so efficiently. Benson (and others?), what's our long-term vision
of moving this stuff under Mahout -- are we to replace Colt's
collections, make them live side by side, yet another thing? My
initial preference would be to have these classes stripped of anything
that is not essential and build upon them (adapters to Java
collections, for example). The collections in HPPC are fairly
low-level, but I had very few problems moving from PCJ code, which
is a good sign, I guess.

Dawid

On Mon, Jan 18, 2010 at 4:40 PM, Benson Margulies  wrote:
> I think I might be done with collections. I can't work up any
> enthusiasm for iterators or java.util decorators, and I think I have
> the basic functionality all in place. There are a number of perhaps
> pointless ways in which Colt diverges from Java collections,
> particularly in the area of return values from things like 'put' and
> 'remove'. If anyone can stir up an opinion against the current state
> of the code, I'm game to make it more like Java collections.
>
> Otherwise, give or take bugs, I'm ready to watch Dawid render all this moot.
>


Re: Tapioca anyone (fisheye)

2010-01-20 Thread Isabel Drost
On Sun, Benson Margulies wrote:
> http://fisheye6.atlassian.com/browse/mahout

Thanks for fisheye integration.

Isabel


[jira] Updated: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-01-20 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated MAHOUT-180:
---

Attachment: MAHOUT-180.patch

This patch actually works, has disentangled unit tests (one for Lanczos, one 
for Hebbian, and a base JUnit test for them to share), and scaled-down test 
parameters so it runs without a gazillion bytes of RAM available.

Also includes a 90% solution for MAHOUT-211.

If I made the patch correctly, that is.

> port Hadoop-ified Lanczos SVD implementation from decomposer
> 
>
> Key: MAHOUT-180
> URL: https://issues.apache.org/jira/browse/MAHOUT-180
> Project: Mahout
>  Issue Type: New Feature
>  Components: Math
>Affects Versions: 0.2
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-180.patch, MAHOUT-180.patch
>
>
> I wrote up a hadoop version of the Lanczos algorithm for performing SVD on 
> sparse matrices available at http://decomposer.googlecode.com/, which is 
> Apache-licensed, and I'm willing to donate it.  I'll have to port over the 
> implementation to use Mahout vectors, or else add in these vectors as well.
> Current issues with the decomposer implementation include: if your matrix is 
> really big, you need to re-normalize before decomposition: find the largest 
> eigenvalue first, and divide all your rows by that value, then decompose, or 
> else you'll blow over Double.MAX_VALUE once you've run too many iterations 
> (the L^2 norm of intermediate vectors grows roughly as 
> (largest-eigenvalue)^(num-eigenvalues-found-so-far), so losing precision on 
> the lower end is better than blowing over MAX_VALUE).  When this is ported to 
> Mahout, we should add in the capability to do this automatically (run a 
> couple iterations to find the largest eigenvalue, save that, then iterate 
> while scaling vectors by 1/max_eigenvalue).
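
To make the renormalization recipe concrete, here is a rough sketch in plain
Java -- double[][] instead of Mahout types, helper names invented: a few
power-iteration steps estimate the dominant eigenvalue, whose reciprocal then
scales the input before Lanczos runs.

{code}
import java.util.Random;

// Sketch of "run a couple iterations to find the largest eigenvalue,
// then scale by 1/max_eigenvalue" using plain arrays.
public final class Renormalize {

  // A few power-iteration steps approximate the dominant eigenvalue
  // of a symmetric matrix a (e.g. the implicit A^T A of the corpus).
  static double largestEigenvalue(double[][] a, int steps) {
    int n = a.length;
    Random rnd = new Random(42);
    double[] v = new double[n];
    for (int i = 0; i < n; i++) {
      v[i] = rnd.nextDouble();
    }
    scale(v, 1.0 / norm2(v)); // start from a normalized random vector
    double eigen = 0;
    for (int s = 0; s < steps; s++) {
      double[] av = multiply(a, v);
      eigen = norm2(av);        // ||A v|| with ||v|| == 1
      scale(av, 1.0 / eigen);   // renormalize for the next step
      v = av;
    }
    return eigen;
  }

  static double[] multiply(double[][] a, double[] v) {
    double[] r = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < v.length; j++) {
        r[i] += a[i][j] * v[j];
      }
    }
    return r;
  }

  static void scale(double[] v, double factor) {
    for (int i = 0; i < v.length; i++) {
      v[i] *= factor;
    }
  }

  static double norm2(double[] v) {
    double sum = 0;
    for (double x : v) {
      sum += x * x;
    }
    return Math.sqrt(sum);
  }
}
{code}

Dividing every row by the returned value before decomposing keeps the
intermediate Lanczos norms from blowing past Double.MAX_VALUE, at the cost of
precision at the small end, exactly as the description suggests.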

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.