Re : Welcome the newest Mahouts!

2009-09-15 Thread deneche abdelhakim
Got my Apache account yesterday 8D

Being a coder, I always find it difficult to write anything other than code =P,
so my biography will probably be weird:

I am an Algerian PhD student. I expect to use machine learning algorithms
(probably evolutionary computing) and distributed computing (Mahout? maybe).
During my master's I worked on Artificial Immune Systems applied to pattern
recognition.
I like coding, mainly in Java, but also in C# (although I'm at pro-noob level
in C#).
Over the past two years I have learned a lot from Mahout's community, and I'm
looking forward to learning much more.



--- On Wed 26.8.09, Grant Ingersoll wrote:

> From: Grant Ingersoll
> Subject: Welcome the newest Mahouts!
> To: mahout-u...@lucene.apache.org, "Mahout Dev List"
> Date: Wednesday 26 August 2009, 16:57
> I am pleased to announce that the
> Lucene PMC has voted to add Deneche Abdelhakim, Robin Anil
> and David Hall as Mahout committers.  Deneche, Robin
> and David have all made significant contributions to Mahout
> in regards to classification, clustering, evolutionary
> programming and general usage and utilities. 
> Furthermore, all three are or have been pursuing studies in
> machine learning at University, so we look for more great
> things as well!
> 
> I hope you will join me in extending them a warm
> welcome.  I know I look forward to working with them
> and continuing to build on Mahout's capabilities on our way
> to a 1.0 release.
> 
> Also, it is customary that each new committer take the time
> to introduce themselves on the mailing list with a brief
> bio/background so we can all better get to know you.
> 
> Finally, if you're interested in knowing more about what's
> involved in becoming a committer or would simply like to
> contribute to Mahout, see http://cwiki.apache.org/MAHOUT/howtocontribute.html 
> and
> http://cwiki.apache.org/MAHOUT/howtobecomeacommitter.html.
> 
> Congrats to Deneche, Robin and David!
> 
> -Grant
> 





Updating the Web site

2009-09-15 Thread deneche abdelhakim
I followed the instructions available here:

http://cwiki.apache.org/MAHOUT/howtoupdatethewebsite.html

in order to add my name to the committer list =P

When running 'forrest run', I'm getting broken links:

X [0] skin/images/current.gif   
  BROKEN: /home/hakim/apache-forrest-0.8/main/webapp/. (Is a directory)
X [0] skin/images/page.gif  
  BROKEN: /home/hakim/apache-forrest-0.8/main/webapp/. (Is a directory)
X [0] skin/images/chapter.gif   
  BROKEN: /home/hakim/apache-forrest-0.8/main/webapp/. (Is a directory)

It also says that "Your site would still be generated, but some pages would be
broken."

svn status shows me that I only modified
"src/documentation/content/xdocs/whoweare.xml"

Can I proceed anyway and copy the site to the publish directory?





Re: Re : Welcome the newest Mahouts!

2009-09-15 Thread Isabel Drost
On Tue, 15 Sep 2009 10:11:56 +0000 (GMT)
deneche abdelhakim  wrote:

> Got my Apache account yesterday 8D

Congratulations! And a warm welcome from me of course.


> I am an Algerian PhD student. I expect to use machine learning
> algorithms (probably evolutionary computing) and distributed
> computing (Mahout? maybe).

Can you tell us more about what you will be working on and which problems
you are trying to solve?


> Over the past two years I have learned a lot from Mahout's community,
> and I'm looking forward to learning much more.

Hope you'll enjoy your time here.

Isabel


Re: Re : Welcome the newest Mahouts!

2009-09-15 Thread deneche abdelhakim
> Can you tell us more about what you will be working on and which
> problems you are trying to solve?

I'm expecting to work on Discrete Tomography, probably reconstruction
algorithms. But the final decision isn't mine, so I may end up working on
something else =P






Re: Updating the Web site

2009-09-15 Thread Grant Ingersoll
Forrest has a bug with JDK 1.6; just switch to 1.5 for it and it should
work.









Re: Updating the Web site

2009-09-15 Thread deneche abdelhakim
I'm already using Java 1.5!






Re: Updating the Web site

2009-09-15 Thread Grant Ingersoll
Hmm, make sure you have proper permissions to write to the Forrest
install.  I believe Forrest downloads stuff to its directories.  I
recall seeing similar things.  Very annoying.
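
One way to check the permissions suggestion is a small standalone program. This is only a sketch: ForrestWriteCheck is a hypothetical helper, not part of Forrest, and the two directories it checks (the install root and main/webapp, both taken from the error output in this thread) are an assumption about where Forrest needs to write.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Forrest): lists directories that are
// missing or that the current user cannot write to.
final class ForrestWriteCheck {

    // Returns the subset of the given paths that are not writable directories.
    static List<Path> unwritable(List<Path> dirs) {
        List<Path> bad = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                bad.add(dir);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Default path comes from the error output above; pass another
        // install root as the first argument to check a different location.
        Path root = Paths.get(args.length > 0 ? args[0] : "/home/hakim/apache-forrest-0.8");
        List<Path> dirs = Arrays.asList(root, root.resolve("main").resolve("webapp"));
        for (Path dir : unwritable(dirs)) {
            System.out.println("not a writable directory: " + dir);
        }
    }
}
```

If the program prints nothing, both directories are writable by the current user and the cause is probably elsewhere.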


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



[jira] Created: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

2009-09-15 Thread Sean Owen (JIRA)
Rationalize 'utils' and 'common' stuff
--

             Key: MAHOUT-178
             URL: https://issues.apache.org/jira/browse/MAHOUT-178
         Project: Mahout
      Issue Type: Improvement
Affects Versions: 0.1
        Reporter: Sean Owen
        Assignee: Sean Owen
        Priority: Minor


Every project needs a common area for code that is not obviously part of any 
specific piece of the project, typically because it's used in many places. This 
is good as it promotes reuse. I would like to make an explicit effort to 
rationalize this project's approach to 'common', starting with some basic 
reshuffling, which will then pave the way to unify more of the code that is 
duplicated now (thinking: caches, distance measures, Hadoop integration, etc.)

Right now we have this common code in three places, when it seems like there 
should be basically one:
- mahout-core: org.apache.mahout.utils
- mahout-core: org.apache.mahout.common
- mahout-utils

I suggest that of the two packages named above, 'common' is slightly
preferable; one could easily just merge these packages. I would also like to
ask whether it makes sense to have a mahout-utils module at all. It's like
having a mahout-core-core, in my opinion. It appears to serve exactly the same
role as the other utils/common packages. Would it ever be used as a standalone
build product?

Renaming may sound like a trivial change, but I think the above is merely 
symptomatic of several developers having independent ideas about where to stash 
common stuff. I want to force the issue and push everyone's stuff together to 
begin the hard but necessary work of refactoring the code base into something 
more unified.


So far, I propose pushing all code together into org.apache.mahout.common. This
is a big-bang change that will break enough pending patches that I want to
propose it first and, if agreed, plan when to commit.
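
To illustrate what the proposed merge means for dependent code, here is a minimal sketch of the import rewrite that pending patches would need. It assumes a purely textual rename of org.apache.mahout.utils to org.apache.mahout.common; PackageRename and SomeHelper are hypothetical names used only for illustration, and the actual patch may move classes differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites references to the old package name.
final class PackageRename {

    // Word boundaries keep longer names such as "org.apache.mahout.utilsx"
    // from being rewritten by accident.
    private static final Pattern OLD_PACKAGE =
            Pattern.compile("\\borg\\.apache\\.mahout\\.utils\\b");

    // Returns the source text with the old package replaced by the new one.
    static String rewrite(String source) {
        return OLD_PACKAGE.matcher(source).replaceAll("org.apache.mahout.common");
    }

    public static void main(String[] args) {
        // SomeHelper is a made-up class name, not a real Mahout class.
        System.out.println(rewrite("import org.apache.mahout.utils.SomeHelper;"));
    }
}
```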

(Also, shouldn't stuff like the distance measure classes be in a package?)

Anyway, partial patch will be attached shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated MAHOUT-145:


Attachment: partial_Sep_15.patch

* DONE: the whole dataset no longer needs to be loaded in memory just to
extract the labels; this should help when dealing with large datasets
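
The idea behind this change can be sketched as a single streaming pass that keeps only the distinct labels. This is not the code from the patch: LabelScan is a hypothetical class, and the comma-separated format and the label column index are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch: collect distinct labels one line at a time, so memory
// use grows with the number of labels rather than the number of rows.
final class LabelScan {

    // Assumes comma-separated rows with the label in column labelIndex.
    static Set<String> distinctLabels(Stream<String> lines, int labelIndex) {
        Set<String> labels = new LinkedHashSet<>();
        lines.forEach(line -> {
            if (line.trim().isEmpty()) {
                return; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            labels.add(fields[labelIndex].trim());
        });
        return labels;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LabelScan <data file> <label column index>");
            return;
        }
        // Files.lines reads lazily, so the file is never held in memory as a whole.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println(distinctLabels(lines, Integer.parseInt(args[1])));
        }
    }
}
```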

> PartialData mapreduce Random Forests
> 
>
> Key: MAHOUT-145
> URL: https://issues.apache.org/jira/browse/MAHOUT-145
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 0.2
>Reporter: Deneche A. Hakim
>Priority: Minor
> Fix For: 0.2
>
> Attachments: partial_August_10.patch, partial_August_13.patch, 
> partial_August_15.patch, partial_August_17.patch, partial_August_19.patch, 
> partial_August_2.patch, partial_August_24.patch, partial_August_27.patch, 
> partial_August_31.patch, partial_August_9.patch, partial_Sep_15.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions 
> of the data. That loses some of the solidity of the original method, but 
> could actually do better if the splits exposed non-stationary behavior."
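
As a rough illustration of "multiple trees for different portions of the data", the sketch below splits the rows into contiguous partitions and spreads the requested number of trees over them. PartitionPlan is hypothetical; in a real Hadoop job the partitions would come from the input splits, so the even row-based split here is an assumption.

```java
// Hypothetical sketch: assign rows and trees to partitions as evenly as possible.
final class PartitionPlan {

    // Returns {start, endExclusive} of the rows belonging to partition p.
    static int[] range(int numInstances, int numPartitions, int p) {
        int base = numInstances / numPartitions;
        int extra = numInstances % numPartitions; // first 'extra' partitions get one more row
        int start = p * base + Math.min(p, extra);
        int size = base + (p < extra ? 1 : 0);
        return new int[] {start, start + size};
    }

    // Number of trees partition p should grow so that the total equals numTrees.
    static int treesFor(int numTrees, int numPartitions, int p) {
        return numTrees / numPartitions + (p < numTrees % numPartitions ? 1 : 0);
    }

    public static void main(String[] args) {
        int instances = 10, partitions = 3, trees = 100;
        for (int p = 0; p < partitions; p++) {
            int[] r = range(instances, partitions, p);
            System.out.println("partition " + p + ": rows [" + r[0] + ", " + r[1] + "), trees "
                    + treesFor(trees, partitions, p));
        }
    }
}
```

Each tree sees only its own partition, which is the trade-off mentioned in the quoted suggestion.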




JIRA permission ?

2009-09-15 Thread deneche abdelhakim
Now that I'm a committer ( 8D ), I suppose I can assign JIRA issues to myself.
Do I need a special permission to do that? I'm not able to find a way to do
it =P






Re: JIRA permission ?

2009-09-15 Thread Isabel Drost
On Tue, 15 Sep 2009 14:52:28 +0000 (GMT)
deneche abdelhakim  wrote:

> now that I'm a committer ( 8D ) I suppose I can assign JIRA issues to
> myself. Do I need a special permission to do that ? because I'm not
> able to find a way to do it =P

I added you as a committer in JIRA. You should be able to assign JIRA
issues to yourself now.

Isabel


[jira] Assigned: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned MAHOUT-145:
---

Assignee: Deneche A. Hakim

> PartialData mapreduce Random Forests
> 
>
> Key: MAHOUT-145
> URL: https://issues.apache.org/jira/browse/MAHOUT-145
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 0.2
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Minor
> Fix For: 0.2
>
> Attachments: partial_August_10.patch, partial_August_13.patch, 
> partial_August_15.patch, partial_August_17.patch, partial_August_19.patch, 
> partial_August_2.patch, partial_August_24.patch, partial_August_27.patch, 
> partial_August_31.patch, partial_August_9.patch, partial_Sep_15.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions 
> of the data. That loses some of the solidity of the original method, but 
> could actually do better if the splits exposed non-stationary behavior."




[jira] Assigned: (MAHOUT-140) In-memory mapreduce Random Forests

2009-09-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned MAHOUT-140:
---

Assignee: Deneche A. Hakim

> In-memory mapreduce Random Forests
> --
>
> Key: MAHOUT-140
> URL: https://issues.apache.org/jira/browse/MAHOUT-140
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 0.2
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Minor
> Fix For: 0.2
>
> Attachments: inmem_July19_patch.diff, mapred_jul12.diff, 
> mapred_patch.diff
>
>
> Each mapper is responsible for growing a number of trees with a whole copy of 
> the dataset loaded in memory; it uses the reference implementation's code to 
> build each tree and estimate the OOB error.
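
For readers unfamiliar with the out-of-bag (OOB) estimate, here is a minimal per-tree sketch. It is not the reference implementation's code: OobSketch is hypothetical, labels and predictions are plain int arrays, and a full random forest would usually aggregate OOB votes over all trees rather than score one tree.

```java
import java.util.Random;

// Hypothetical sketch of bagging and a per-tree out-of-bag error estimate.
final class OobSketch {

    // Draws n row indices with replacement; counts[i] is how often row i was drawn.
    static int[] bootstrapCounts(int n, Random rng) {
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            counts[rng.nextInt(n)]++;
        }
        return counts;
    }

    // Error rate of one tree on the rows it never saw (bag count == 0).
    // Returns NaN when every row happened to be in the bag.
    static double oobError(int[] bagCounts, int[] labels, int[] predictions) {
        int oob = 0;
        int wrong = 0;
        for (int i = 0; i < labels.length; i++) {
            if (bagCounts[i] == 0) {
                oob++;
                if (predictions[i] != labels[i]) {
                    wrong++;
                }
            }
        }
        return oob == 0 ? Double.NaN : (double) wrong / oob;
    }

    public static void main(String[] args) {
        int[] counts = bootstrapCounts(10, new Random(42L));
        int outOfBag = 0;
        for (int c : counts) {
            if (c == 0) {
                outOfBag++;
            }
        }
        System.out.println("rows left out of the bag: " + outOfBag + " of " + counts.length);
    }
}
```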




Re: JIRA permission ?

2009-09-15 Thread deneche abdelhakim
Thanks!






[jira] Assigned: (MAHOUT-122) Random Forests Reference Implementation

2009-09-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned MAHOUT-122:
---

Assignee: Deneche A. Hakim

> Random Forests Reference Implementation
> ---
>
> Key: MAHOUT-122
> URL: https://issues.apache.org/jira/browse/MAHOUT-122
> Project: Mahout
>  Issue Type: Task
>  Components: Classification
>Affects Versions: 0.2
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 0.2
>
> Attachments: 2w_patch.diff, 3w_patch.diff, refimp_Jul6.diff, 
> refimp_Jul7.diff, RF reference.patch
>
>
> This is the first step of my GSOC project. Implement a simple, easy to 
> understand, reference implementation of Random Forests (Building and 
> Classification). The only requirement here is that "it works"
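
The classification half of a random forest is commonly a majority vote over the trees' predictions. The sketch below shows that step only; MajorityVote is a hypothetical class, class labels are assumed to be integers in [0, numClasses), and ties go to the lowest class index, which is an arbitrary choice.

```java
// Hypothetical sketch: combine per-tree predictions by majority vote.
final class MajorityVote {

    // Returns the class predicted by the most trees; ties go to the lowest index.
    static int vote(int[] treePredictions, int numClasses) {
        int[] counts = new int[numClasses];
        for (int prediction : treePredictions) {
            counts[prediction]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Five trees voting among three classes.
        System.out.println(vote(new int[] {1, 0, 1, 2, 1}, 3));
    }
}
```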




[jira] Updated: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated MAHOUT-145:


Status: Patch Available  (was: Open)

* This patch also includes 
[MAHOUT-140|https://issues.apache.org/jira/browse/MAHOUT-140] and 
[MAHOUT-122|https://issues.apache.org/jira/browse/MAHOUT-122].
* in-mem and partial implementations are available for Hadoop 0.19.1 
(org.apache.mahout.df.mapred.*) and Hadoop 0.20.0 
(org.apache.mahout.df.mapreduce)
* this code is not yet integrated with mahout's classifiers. I shall start on 
it, but not in time for mahout 0.2.0 






[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-09-15 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-157:
--

Assignee: Robin Anil

> Frequent Pattern Mining using Parallel FP-Growth
> 
>
> Key: MAHOUT-157
> URL: https://issues.apache.org/jira/browse/MAHOUT-157
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.2
>Reporter: Robin Anil
>Assignee: Robin Anil
> Fix For: 0.2
>
> Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf
