Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Sean Owen
I answered on SO:

The only thing I can think of that sounds like this problem is
PageRank. It's computed by a sort of iterative simluation. Each page
has some influence (color) which flows via its links (socks its washed
with) and at some point the page influence reaches a steady state
(final color). You can look up PageRank algorithms but it is
essentially a matter of calculating eigenvectors of a big, erm, sock
color matrix.

On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifra t...@tcz.hu wrote:
 Hi,

 I posted this on stack overflow, but I've been informed that the mailing
 list is the appropriate forum to ask questions like this.

 What I need is actually just a hint where I can start, as I'm not sure
 which direction to go. Although this is not a classic machine learning
 problem, AFAIK Mahout is not only for those. I was thinking it's an
 appropriate tool, because I work with a large set of data and the output
 has similar characteristics than a classic Mahout job. I think a problem
 below can be implemented with MapReduce, but I don't know if there is any
 existing, similar algorithm. I need this to be scalable, that's why I think
 the Mahout on top of Hadoop's MapReduce power could be an appropriate
 solution.

 Quickly what I want to do is:

 The final goal is to define one scalar (a score) of each one of a set of
 entities based on some known entities. The entities interact with each
 other, known scores influence and define the unknown ones. You can imagine
 with the following example.

 I have a lot if white clothes and a few pieces of colorful ones; red, blue,
 green... I put them into the washing machine. I want to know what colors
 the white ones will get after the wash.

 Things to take into account:

   - we make a series of washing with different actors... some clothes
   are washed in the 1st and 3rd washing, some of them only in the 2nd, some
   of them are washed in all
   - in consecutive washes the clothes that were white before but now
   colored also influence the rest, but not as strong (as they are not as
   colored)
   - some colors don't color as much as others. for example red has a
   strong effect on most of the clothes, but green not so much
   - coloring effect also depends on how many clothes are in one washing.
   If you wash a red shirt with a white t-shirt, it gets much more colored,
   than if there were 100 other white t-shirt
   - clothes don't lose their color when influencing others

 You can see that while calculating, entities actually have 2 assigned
 scalars:

   - the color hue (this also defines coloring power as mentioned above).
   The hue can be represented as a number, from 0 to 1, let's say. The
   coherence between the coloring power and the color number is not linear. It
   is more like the ends of the scale have more coloring power (0 and 1) while
   the middle (0.5) has less
   - the color lightness (how much an entity is colored, for originally
   colored clothes it's 1, for white ones it's 0), which in the same time also
   defines coloring power regardless of the hue

 So, again, what I know:

   - which clothes where washed in which consecutive washing
   - I know the original color of some of them, the rest is white in the
   beginning

 What I want to know:


   - the hue of all clothes in the end of the washing

 The problem is that I don't know what (type) of algorithm should I start
 with. If you were so kind to read so far, please suggest me something (or
 further reading).

 Obviously I don't ask for any detailed thing, again, only hints on the
 algorithm.

 Thank you!


Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Jack Tanner
It may further help to note that PageRank is a special case of a Markov 
chain, and this washing system may well be described by a Markov chain 
model.


On 3/3/2012 12:35 PM, Sean Owen wrote:

I answered on SO:

The only thing I can think of that sounds like this problem is
PageRank. It's computed by a sort of iterative simluation. Each page
has some influence (color) which flows via its links (socks its washed
with) and at some point the page influence reaches a steady state
(final color). You can look up PageRank algorithms but it is
essentially a matter of calculating eigenvectors of a big, erm, sock
color matrix.

On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu  wrote:

Hi,

I posted this on stack overflow, but I've been informed that the mailing
list is the appropriate forum to ask questions like this.

What I need is actually just a hint where I can start, as I'm not sure
which direction to go. Although this is not a classic machine learning
problem, AFAIK Mahout is not only for those. I was thinking it's an
appropriate tool, because I work with a large set of data and the output
has similar characteristics than a classic Mahout job. I think a problem
below can be implemented with MapReduce, but I don't know if there is any
existing, similar algorithm. I need this to be scalable, that's why I think
the Mahout on top of Hadoop's MapReduce power could be an appropriate
solution.

Quickly what I want to do is:

The final goal is to define one scalar (a score) of each one of a set of
entities based on some known entities. The entities interact with each
other, known scores influence and define the unknown ones. You can imagine
with the following example.

I have a lot if white clothes and a few pieces of colorful ones; red, blue,
green... I put them into the washing machine. I want to know what colors
the white ones will get after the wash.

Things to take into account:

   - we make a series of washing with different actors... some clothes
   are washed in the 1st and 3rd washing, some of them only in the 2nd, some
   of them are washed in all
   - in consecutive washes the clothes that were white before but now
   colored also influence the rest, but not as strong (as they are not as
   colored)
   - some colors don't color as much as others. for example red has a
   strong effect on most of the clothes, but green not so much
   - coloring effect also depends on how many clothes are in one washing.
   If you wash a red shirt with a white t-shirt, it gets much more colored,
   than if there were 100 other white t-shirt
   - clothes don't lose their color when influencing others

You can see that while calculating, entities actually have 2 assigned
scalars:

   - the color hue (this also defines coloring power as mentioned above).
   The hue can be represented as a number, from 0 to 1, let's say. The
   coherence between the coloring power and the color number is not linear. It
   is more like the ends of the scale have more coloring power (0 and 1) while
   the middle (0.5) has less
   - the color lightness (how much an entity is colored, for originally
   colored clothes it's 1, for white ones it's 0), which in the same time also
   defines coloring power regardless of the hue

So, again, what I know:

   - which clothes where washed in which consecutive washing
   - I know the original color of some of them, the rest is white in the
   beginning

What I want to know:


   - the hue of all clothes in the end of the washing

The problem is that I don't know what (type) of algorithm should I start
with. If you were so kind to read so far, please suggest me something (or
further reading).

Obviously I don't ask for any detailed thing, again, only hints on the
algorithm.

Thank you!







Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Ted Dunning
And further linear Markov chains can be expressed as matrix products which
can be computed efficiently using SVD's.

Zoltan, is this literally the problem that you are working on?  Or is this
a shadow of the problem that you are interested in?

On Sat, Mar 3, 2012 at 9:55 AM, Jack Tanner i...@hotmail.com wrote:

 It may further help to note that PageRank is a special case of a Markov
 chain, and this washing system may well be described by a Markov chain
 model.


 On 3/3/2012 12:35 PM, Sean Owen wrote:

 I answered on SO:

 The only thing I can think of that sounds like this problem is
 PageRank. It's computed by a sort of iterative simluation. Each page
 has some influence (color) which flows via its links (socks its washed
 with) and at some point the page influence reaches a steady state
 (final color). You can look up PageRank algorithms but it is
 essentially a matter of calculating eigenvectors of a big, erm, sock
 color matrix.

 On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu  wrote:

 Hi,

 I posted this on stack overflow, but I've been informed that the mailing
 list is the appropriate forum to ask questions like this.

 What I need is actually just a hint where I can start, as I'm not sure
 which direction to go. Although this is not a classic machine learning
 problem, AFAIK Mahout is not only for those. I was thinking it's an
 appropriate tool, because I work with a large set of data and the output
 has similar characteristics than a classic Mahout job. I think a problem
 below can be implemented with MapReduce, but I don't know if there is any
 existing, similar algorithm. I need this to be scalable, that's why I
 think
 the Mahout on top of Hadoop's MapReduce power could be an appropriate
 solution.

 Quickly what I want to do is:

 The final goal is to define one scalar (a score) of each one of a set
 of
 entities based on some known entities. The entities interact with each
 other, known scores influence and define the unknown ones. You can
 imagine
 with the following example.

 I have a lot if white clothes and a few pieces of colorful ones; red,
 blue,
 green... I put them into the washing machine. I want to know what colors
 the white ones will get after the wash.

 Things to take into account:

   - we make a series of washing with different actors... some clothes
   are washed in the 1st and 3rd washing, some of them only in the 2nd,
 some
   of them are washed in all
   - in consecutive washes the clothes that were white before but now
   colored also influence the rest, but not as strong (as they are not as
   colored)
   - some colors don't color as much as others. for example red has a
   strong effect on most of the clothes, but green not so much
   - coloring effect also depends on how many clothes are in one washing.
   If you wash a red shirt with a white t-shirt, it gets much more
 colored,
   than if there were 100 other white t-shirt
   - clothes don't lose their color when influencing others

 You can see that while calculating, entities actually have 2 assigned
 scalars:

   - the color hue (this also defines coloring power as mentioned
 above).
   The hue can be represented as a number, from 0 to 1, let's say. The
   coherence between the coloring power and the color number is not
 linear. It
   is more like the ends of the scale have more coloring power (0 and 1)
 while
   the middle (0.5) has less
   - the color lightness (how much an entity is colored, for originally
   colored clothes it's 1, for white ones it's 0), which in the same time
 also
   defines coloring power regardless of the hue

 So, again, what I know:

   - which clothes where washed in which consecutive washing
   - I know the original color of some of them, the rest is white in the
   beginning

 What I want to know:


   - the hue of all clothes in the end of the washing

 The problem is that I don't know what (type) of algorithm should I start
 with. If you were so kind to read so far, please suggest me something (or
 further reading).

 Obviously I don't ask for any detailed thing, again, only hints on the
 algorithm.

 Thank you!







Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Zoltán Tóth-Czifra
Sean, Jack, the PR is an excellent idea, I was not even thinking about it,
but a tweaked version of the PageRank algorithm can absolutely work for
this, and is relatively easy to implement.

Ted, straight question, thank you. You are right, I'm not working for a
washing powder company nor a laundry. I was thinking that it's easier to
understand the idea with this example.

The real problem is the following:

Given a large set of professional profiles with their job history, I need
to find a general, automated way that can help selection process of job
applicants. People are clothes, companies are washing machines.

The idea is that people working in the same company and same field interact
with each other, and effect each others' professional level. Also,
companies for one field have standard hiring requirements. Having said
that, if we can set levels of some known professionals initially (or maybe
companies as well), we can estimate the professional level of the
individuals in their network.

For example if I worked together with Frank Gehry, I'm most likely a good
architect.

In my example hue is the level of a professional (red is excellent, blue is
poor, if you like it), while the coloring power means how certain we are
about the level. For the initial cornerstone professionals we are very
sure, so they are deep red/blue. Things like the size of the company and
the duration while they worked together influence the effect of the people
on each other. That's roughly it.

Of course it is very approximate and has a lot of guessing and rounding. I
don't believe it can substitute classic selection but it might be a good
aid. I need to test this.

On Sat, Mar 3, 2012 at 8:09 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 And further linear Markov chains can be expressed as matrix products which
 can be computed efficiently using SVD's.

 Zoltan, is this literally the problem that you are working on?  Or is this
 a shadow of the problem that you are interested in?

 On Sat, Mar 3, 2012 at 9:55 AM, Jack Tanner i...@hotmail.com wrote:

  It may further help to note that PageRank is a special case of a Markov
  chain, and this washing system may well be described by a Markov chain
  model.
 
 
  On 3/3/2012 12:35 PM, Sean Owen wrote:
 
  I answered on SO:
 
  The only thing I can think of that sounds like this problem is
  PageRank. It's computed by a sort of iterative simluation. Each page
  has some influence (color) which flows via its links (socks its washed
  with) and at some point the page influence reaches a steady state
  (final color). You can look up PageRank algorithms but it is
  essentially a matter of calculating eigenvectors of a big, erm, sock
  color matrix.
 
  On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu  wrote:
 
  Hi,
 
  I posted this on stack overflow, but I've been informed that the
 mailing
  list is the appropriate forum to ask questions like this.
 
  What I need is actually just a hint where I can start, as I'm not sure
  which direction to go. Although this is not a classic machine learning
  problem, AFAIK Mahout is not only for those. I was thinking it's an
  appropriate tool, because I work with a large set of data and the
 output
  has similar characteristics than a classic Mahout job. I think a
 problem
  below can be implemented with MapReduce, but I don't know if there is
 any
  existing, similar algorithm. I need this to be scalable, that's why I
  think
  the Mahout on top of Hadoop's MapReduce power could be an appropriate
  solution.
 
  Quickly what I want to do is:
 
  The final goal is to define one scalar (a score) of each one of a set
  of
  entities based on some known entities. The entities interact with
 each
  other, known scores influence and define the unknown ones. You can
  imagine
  with the following example.
 
  I have a lot if white clothes and a few pieces of colorful ones; red,
  blue,
  green... I put them into the washing machine. I want to know what
 colors
  the white ones will get after the wash.
 
  Things to take into account:
 
- we make a series of washing with different actors... some clothes
are washed in the 1st and 3rd washing, some of them only in the 2nd,
  some
of them are washed in all
- in consecutive washes the clothes that were white before but now
colored also influence the rest, but not as strong (as they are not
 as
colored)
- some colors don't color as much as others. for example red has a
strong effect on most of the clothes, but green not so much
- coloring effect also depends on how many clothes are in one
 washing.
If you wash a red shirt with a white t-shirt, it gets much more
  colored,
than if there were 100 other white t-shirt
- clothes don't lose their color when influencing others
 
  You can see that while calculating, entities actually have 2 assigned
  scalars:
 
- the color hue (this also defines coloring power as mentioned
  above).
The hue