Re: Washing machines - Mahout algorithm advice
I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simluation. Each page has some influence (color) which flows via its links (socks its washed with) and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix. On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifra t...@tcz.hu wrote: Hi, I posted this on stack overflow, but I've been informed that the mailing list is the appropriate forum to ask questions like this. What I need is actually just a hint where I can start, as I'm not sure which direction to go. Although this is not a classic machine learning problem, AFAIK Mahout is not only for those. I was thinking it's an appropriate tool, because I work with a large set of data and the output has similar characteristics than a classic Mahout job. I think a problem below can be implemented with MapReduce, but I don't know if there is any existing, similar algorithm. I need this to be scalable, that's why I think the Mahout on top of Hadoop's MapReduce power could be an appropriate solution. Quickly what I want to do is: The final goal is to define one scalar (a score) of each one of a set of entities based on some known entities. The entities interact with each other, known scores influence and define the unknown ones. You can imagine with the following example. I have a lot if white clothes and a few pieces of colorful ones; red, blue, green... I put them into the washing machine. I want to know what colors the white ones will get after the wash. Things to take into account: - we make a series of washing with different actors... some clothes are washed in the 1st and 3rd washing, some of them only in the 2nd, some of them are washed in all - in consecutive washes the clothes that were white before but now colored also influence the rest, but not as strong (as they are not as colored) - some colors don't color as much as others. for example red has a strong effect on most of the clothes, but green not so much - coloring effect also depends on how many clothes are in one washing. If you wash a red shirt with a white t-shirt, it gets much more colored, than if there were 100 other white t-shirt - clothes don't lose their color when influencing others You can see that while calculating, entities actually have 2 assigned scalars: - the color hue (this also defines coloring power as mentioned above). The hue can be represented as a number, from 0 to 1, let's say. The coherence between the coloring power and the color number is not linear. It is more like the ends of the scale have more coloring power (0 and 1) while the middle (0.5) has less - the color lightness (how much an entity is colored, for originally colored clothes it's 1, for white ones it's 0), which in the same time also defines coloring power regardless of the hue So, again, what I know: - which clothes where washed in which consecutive washing - I know the original color of some of them, the rest is white in the beginning What I want to know: - the hue of all clothes in the end of the washing The problem is that I don't know what (type) of algorithm should I start with. If you were so kind to read so far, please suggest me something (or further reading). Obviously I don't ask for any detailed thing, again, only hints on the algorithm. Thank you!
Re: Washing machines - Mahout algorithm advice
It may further help to note that PageRank is a special case of a Markov chain, and this washing system may well be described by a Markov chain model. On 3/3/2012 12:35 PM, Sean Owen wrote: I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simluation. Each page has some influence (color) which flows via its links (socks its washed with) and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix. On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu wrote: Hi, I posted this on stack overflow, but I've been informed that the mailing list is the appropriate forum to ask questions like this. What I need is actually just a hint where I can start, as I'm not sure which direction to go. Although this is not a classic machine learning problem, AFAIK Mahout is not only for those. I was thinking it's an appropriate tool, because I work with a large set of data and the output has similar characteristics than a classic Mahout job. I think a problem below can be implemented with MapReduce, but I don't know if there is any existing, similar algorithm. I need this to be scalable, that's why I think the Mahout on top of Hadoop's MapReduce power could be an appropriate solution. Quickly what I want to do is: The final goal is to define one scalar (a score) of each one of a set of entities based on some known entities. The entities interact with each other, known scores influence and define the unknown ones. You can imagine with the following example. I have a lot if white clothes and a few pieces of colorful ones; red, blue, green... I put them into the washing machine. I want to know what colors the white ones will get after the wash. Things to take into account: - we make a series of washing with different actors... some clothes are washed in the 1st and 3rd washing, some of them only in the 2nd, some of them are washed in all - in consecutive washes the clothes that were white before but now colored also influence the rest, but not as strong (as they are not as colored) - some colors don't color as much as others. for example red has a strong effect on most of the clothes, but green not so much - coloring effect also depends on how many clothes are in one washing. If you wash a red shirt with a white t-shirt, it gets much more colored, than if there were 100 other white t-shirt - clothes don't lose their color when influencing others You can see that while calculating, entities actually have 2 assigned scalars: - the color hue (this also defines coloring power as mentioned above). The hue can be represented as a number, from 0 to 1, let's say. The coherence between the coloring power and the color number is not linear. It is more like the ends of the scale have more coloring power (0 and 1) while the middle (0.5) has less - the color lightness (how much an entity is colored, for originally colored clothes it's 1, for white ones it's 0), which in the same time also defines coloring power regardless of the hue So, again, what I know: - which clothes where washed in which consecutive washing - I know the original color of some of them, the rest is white in the beginning What I want to know: - the hue of all clothes in the end of the washing The problem is that I don't know what (type) of algorithm should I start with. If you were so kind to read so far, please suggest me something (or further reading). Obviously I don't ask for any detailed thing, again, only hints on the algorithm. Thank you!
Re: Washing machines - Mahout algorithm advice
And further linear Markov chains can be expressed as matrix products which can be computed efficiently using SVD's. Zoltan, is this literally the problem that you are working on? Or is this a shadow of the problem that you are interested in? On Sat, Mar 3, 2012 at 9:55 AM, Jack Tanner i...@hotmail.com wrote: It may further help to note that PageRank is a special case of a Markov chain, and this washing system may well be described by a Markov chain model. On 3/3/2012 12:35 PM, Sean Owen wrote: I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simluation. Each page has some influence (color) which flows via its links (socks its washed with) and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix. On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu wrote: Hi, I posted this on stack overflow, but I've been informed that the mailing list is the appropriate forum to ask questions like this. What I need is actually just a hint where I can start, as I'm not sure which direction to go. Although this is not a classic machine learning problem, AFAIK Mahout is not only for those. I was thinking it's an appropriate tool, because I work with a large set of data and the output has similar characteristics than a classic Mahout job. I think a problem below can be implemented with MapReduce, but I don't know if there is any existing, similar algorithm. I need this to be scalable, that's why I think the Mahout on top of Hadoop's MapReduce power could be an appropriate solution. Quickly what I want to do is: The final goal is to define one scalar (a score) of each one of a set of entities based on some known entities. The entities interact with each other, known scores influence and define the unknown ones. You can imagine with the following example. I have a lot if white clothes and a few pieces of colorful ones; red, blue, green... I put them into the washing machine. I want to know what colors the white ones will get after the wash. Things to take into account: - we make a series of washing with different actors... some clothes are washed in the 1st and 3rd washing, some of them only in the 2nd, some of them are washed in all - in consecutive washes the clothes that were white before but now colored also influence the rest, but not as strong (as they are not as colored) - some colors don't color as much as others. for example red has a strong effect on most of the clothes, but green not so much - coloring effect also depends on how many clothes are in one washing. If you wash a red shirt with a white t-shirt, it gets much more colored, than if there were 100 other white t-shirt - clothes don't lose their color when influencing others You can see that while calculating, entities actually have 2 assigned scalars: - the color hue (this also defines coloring power as mentioned above). The hue can be represented as a number, from 0 to 1, let's say. The coherence between the coloring power and the color number is not linear. It is more like the ends of the scale have more coloring power (0 and 1) while the middle (0.5) has less - the color lightness (how much an entity is colored, for originally colored clothes it's 1, for white ones it's 0), which in the same time also defines coloring power regardless of the hue So, again, what I know: - which clothes where washed in which consecutive washing - I know the original color of some of them, the rest is white in the beginning What I want to know: - the hue of all clothes in the end of the washing The problem is that I don't know what (type) of algorithm should I start with. If you were so kind to read so far, please suggest me something (or further reading). Obviously I don't ask for any detailed thing, again, only hints on the algorithm. Thank you!
Re: Washing machines - Mahout algorithm advice
Sean, Jack, the PR is an excellent idea, I was not even thinking about it, but a tweaked version of the PageRank algorithm can absolutely work for this, and is relatively easy to implement. Ted, straight question, thank you. You are right, I'm not working for a washing powder company nor a laundry. I was thinking that it's easier to understand the idea with this example. The real problem is the following: Given a large set of professional profiles with their job history, I need to find a general, automated way that can help selection process of job applicants. People are clothes, companies are washing machines. The idea is that people working in the same company and same field interact with each other, and effect each others' professional level. Also, companies for one field have standard hiring requirements. Having said that, if we can set levels of some known professionals initially (or maybe companies as well), we can estimate the professional level of the individuals in their network. For example if I worked together with Frank Gehry, I'm most likely a good architect. In my example hue is the level of a professional (red is excellent, blue is poor, if you like it), while the coloring power means how certain we are about the level. For the initial cornerstone professionals we are very sure, so they are deep red/blue. Things like the size of the company and the duration while they worked together influence the effect of the people on each other. That's roughly it. Of course it is very approximate and has a lot of guessing and rounding. I don't believe it can substitute classic selection but it might be a good aid. I need to test this. On Sat, Mar 3, 2012 at 8:09 PM, Ted Dunning ted.dunn...@gmail.com wrote: And further linear Markov chains can be expressed as matrix products which can be computed efficiently using SVD's. Zoltan, is this literally the problem that you are working on? Or is this a shadow of the problem that you are interested in? On Sat, Mar 3, 2012 at 9:55 AM, Jack Tanner i...@hotmail.com wrote: It may further help to note that PageRank is a special case of a Markov chain, and this washing system may well be described by a Markov chain model. On 3/3/2012 12:35 PM, Sean Owen wrote: I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simluation. Each page has some influence (color) which flows via its links (socks its washed with) and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix. On Sat, Mar 3, 2012 at 5:25 PM, Zoltán Tóth-Czifrat...@tcz.hu wrote: Hi, I posted this on stack overflow, but I've been informed that the mailing list is the appropriate forum to ask questions like this. What I need is actually just a hint where I can start, as I'm not sure which direction to go. Although this is not a classic machine learning problem, AFAIK Mahout is not only for those. I was thinking it's an appropriate tool, because I work with a large set of data and the output has similar characteristics than a classic Mahout job. I think a problem below can be implemented with MapReduce, but I don't know if there is any existing, similar algorithm. I need this to be scalable, that's why I think the Mahout on top of Hadoop's MapReduce power could be an appropriate solution. Quickly what I want to do is: The final goal is to define one scalar (a score) of each one of a set of entities based on some known entities. The entities interact with each other, known scores influence and define the unknown ones. You can imagine with the following example. I have a lot if white clothes and a few pieces of colorful ones; red, blue, green... I put them into the washing machine. I want to know what colors the white ones will get after the wash. Things to take into account: - we make a series of washing with different actors... some clothes are washed in the 1st and 3rd washing, some of them only in the 2nd, some of them are washed in all - in consecutive washes the clothes that were white before but now colored also influence the rest, but not as strong (as they are not as colored) - some colors don't color as much as others. for example red has a strong effect on most of the clothes, but green not so much - coloring effect also depends on how many clothes are in one washing. If you wash a red shirt with a white t-shirt, it gets much more colored, than if there were 100 other white t-shirt - clothes don't lose their color when influencing others You can see that while calculating, entities actually have 2 assigned scalars: - the color hue (this also defines coloring power as mentioned above). The hue