Re: A Spark Design Problem

2014-11-01 Thread Steve Lewis
Join seems to me the proper approach, followed by keying the fits by KeyID and using combineByKey to choose the best. I am implementing that now and will report on performance. On Fri, Oct 31, 2014 at 11:56 AM, Sonal Goyal sonalgoy...@gmail.com wrote: Does the following help?
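A minimal local sketch of the "choose the best per KeyID" step, assuming each key/lock pair has already been scored and keyed by KeyID, and that "best" means highest score. It does not use Spark. The class BestFitPerKey, its Fit type and the method bestPerKey are hypothetical names for illustration; the three static methods play the roles of the createCombiner, mergeValue and mergeCombiners functions that combineByKey takes, and each inner list stands in for one partition of (keyId, fit) records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local, non-Spark sketch of "key the fits by KeyID and use combineByKey
// to choose the best". Names here are hypothetical illustrations.
public class BestFitPerKey {

    // Hypothetical scored fit: which lock, and how well the key fits it.
    static final class Fit {
        final String lockId;
        final double score;
        Fit(String lockId, double score) { this.lockId = lockId; this.score = score; }
    }

    // createCombiner: the first fit seen for a key is the best so far.
    static Fit createCombiner(Fit f) { return f; }

    // mergeValue: fold one more fit into a partition-local best (ties keep the old one).
    static Fit mergeValue(Fit best, Fit f) { return f.score > best.score ? f : best; }

    // mergeCombiners: combine per-partition bests for the same key.
    static Fit mergeCombiners(Fit a, Fit b) { return b.score > a.score ? b : a; }

    // Each inner list plays the role of one partition of (keyId, fit) records.
    static Map<String, Fit> bestPerKey(List<List<Map.Entry<String, Fit>>> partitions) {
        Map<String, Fit> result = new HashMap<>();
        for (List<Map.Entry<String, Fit>> partition : partitions) {
            // Partition-local pass: createCombiner for new keys, mergeValue otherwise.
            Map<String, Fit> local = new HashMap<>();
            for (Map.Entry<String, Fit> rec : partition) {
                Fit cur = local.get(rec.getKey());
                local.put(rec.getKey(),
                        cur == null ? createCombiner(rec.getValue()) : mergeValue(cur, rec.getValue()));
            }
            // Cross-partition pass: merge the partition-local bests.
            for (Map.Entry<String, Fit> e : local.entrySet()) {
                result.merge(e.getKey(), e.getValue(), BestFitPerKey::mergeCombiners);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Fit>>> partitions = List.of(
                List.of(Map.entry("k1", new Fit("L1", 0.4)), Map.entry("k2", new Fit("L2", 0.9))),
                List.of(Map.entry("k1", new Fit("L3", 0.8)), Map.entry("k1", new Fit("L4", 0.1))));
        Map<String, Fit> best = bestPerKey(partitions);
        best.forEach((k, f) -> System.out.println(k + " -> " + f.lockId + " (" + f.score + ")"));
    }
}
```

Only one Fit per key survives each stage, so the amount of data shuffled for this step is bounded by the number of distinct KeyIDs per partition rather than by the number of scored pairs.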

A Spark Design Problem

2014-10-31 Thread Steve Lewis
The original problem is in biology, but the following captures the CS issues. Assume I have a large number of locks and a large number of keys. There is a scoring function between keys and locks, and a key that fits a lock will have a high score. There may be many keys fitting one lock and a key

Re: A Spark Design Problem

2014-10-31 Thread francois.garillot
Hi Steve, are you talking about sequence alignment? -- FG On Fri, Oct 31, 2014 at 5:44 PM, Steve Lewis lordjoe2...@gmail.com wrote: The original problem is in biology, but the following captures the CS issues. Assume I have a large number of locks and a large number of keys. There is a

Re: A Spark Design Problem

2014-10-31 Thread Sonal Goyal
Does the following help? JavaPairRDD<bin, key> join with JavaPairRDD<bin, lock>. If you partition both RDDs by the bin id, I think you should be able to get what you want. Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Oct 31, 2014 at
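A rough local illustration of why joining on the bin id helps: an equi-join only ever pairs keys and locks that share a bin, so the scoring function never runs on pairs from different bins. This is plain Java, not Spark; BinJoin, Binned and joinOnBin are hypothetical names, and bin ids are assumed to have been computed for every key and lock beforehand. In Spark the rough equivalent would be partitioning both pair RDDs with the same partitioner (for example a HashPartitioner via partitionBy) and then calling join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local, non-Spark sketch of an equi-join on a bin id, mirroring the
// pairs that joining JavaPairRDD<bin, key> with JavaPairRDD<bin, lock> yields.
public class BinJoin {

    // Hypothetical (bin, item) record standing in for a Tuple2<Integer, String>.
    static final class Binned {
        final int bin;
        final String item;
        Binned(int bin, String item) { this.bin = bin; this.item = item; }
    }

    // Returns every {key, lock} pair whose bin ids match.
    static List<String[]> joinOnBin(List<Binned> keys, List<Binned> locks) {
        // Build side: index locks by bin (a simple hash join).
        Map<Integer, List<String>> locksByBin = new HashMap<>();
        for (Binned l : locks) {
            locksByBin.computeIfAbsent(l.bin, b -> new ArrayList<>()).add(l.item);
        }
        // Probe side: each key meets only the locks in its own bin.
        List<String[]> pairs = new ArrayList<>();
        for (Binned k : keys) {
            for (String lock : locksByBin.getOrDefault(k.bin, List.of())) {
                pairs.add(new String[] {k.item, lock});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Binned> keys = List.of(new Binned(1, "k1"), new Binned(2, "k2"), new Binned(1, "k3"));
        List<Binned> locks = List.of(new Binned(1, "L1"), new Binned(3, "L3"));
        for (String[] p : joinOnBin(keys, locks)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

Here k2 (bin 2) and L3 (bin 3) have no partner in their bin, so they never produce a candidate pair; only k1 and k3 are paired with L1.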