Re: What's the best way to find the nearest neighbor in Spark? Any windowing function?

2016-09-13 Thread Mobius ReX
2000 124 B NY 1 5 2001 128 B > 4 5 0.041 1 > NY 0 6 3 24 C NY 1 7 30100 27 C > 6 7 0.13 1 > NY 0 6 3 24 C NY 1 9 33000 39 C 6 9 3.15 2 > NY 0 8 30200 29 C NY 1 7

Re: What's the best way to find the nearest neighbor in Spark? Any windowing function?

2016-09-13 Thread Mobius ReX
On Tue, Sep 13, 2016 at 8:45 PM, Mobius ReX <aoi...@gmail.com> wrote: > > Hi Sean, > > Great! > > Is there any sample code implementing Locality Sensitive Hashing with Spark, in either Scala or Python? > > "However if your
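For reference, Spark ML later added built-in LSH estimators (MinHashLSH and BucketedRandomProjectionLSH) in Spark 2.1, i.e. after this thread. Below is only a minimal sketch of how they can be used, assuming df is the data.csv DataFrame from this thread and treating Price and Number as the numeric features; the bucket length and distance threshold are arbitrary placeholders.

// Sketch only: not code from this thread; Spark 2.1+ API.
import org.apache.spark.ml.feature.{BucketedRandomProjectionLSH, VectorAssembler}

// Pack the numeric columns into the single vector column the LSH estimator expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("Price", "Number"))
  .setOutputCol("features")
val vectorized = assembler.transform(df)

// Random-projection LSH approximates Euclidean nearest neighbors.
val brp = new BucketedRandomProjectionLSH()
  .setBucketLength(2.0)      // tuning knob; value here is arbitrary
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")
val model = brp.fit(vectorized)

// Approximate join of Flag=0 rows against Flag=1 rows within a distance threshold.
val candidates = model.approxSimilarityJoin(
  vectorized.filter("Flag = 0"),
  vectorized.filter("Flag = 1"),
  100.0,                     // distance threshold, data-dependent
  "distCol")

For a single query point rather than a join, model.approxNearestNeighbors(vectorized, keyVector, k) returns the k approximate nearest rows for one org.apache.spark.ml.linalg.Vector key.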

Re: What's the best way to find the nearest neighbor in Spark? Any windowing function?

2016-09-13 Thread Mobius ReX
give a lot of speed for a small cost in accuracy. > > However if your rule is really like "must match column A and B and then closest value in column C" then just ordering everything by A, B, C lets you pretty much read off the answer from the result set directly. Everything
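A minimal sketch of that "order by A, B, C" idea with a window function, assuming a DataFrame df whose columns are literally named A, B and C as in the quoted rule: within each exact-match (A, B) group sorted by C, the closest C for any row is either the previous or the next row in that order.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, lead}

// Partition by the exact-match columns, order by the numeric one.
val w = Window.partitionBy("A", "B").orderBy("C")

// The nearest C is whichever of prev_C / next_C is closer to C.
val withNeighbors = df
  .withColumn("prev_C", lag(col("C"), 1).over(w))
  .withColumn("next_C", lead(col("C"), 1).over(w))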

What's the best way to find the nearest neighbor in Spark? Any windowing function?

2016-09-13 Thread Mobius ReX
Given a table

$ cat data.csv

ID,State,City,Price,Number,Flag
1,CA,A,100,1000,0
2,CA,A,96,1010,1
3,CA,A,195,1010,1
4,NY,B,124,2000,0
5,NY,B,128,2001,1
6,NY,C,24,3,0
7,NY,C,27,30100,1
8,NY,C,29,30200,0
9,NY,C,39,33000,1
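One way to express this with a window function, under the assumption (based on the sample data) that the goal is to pair each Flag=0 row with the Flag=1 row that shares its State and City and has the closest Price; the nn_* renames are only to avoid column ambiguity after the join, and spark is a SparkSession.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{abs, col, row_number}

val df = spark.read.option("header", "true").option("inferSchema", "true").csv("data.csv")

val targets = df.filter(col("Flag") === 0)
val candidates = df.filter(col("Flag") === 1)
  .withColumnRenamed("ID", "nn_ID")
  .withColumnRenamed("Price", "nn_Price")
  .withColumnRenamed("Number", "nn_Number")
  .withColumnRenamed("Flag", "nn_Flag")

// Exact match on State and City, then rank candidates by price distance per target row.
val joined = targets.join(candidates, Seq("State", "City"))
  .withColumn("price_diff", abs(col("Price") - col("nn_Price")))

val w = Window.partitionBy("ID").orderBy(col("price_diff"))
val nearest = joined.withColumn("rn", row_number().over(w)).filter(col("rn") === 1)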

What's the best way to detect and remove outliers in a table?

2016-09-01 Thread Mobius ReX
Given a table with hundreds of columns, mixing categorical and numerical attributes whose value distributions are unknown, what's the best way to detect outliers? For example, given a table

Category  Price
A         1
A         1.3
A         100
C
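For a single numeric column grouped by a categorical one, a simple per-group z-score screen is one place to start. This is only a sketch using the Category and Price names from the example above (df is assumed to be the input DataFrame), not a general answer to the mixed-type question.

import org.apache.spark.sql.functions.{abs, avg, col, stddev}

// Per-category mean and sample standard deviation of Price.
val stats = df.groupBy("Category")
  .agg(avg("Price").as("mean_price"), stddev("Price").as("std_price"))

// Flag rows more than 3 standard deviations from their category's mean.
val flagged = df.join(stats, Seq("Category"))
  .withColumn("z", abs(col("Price") - col("mean_price")) / col("std_price"))
  .filter(col("z") > 3)   // stddev is null for single-row categories, so those rows drop out

Since the distributions are unknown, a median/IQR cutoff (for example, quartiles from df.stat.approxQuantile) is usually more robust than mean/stddev, and categorical columns need a different treatment, such as flagging very rare category values by frequency.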