Re: Joining data using Latitude, Longitude

2015-03-12 Thread Andrew Musselman
Ted Dunning and Ellen Friedman's Time Series Databases has a section on this with some approaches to geo-encoding: https://www.mapr.com/time-series-databases-new-ways-store-and-access-data http://info.mapr.com/rs/mapr/images/Time_Series_Databases.pdf On Tue, Mar 10, 2015 at 3:53 PM, John Meehan
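
For readers skimming this thread, here is a minimal sketch of one common way to join on latitude and longitude: bucket each point into a fixed-size grid cell and join on the cell key. This is not necessarily the approach described in the book linked above. The names `cell_key` and `grid_join`, the `(id, lat, lon)` record layout, and the default cell size of 0.01 degrees are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor


def cell_key(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer indices of its grid cell."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def grid_join(left, right, cell_deg=0.01):
    """Pair records from `left` and `right` that fall in the same grid cell.

    Each record is a hypothetical (id, lat, lon) tuple. Points that are
    close together but sit on opposite sides of a cell border get
    different keys, so this simple version can miss such pairs.
    """
    # Index the right-hand side by cell so each left record is one lookup.
    index = defaultdict(list)
    for rid, lat, lon in right:
        index[cell_key(lat, lon, cell_deg)].append(rid)

    return [
        (lid, rid)
        for lid, lat, lon in left
        for rid in index.get(cell_key(lat, lon, cell_deg), [])
    ]
```

A usual refinement is to also probe the eight neighbouring cells for each left-hand point and then filter candidates by true distance, which avoids the border misses noted in the docstring.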

Build error

2015-01-30 Thread Andrew Musselman
Off master, got this error; is that typical? --- T E S T S --- Running org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.495

Re: Row similarities

2015-01-17 Thread Andrew Musselman
AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Excellent, thanks Pat. On Jan 17, 2015, at 9:27 AM, Pat Ferrel p...@occamsmachete.com wrote: Mahout’s Spark implementation of rowsimilarity is in the Scala SimilarityAnalysis class. It actually does either row or column similarity

Re: Row similarities

2015-01-17 Thread Andrew Musselman
) hasn’t been moved to spark yet. Yep, rows are not covered in the blog, my mistake. Too bad it has a lot of uses and can at very least be optimized for output matrix symmetry. On Jan 17, 2015, at 11:44 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Yeah okay, thanks. On Jan

Re: Maven out of memory error

2015-01-17 Thread Andrew Musselman
though. On Fri, Jan 16, 2015 at 8:26 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thanks Ted, got farther along but now have a failing test; is this a known issue? --- T E S T S

Re: Row similarities

2015-01-17 Thread Andrew Musselman
to rows that are similar to one another. On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: What's a good way to calculate similarities between all vector-rows in a matrix or RDD[Vector]? I'm seeing RowMatrix has a columnSimilarities method but I'm

Re: Row similarities

2015-01-17 Thread Andrew Musselman
the downsampling is done as LLR is calculated, so the entire similarity matrix is never actually calculated unless you disable downsampling. The primary use is for recommenders but I’ve used it (in the test suite) for row-wise text token similarity too. On Jan 17, 2015, at 9:00 AM, Andrew Musselman

Subscribe

2015-01-16 Thread Andrew Musselman

Maven out of memory error

2015-01-16 Thread Andrew Musselman
Just got the latest from Github and tried running `mvn test`; is this error common and do you have any advice on fixing it? Thanks! [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-core_2.10 --- [WARNING] Zinc server is not available at port 3030 - reverting to normal

Re: Maven out of memory error

2015-01-16 Thread Andrew Musselman
failure: Maste... On Fri, Jan 16, 2015 at 12:06 PM, Ted Yu yuzhih...@gmail.com wrote: Can you try doing this before running mvn? export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" What OS are you using? Cheers On Fri, Jan 16, 2015 at 12:03 PM, Andrew Musselman

Re: Maven out of memory error

2015-01-16 Thread Andrew Musselman
building with newer Hadoop profiles and so old-Hadoop support code shows deprecation warnings on its use of old APIs. On Fri, Jan 16, 2015 at 8:03 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Just got the latest from Github and tried running `mvn test`; is this error common and do

Row similarities

2015-01-16 Thread Andrew Musselman
What's a good way to calculate similarities between all vector-rows in a matrix or RDD[Vector]? I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm going down a good path to transpose a matrix in order to run that.
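
To make the question concrete, here is a small sketch in plain Python (not Spark) of what all-pairs row similarity computes, using cosine similarity as an assumed metric. It also shows why transposing works in principle: the column similarities of the transposed matrix are the row similarities of the original. The function names are hypothetical, and this in-memory version says nothing about how to transpose a distributed matrix efficiently.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def transpose(rows):
    """Swap rows and columns of a dense matrix given as a list of lists."""
    return [list(col) for col in zip(*rows)]


def row_similarities(rows):
    """Cosine similarity for every pair of rows (i, j) with i < j."""
    n = len(rows)
    return {
        (i, j): cosine(rows[i], rows[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def column_similarities(rows):
    """Cosine similarity for every pair of columns, via the transpose."""
    return row_similarities(transpose(rows))
```

With these definitions, `column_similarities(transpose(m))` returns the same pairs and values as `row_similarities(m)` for any dense matrix `m`.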