from:"Tom Vacek"

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread Tom Vacek

The SparkConf doesn't allow you to set arbitrary variables. You can use SparkContext's HadoopRDD and create a JobConf (with whatever variables you want), and then grab them out of the JobConf in your RecordReader. On Sun, Feb 22, 2015 at 4:28 PM, hnahak harihar1...@gmail.com wrote: Hi, I

Re: Is There Any Benchmarks Comparing C++ MPI with Spark

2014-06-16 Thread Tom Vacek

Spark gives you four of the classical collectives: broadcast, reduce, scatter, and gather. There are also a few additional primitives, mostly based on a join. Spark is certainly less optimized than MPI for these, but maybe that isn't such a big deal. Spark has one theoretical disadvantage

Re: Spark LIBLINEAR

2014-05-16 Thread Tom Vacek

I've done some comparisons with my own implementation of TRON on Spark. From a distributed computing perspective, it does 2x more local work per iteration than LBFGS, so the parallel isoefficiency is improved slightly. I think the truncated Newton solver holds some potential because there have

Re: is it okay to reuse objects across RDD's?

2014-04-28 Thread Tom Vacek

As to your last line: I've used RDD zipping to avoid GC since MyBaseData is large and doesn't change. I think this is a very good solution to what is being asked for. On Mon, Apr 28, 2014 at 10:44 AM, Ian O'Connell i...@ianoconnell.com wrote: A mutable map in an object should do what your

Re: is it okay to reuse objects across RDD's?

2014-04-28 Thread Tom Vacek

I'm not sure what I said came through. RDD zip is not hacky at all, as it only depends on a user not changing the partitioning. Basically, you would keep your losses as an RDD[Double] and zip those with the RDD of examples, and update the losses. You're doing a copy (and GC) on the RDD of
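The zipping idea in the snippet above can be illustrated without Spark. The following is only a sketch in plain Python, not Spark code: each "RDD" is modeled as a list of partitions, and the names `zip_partitions`, `update_losses`, and `loss_fn` are hypothetical helpers invented for this illustration. It shows why the approach only requires that both collections keep the same partitioning, and that only the small loss collection is rebuilt while the large example data is left untouched.

```python
def zip_partitions(examples, losses):
    """Pair each example with its loss, partition by partition.

    Both arguments are lists of partitions (lists). The pairing is only
    valid when the two collections are partitioned identically.
    """
    if [len(p) for p in examples] != [len(p) for p in losses]:
        raise ValueError("partitioning differs; zip would misalign elements")
    return [list(zip(ep, lp)) for ep, lp in zip(examples, losses)]


def update_losses(examples, losses, loss_fn):
    """Return a new loss collection; the examples are never copied or modified."""
    zipped = zip_partitions(examples, losses)
    return [[loss_fn(x, old) for x, old in part] for part in zipped]
```

For instance, `update_losses(examples, losses, lambda x, old: old + x * x)` produces a fresh set of losses with the same partition layout, so it can be zipped with the unchanged examples again on the next iteration.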

Re: is it okay to reuse objects across RDD's?

2014-04-28 Thread Tom Vacek

my iPhone On Apr 28, 2014, at 9:45 AM, Tom Vacek minnesota...@gmail.com wrote: I'm not sure what I said came through. RDD zip is not hacky at all, as it only depends on a user not changing the partitioning. Basically, you would keep your losses as an RDD[Double] and zip those with the RDD

Re: is it okay to reuse objects across RDD's?

2014-04-28 Thread Tom Vacek

Ian, I tried playing with your suggestion, but I get a task not serializable error (and some obvious things didn't fix it). Can you get that working? On Mon, Apr 28, 2014 at 10:58 AM, Tom Vacek minnesota...@gmail.com wrote: As to your last line: I've used RDD zipping to avoid GC since

Re: is it okay to reuse objects across RDD's?

2014-04-28 Thread Tom Vacek

on loss RDD (copy) ? Chester Sent from my iPhone On Apr 28, 2014, at 9:45 AM, Tom Vacek minnesota...@gmail.com wrote: I'm not sure what I said came through. RDD zip is not hacky at all, as it only depends on a user not changing the partitioning. Basically, you would keep your losses as an RDD

Re: GraphX: Help understanding the limitations of Pregel

2014-04-23 Thread Tom Vacek

Here are some out-of-the-box ideas: If the elements lie in a fairly small range and/or you're willing to work with limited precision, you could use counting sort. Moreover, you could iteratively find the median using bisection, which would be associative and commutative. It's easy to think of
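The bisection idea mentioned in the snippet above might look roughly like the following. This is a minimal sketch in plain Python rather than GraphX or Spark code, and it assumes the values are known to lie in a range `[lo, hi]`; the function names and the `tol` parameter are hypothetical. Each round only needs a count of elements at or below a threshold, and counts are merged by integer addition, which is associative and commutative, so the per-partition results can be combined in any order.

```python
from functools import reduce


def count_at_or_below(partition, threshold):
    """Per-partition work: number of values <= threshold."""
    return sum(1 for x in partition if x <= threshold)


def median_by_bisection(partitions, lo, hi, tol=1e-9):
    """Approximate the lower median of values assumed to lie in [lo, hi].

    `partitions` is a list of lists standing in for distributed data.
    The result is within `tol` of the lower median (limited precision).
    """
    total = sum(len(p) for p in partitions)
    target = (total + 1) // 2  # 1-based rank of the lower median
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Merge partial counts with addition; order of merging is irrelevant.
        count = reduce(
            lambda a, b: a + b,
            (count_at_or_below(p, mid) for p in partitions),
            0,
        )
        if count >= target:
            hi = mid  # the median is at or below mid
        else:
            lo = mid  # the median is above mid
    return hi
```

Each pass over the data halves the search interval, so the number of passes grows with the logarithm of the range divided by the tolerance, which is the limited-precision trade-off the snippet refers to.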

internship opportunity

2014-04-22 Thread Tom Vacek

Thomson Reuters is looking for a graduate (or possibly advanced undergraduate) summer intern in Eagan, MN. This is a chance to work on an innovative project exploring how big data sets can be used by professionals such as lawyers, scientists and journalists. If you're subscribed to this mailing

10 matches
