that outerJoinVertices caches the closure, to be recalculated if needed
again, while mapVertices actually caches the derived values.
Is this a bug or a feature?
Kyle
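Setting GraphX internals aside, the difference described above can be mimicked in plain Scala: keeping the closure means re-running it on every access, while materializing stores the derived values once. A runnable analogy (a sketch only — `CacheDemo`, `accessCounts`, and the `x * 10` derivation are illustrative names, not GraphX APIs):

```scala
object CacheDemo {
  // Counts how many times the derivation closure runs under each strategy.
  def accessCounts(accesses: Int): (Int, Int) = {
    var calls = 0
    val derive: Int => Int = { x => calls += 1; x * 10 }
    val vertices = Seq(1, 2, 3)

    // "Cache the closure": the function is kept and re-applied on every access.
    def recomputedView: Seq[Int] = vertices.map(derive)
    (1 to accesses).foreach(_ => recomputedView)
    val recomputedCalls = calls

    // "Cache the derived values": computed once, then reused on every access.
    calls = 0
    val materialized: Seq[Int] = vertices.map(derive)
    (1 to accesses).foreach(_ => materialized)
    (recomputedCalls, calls)
  }
}
```

With three vertices and two accesses, the recomputed view runs the closure six times while the materialized one runs it three times.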
On Sat, Feb 7, 2015 at 11:44 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:
I'm trying to setup a simple iterative message/update problem in GraphX
(spark 1.2.0), but I'm running into issues with the caching and
re-calculation of data. I'm trying to follow the example found in the
Pregel implementation of materializing and caching messages and graphs and
then
I'd like to tag a question onto this; has anybody attempted to deploy Spark
under Kubernetes
(https://github.com/googlecloudplatform/kubernetes) or Kubernetes-Mesos
(https://github.com/mesosphere/kubernetes-mesos)?
On Wednesday, December 3, 2014, Matei Zaharia matei.zaha...@gmail.com
wrote:
I'd sample at GroupedGradientDescent.scala:157
Kyle
On Tue, Jul 15, 2014 at 2:45 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:
Yes, this is a proposed patch to MLlib so that you can use one RDD to train
multiple models at the same time. I am hoping that by multiplexing several
models in the same
definitely happens before then.
Kyle
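The multiplexing pattern under discussion — one threadpool dispatching many training jobs against a shared context — can be sketched in plain Scala. This is an assumption-laden analogy: `MultiplexDemo` and `trainOne` are hypothetical stand-ins for `SVMWithSGD.train`, and the "model" here is just a scaled mean rather than an SVM.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MultiplexDemo {
  // Stand-in for SVMWithSGD.train (hypothetical): "fits" one model per
  // label set over the same shared data -- here just a scaled mean.
  def trainOne(data: Seq[Double], labelSet: Int): Double =
    data.map(_ * labelSet).sum / data.size

  // One training job per label set on a fixed-size pool, mirroring a
  // threadpool dispatching jobs against a single shared SparkContext.
  def trainAll(data: Seq[Double], labelSets: Seq[Int]): Seq[Double] = {
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = labelSets.map(ls => Future(trainOne(data, ls)))
      Await.result(Future.sequence(futures), 30.seconds)
    } finally pool.shutdown()
  }
}
```

`Future.sequence` preserves submission order, so results line up with the label sets even though the jobs complete concurrently.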
On Tue, Jul 15, 2014 at 12:00 PM, Aaron Davidson ilike...@gmail.com wrote:
Ah, I didn't realize this was non-MLlib code. Do you mean to be sending
stochasticLossHistory
in the closure as well?
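The concern behind this question is that a Scala closure captures a reference to any mutable state it mentions, not a snapshot — in Spark, such a capture gets serialized and shipped with every task that references it. A small plain-Scala illustration (the `lossHistory` buffer is a hypothetical stand-in for `stochasticLossHistory`):

```scala
import scala.collection.mutable.ArrayBuffer

object ClosureCaptureDemo {
  // The closure `record` captures the buffer itself; every invocation
  // mutates the original driver-side object, and serializing `record`
  // would drag the whole buffer along with it.
  def demo(): (Int, Int) = {
    val lossHistory = ArrayBuffer[Double]() // stand-in for stochasticLossHistory
    val record: Double => Unit = loss => lossHistory += loss
    val sizeBefore = lossHistory.size
    Seq(0.5, 0.25, 0.125).foreach(record)
    (sizeBefore, lossHistory.size)
  }
}
```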
On Sun, Jul 13, 2014 at 1:05 AM, Kyle Ellrott kellr
I'm working on a patch to MLlib that allows for multiplexing several
different model optimizations using the same RDD ( SPARK-2372:
https://issues.apache.org/jira/browse/SPARK-2372 )
In testing larger datasets, I've started to see some memory errors (
java.lang.OutOfMemoryError and exceeds max
val model = SVMWithSGD.train(rdd)
models(i) = model
Using BT broadcast factory would improve the performance of broadcasting.
Best,
Xiangrui
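For reference, in Spark 1.x the BitTorrent-style ("BT") broadcast was selected through the `spark.broadcast.factory` setting (it later became the default), e.g. in spark-defaults.conf:

```
spark.broadcast.factory  org.apache.spark.broadcast.TorrentBroadcastFactory
```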
On Fri, Jun 27, 2014 at 3:06 PM, Kyle Ellrott kellr...@soe.ucsc.edu
wrote:
1) I'm using the static SVMWithSGD.train, with no options.
2) I have about 20,000 features
`setIntercept(true)`?
2) How many features?
I'm a little worried about the driver's load because the final aggregation
and weights update happen on the driver. Did you check the driver's memory
usage as well?
Best,
Xiangrui
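If the driver heap does turn out to be the bottleneck, it can be raised at launch — the 8g value below is illustrative, not a recommendation — e.g. in spark-defaults.conf:

```
spark.driver.memory  8g
```

or equivalently with `spark-submit --driver-memory 8g`.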
On Fri, Jun 27, 2014 at 8:10 AM, Kyle Ellrott kellr...@soe.ucsc.edu
wrote
I'm working to set up a calculation that involves calling
MLlib's SVMWithSGD.train several thousand times on different permutations
of the data. I'm trying to run the separate jobs using a threadpool to
dispatch the different requests to a Spark context connected to a Mesos
cluster, using coarse
, Kyle Ellrott kellr...@soe.ucsc.edu
wrote:
I'm working on a problem learning several different sets of responses
against the same set of training features. Right now I've written the
program to cycle through all of the different label sets, attach them to
the training data, and run
It looks like I was running into
https://issues.apache.org/jira/browse/SPARK-2204
The issues went away when I changed to spark.mesos.coarse.
Kyle
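For anyone hitting the same issue (SPARK-2204), coarse-grained Mesos mode is enabled with a single setting, e.g. in spark-defaults.conf:

```
spark.mesos.coarse  true
```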
On Fri, Jun 20, 2014 at 10:36 AM, Kyle Ellrott kellr...@soe.ucsc.edu
wrote:
I've tried to parallelize the separate regressions using
I'm working on a problem learning several different sets of responses
against the same set of training features. Right now I've written the
program to cycle through all of the different label sets, attach them to
the training data, and run LogisticRegressionWithSGD on each of them, i.e.
foreach
What is the most efficient way to get an RDD of GraphX vertices and their
connected edges? Initially I thought I could use mapReduceTriplets, but I
realized that would neglect vertices that aren't connected to anything.
Would I have to do a mapReduceTriplets and then do a join with all of the
vertices to
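The aggregate-then-join pattern being asked about can be shown in plain Scala (a sketch only — `VertexJoinDemo` and `degrees` are illustrative names, and plain collections stand in for GraphX's vertex and edge RDDs):

```scala
object VertexJoinDemo {
  // Aggregating over edges (like mapReduceTriplets) only ever emits
  // vertices that appear on some edge, so isolated vertices vanish.
  // Joining the aggregate back onto the full vertex set restores them
  // with a default value.
  def degrees(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Int] = {
    // Per-edge aggregate: count each endpoint occurrence.
    val fromEdges: Map[Long, Int] = edges
      .flatMap { case (src, dst) => Seq(src, dst) }
      .groupBy(identity)
      .map { case (v, occurrences) => v -> occurrences.size }
    // Left-join against all vertices: isolated ones get degree 0.
    vertices.map(v => v -> fromEdges.getOrElse(v, 0)).toMap
  }
}
```

Here vertex 3, touched by no edge, survives the join with a default degree instead of dropping out of the result entirely.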