Hey,
what is the best practice for aggregating the vectors produced by OneHotEncoder in PySpark?
UDAFs are still not available in Python.
Is there any way to do it with Spark SQL?
Or do you have to switch to RDDs and do it with reduceByKey, for example?
Thanks,
Sebastian
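A minimal sketch of what the RDD route could look like. It assumes each one-hot vector is held as a plain {index: value} dict rather than a real SparseVector, and it emulates the reduce step locally so it runs without a cluster. The names add_sparse and aggregate_by_key and the sample rows are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def add_sparse(a, b):
    """Element-wise sum of two sparse vectors stored as {index: value} dicts."""
    out = dict(a)
    for idx, val in b.items():
        out[idx] = out.get(idx, 0.0) + val
    return out


def aggregate_by_key(pairs, merge):
    """Local stand-in for reduceByKey: merge all values that share a key."""
    groups = defaultdict(list)
    for key, vec in pairs:
        groups[key].append(vec)
    return {key: reduce(merge, vecs) for key, vecs in groups.items()}


# Hypothetical sample data: (key, one-hot vector with a single active index)
rows = [
    ("u1", {0: 1.0}),
    ("u1", {2: 1.0}),
    ("u2", {1: 1.0}),
    ("u1", {0: 1.0}),
]

totals = aggregate_by_key(rows, add_sparse)
print(totals)
```

On a real RDD of (key, dict) pairs the same merge function could presumably be passed as rdd.reduceByKey(add_sparse). Converting between SparseVector and dicts is left out here.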
Hi there,
what is the best way to get, from
pyspark.mllib.feature.ChiSqSelector(numTopFeatures),
the indices of the selected features in the original input vector?
Shouldn't the model contain this information?
Thanks!
Hello,
I am planning to use the corr() function from the pyspark.mllib.stat package to
compute a correlation matrix.
Will this run in a distributed fashion, and does it scale well if you
have vectors with more than a million columns?
Thanks,
Sebastian
Hey,
with collect(), an RDD's elements are sent back to the driver as a list.
I have a 4-node cluster (based on Mesos) in a datacenter and my local
dev machine.
I work with a small 200 MB dataset just for testing during development right now.
The collect() tasks are running for times
I was actually able to figure out that it had to do with Spark's Mesos run mode.
Setting spark.mesos.coarse to true made all the difference.
So the main performance killer was actually the fine-grained mode and
the Mesos overhead that comes with it.
Thanks!
Sebastian
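For reference, a small sketch of how that setting could be passed on the command line. Only spark.mesos.coarse comes from the message above. The helper spark_submit_args, the master URL and the script name app.py are placeholders, and --master and --conf key=value are the usual spark-submit options.

```python
def spark_submit_args(master, conf, app):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        args += ["--conf", "{}={}".format(key, value)]
    args.append(app)
    return args


# Placeholder master URL and script name; only spark.mesos.coarse is from the post.
cmd = spark_submit_args(
    "mesos://host:5050",
    {"spark.mesos.coarse": "true"},
    "app.py",
)
print(" ".join(cmd))
```

The same key/value pair could also go into spark-defaults.conf or a SparkConf object instead of the command line.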
2015-11-03 20:07 GMT+01:00 Sebastian
Hey,
I have a Mesos cluster with a single Master. If I run the following directly on
the master machine:
pyspark --master mesos://host:5050
everything works just fine. If I try to connect to the master by starting a
driver from my laptop, everything stops after the following log output
Hey,
I am trying to figure out the best practice for saving and loading models which have
been fitted with the ML package, e.g. with the RandomForest classifier.
There is PMML support in the MLlib package afaik, but not in ML. Is that correct?
How do you approach this, so that you do not have to fit
Hey,
the 1.5.0 release notes say that model summaries are now available for logistic
regression.
But I can't find them in the current documentation.
Any help very much appreciated!
Thanks
Sebastian