Hi,
If your features are numeric, try feature scaling before feeding them to Spark's
Logistic Regression. It might improve the accuracy.
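As a sketch of what feature scaling means here (plain Scala, no Spark dependency; on an actual RDD you could look at MLlib's StandardScaler instead), z-score standardisation of one feature column looks like:

```scala
// Z-score standardisation: (x - mean) / stddev.
// Plain-Scala sketch of the idea; `xs` is one numeric feature column.
def standardise(xs: Array[Double]): Array[Double] = {
  val mean = xs.sum / xs.length
  val variance = xs.map(x => math.pow(x - mean, 2)).sum / xs.length
  val std = math.sqrt(variance)
  // Constant columns carry no information; map them to all zeros.
  if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
}
```

After scaling, every feature has mean 0 and unit variance, so no single large-valued feature dominates the gradient updates.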
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-Logistic-Regression-performance-relative-to-Mahout-tp26346p26358.html
Sent from the Apache Spark User List
Hi,
To connect to Spark from a remote location and submit jobs, you can try
Spark Job Server. It has been open sourced now.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Integration-Patterns-tp26354p26357.html
Sent from the Apache Spark User List
Hi,
A DStream (Discretized Stream) is made up of multiple RDDs.
You can unpersist each underlying RDD by accessing the individual RDDs with
foreachRDD:
dstream.foreachRDD { rdd =>
  rdd.unpersist()
}
Hi,
1. The main difference between SparkR and R is that SparkR can handle big
data. Yes, you can use other core R libraries inside SparkR (but not
algorithms like lm(), glm(), kmeans()).
2. Yes, core R libraries will not be distributed. You can use functions from
these libraries that are applicable on the mapper side.
Hi,
In the first RDD transformation (e.g. reading from a file:
sc.textFile("path", partitions)), the partition count you specify will be
applied to all further transformations and actions derived from this RDD.
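A small sketch of that behaviour (assuming a running SparkContext `sc` and a hypothetical file path — this needs a Spark runtime, so it is illustrative only):

```scala
// The partition count passed to textFile flows through later narrow
// transformations on the same RDD.
val rdd = sc.textFile("hdfs:///some/path", 8) // ask for 8 partitions (hypothetical path)
val words = rdd.flatMap(_.split(" "))         // narrow transformation: partitioning is preserved
println(words.partitions.length)              // typically 8 here (textFile treats it as a minimum)
```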
In a few places, repartitioning your RDD will give an added advantage.
Repartition is usually done during
Hi vkutsenko,
Can you give partitions to the input labeled RDD, like:
data = MLUtils.loadLibSVMFile(jsc.sc(),
"s3://somebucket/somekey/plaintext_libsvm_file").toJavaRDD().repartition(5);
Here I used 5, since you have 5 cores.
Also, for further benchmarking and performance tuning:
Hi,
I guess the double values are the number of visits rather than a visit flag
(which should be more useful than a 1/0 visit flag).
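For visit counts like that, MLlib's implicit-feedback ALS is the usual fit. A hedged sketch (assuming `visits` is an existing RDD[Rating] whose rating field holds the visit count; the hyperparameter values are placeholders, not recommendations):

```scala
// Implicit-feedback matrix factorisation: the "rating" is a visit count,
// interpreted as confidence in the preference, not as an explicit score.
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val model = ALS.trainImplicit(
  visits,           // RDD[Rating]; rating field = number of visits
  10,               // rank: number of latent features (placeholder)
  10,               // iterations (placeholder)
  0.01,             // lambda: regularisation (placeholder)
  40.0              // alpha: confidence scaling for implicit feedback (placeholder)
)
```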
This is based on the assumption that, while doing matrix factorisation, a
rating model trained using implicit feedback cannot be binary, as that gives
poor feature values. In