Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-76953094
@jkbradley I wrote a script here
(https://gist.github.com/MechCoder/5939294f74f105e5c499) to compare the timings
in this branch and master. It seems to me
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-76701605
Hmm. I came up with this, but surely there should be a more elegant way of
doing it.
import scala.util.Random
val rng = new Random  // or simply: val rng = new Random(0)
rng.setSeed(0)
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-76669702
Hi, sorry for taking so much time to get back to this. I want to generate
some random data to write this script using Breeze, but I am unable to understand
how the random
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4834#issuecomment-76622887
Great. Do you have any more comments?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4819#issuecomment-76527936
Also, the present code is unoptimized, since there are two passes over the
data RDD: one to update the residual, and the other to calculate the error. But
that can
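The two passes described in the snippet above can, in principle, be fused into one. A minimal plain-Python sketch, where a list stands in for the RDD and `fused_pass` / `predict_increment` are hypothetical names, not Spark's actual API:

```python
def fused_pass(data, predict_increment):
    """For each (features, label, residual) record, update the residual with
    the new tree's prediction and accumulate squared error in the same
    traversal, instead of walking the data twice."""
    total_error = 0.0
    updated = []
    for features, label, residual in data:
        new_residual = residual + predict_increment(features)
        total_error += (label - new_residual) ** 2
        updated.append((features, label, new_residual))
    return updated, total_error / len(data)

# Tiny demo: two records, a constant increment of 0.5 per record.
data = [([1.0], 3.0, 2.0), ([2.0], 5.0, 4.0)]
updated, mse = fused_pass(data, lambda features: 0.5)
```

In Spark this would map onto a single `aggregate` or `mapPartitions` pass rather than a Python loop; the sketch only illustrates the fused bookkeeping.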
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4819#issuecomment-76527811
@jkbradley I am assuming that this is what you intended. It works but I'm
not sure about the present design, which differs from the design that you had
posted
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4834#issuecomment-76541233
Hmm. I get an accuracy of zero for the given example. Not sure where I'm
going wrong though :(
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4834#issuecomment-76541036
cc: @mengxr Would you be able to verify this?
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4834
[SPARK-6083] [MLLib] [DOC] Make Python API example consistent in NaiveBayes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MechCoder/spark
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4834#issuecomment-76578092
I changed the randomSplit seed and it works better. It should look good now.
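For context on why the seed mattered here: a fixed seed makes a random split, and hence the reported accuracy, reproducible across runs. A hedged plain-Python sketch of a weighted random split (mimicking the idea behind MLlib's `randomSplit`, not its implementation):

```python
import random

def random_split(data, weights, seed):
    """Assign each element to one of len(weights) parts with probability
    proportional to its weight. Fixing the seed makes the split, and any
    accuracy computed from it, deterministic across runs."""
    rng = random.Random(seed)
    total = sum(weights)
    parts = [[] for _ in weights]
    for x in data:
        r = rng.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                parts[i].append(x)
                break
    return parts

train, test = random_split(list(range(100)), [0.6, 0.4], seed=11)
```

Re-running with the same seed yields the identical split; changing the seed changes which points land in the small test set, which is what made the toy example's accuracy flaky.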
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4819
[SPARK-6025] Add helper method to efficiently compute error in GBT's
While computing the error, with and without validation, for every
iteration, the feature prediction of the previous trees
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4819#issuecomment-76472686
@jkbradley Is this similar to what you had in mind?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-75745207
Great, I'll do it tomorrow after I'm done with my exams.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-75745123
@jkbradley I have fixed up your comments! Hopefully good to go.
[off-topic]
It would be really great and helpful if Spark would be interested in taking
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4677#discussion_r25150164
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala
---
@@ -158,6 +158,63 @@ class GradientBoostedTreesSuite
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4677#discussion_r25143389
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala
---
@@ -158,6 +158,63 @@ class GradientBoostedTreesSuite
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-75487872
@jkbradley Addressed all your comments except the inline one.
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4677#discussion_r25142849
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -76,8 +77,44 @@ class GradientBoostedTrees(private val
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-75223810
@mengxr Fixed!
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4677#discussion_r25062848
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -76,8 +77,42 @@ class GradientBoostedTrees(private val
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-75225799
Just to clarify, by cluster mode do you mean running `./bin/spark-shell
--master spark://manoj-X550LD:7077` where the url is generated by doing
`./sbin/start-master.sh
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-75285426
@tgaloppo Sorry for my noobness, all my work on MLlib has been on a single
machine. I am not really sure how to run it on a cluster (and hence was
verifying if my
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-75026265
@jkbradley I have fixed up your comments.
Btw, why are there both a `train` and a `run`? They seem to me to do the
same thing. Is it not better to have one way
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-75195111
@jkbradley I've fixed up your comments.
a] Negative tol is allowed.
b] It makes sense to return based on the best validationError rather than
the previous
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4672
[Minor] Minor doc fix in GBT classification example
numClassesForClassification has been renamed to numClasses.
You can merge this pull request into a Git repository by running:
$ git pull
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4672#issuecomment-74834330
ping @jkbradley? I was not sure whether I had to open a JIRA for this, as it is
minor.
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4677
[SPARK-5436] [MLlib] Validate GradientBoostedTrees during train
One can stop early if the decrease in the error rate is less than a certain
tol, or if the error increases if the training data
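The stopping rule described in the PR summary above could look like this in plain Python; `fit_next_tree` and `validation_error` are hypothetical stand-ins for the GBT internals, not Spark's API:

```python
def train_with_validation(fit_next_tree, validation_error, max_iter, tol):
    """Boost until the improvement in validation error falls below `tol`
    (or the error worsens), then keep the ensemble truncated at the best
    validation error seen so far."""
    trees = []
    best_error = float("inf")
    best_len = 0
    prev_error = float("inf")
    for _ in range(max_iter):
        trees.append(fit_next_tree(trees))
        err = validation_error(trees)
        if err < best_error:
            best_error, best_len = err, len(trees)
        if prev_error - err < tol:  # too little improvement, or got worse
            break
        prev_error = err
    return trees[:best_len]

# Demo with a fake error curve that bottoms out at iteration 3:
errors = [0.5, 0.3, 0.2, 0.25, 0.4]
model = train_with_validation(
    fit_next_tree=lambda trees: len(trees),                # dummy "tree"
    validation_error=lambda trees: errors[len(trees) - 1],
    max_iter=5, tol=1e-3)
```

Returning the ensemble truncated at the best validation error (rather than the last iteration) matches the design discussed later in this thread.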
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4677#issuecomment-74953724
@jkbradley I just wanted to know if this is in the right direction.
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4654#discussion_r24877024
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
@@ -168,16 +182,26 @@ class GaussianMixture private
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-74819739
Thanks! Looking forward to learning lots more
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-74740596
@tgaloppo I've addressed the issue with distributing the Gaussian updates,
in the latest commit. But it breaks tests (Note that I've set
distributeGaussian explicitly
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-74734084
I could not distribute the other Gaussian update, since this line
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-74734555
Err.. no, it must have been some other error while I tested it. Will update
that in a while.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-74723052
@manishamde Thanks. The LGTM suggests that this should be good to go in! ;)
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4654
[SPARK-5016] Distribute Gaussian Initialization in GaussianMixture
Following discussion in the JIRA
You can merge this pull request into a Git repository by running:
$ git pull https
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4654#issuecomment-74734747
Do you want me to time any specific data?
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4654#discussion_r24846111
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
@@ -135,25 +135,39 @@ class GaussianMixture private
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4654#discussion_r24848796
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
@@ -168,16 +182,26 @@ class GaussianMixture private
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-74609225
@jkbradley fixed!
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4231#discussion_r24789326
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1064,9 +1045,12 @@ object DecisionTree extends Serializable
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73831296
Thanks @tgaloppo and @mengxr . Any idea what to touch in GaussianMixture
next? The parallelized Gaussian initialization.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73667895
@mengxr Just to make it easier for you, a small description:
GaussianMixture used to support sparse input by converting it to DenseVectors,
which is suboptimal
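The cost of densifying can be seen with a dot product: with an (indices, values) representation, one only touches the stored entries. A plain-Python sketch (illustrative only, not MLlib's BLAS code):

```python
def sparse_dense_dot(indices, values, dense):
    """Dot product of a sparse vector (parallel indices/values arrays with
    sorted, non-negative indices) and a dense vector. Cost is O(nnz), not
    O(size), because only the stored entries are visited — no intermediate
    dense copy of the sparse operand is ever built."""
    return sum(values[k] * dense[i] for k, i in enumerate(indices))

# x = [0, 2.0, 0, 3.0] in sparse form: indices [1, 3], values [2.0, 3.0]
result = sparse_dense_dot([1, 3], [2.0, 3.0], [1.0, 1.0, 1.0, 2.0])
```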
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4459#discussion_r24439928
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -235,12 +235,23 @@ private[spark] object BLAS extends Serializable
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73777170
@mengxr Fixed up your comments. Let me know if there is anything else.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73649664
@tgaloppo Alright, thanks for the explanation. What makes you think that
the covariance matrix is wrong? I calculated it manually and it seems to be
right. I added
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73554216
Also, a noob question: what is the significance of the negative variance
in the tests?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73551176
@tgaloppo I fixed it up. Can you have a look?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73508202
@tgaloppo Thanks for your valuable feedback. Do you have anything more to
add as of now?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-73348866
@jkbradley Thanks for your reviews. I fixed them up.
Anyone else want to have a final look?
cc @manishamde @mengxr ?
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4459
[SPARK-5021] Gaussian Mixture now supports Sparse Input
Following discussion in the Jira.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73429922
ping @tgaloppo and @jkbradley (whenever you are back!)
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4459#discussion_r24310707
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
@@ -215,20 +217,29 @@ private object ExpectationSum {
def
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4459#discussion_r24310793
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala
---
@@ -40,10 +41,15 @@ class GaussianMixtureSuite extends
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4459#discussion_r24308815
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala
---
@@ -40,10 +41,15 @@ class GaussianMixtureSuite extends
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4459#issuecomment-73456587
@tgaloppo Thanks. I shall fix them in a while. However, does the general
code look good to you?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-73300352
ping @jkbradley?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-72696085
@jkbradley Sorry for being impatient, but would you be able to have a look
anytime soon?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-72331831
Thanks :) Looking forward to your reviews. I will work on other stuff till
then.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-72253005
Just out of curiosity, what happens if the deadline for code freeze
passes, i.e. tomorrow?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-72090175
@jkbradley Can I please get a pass or comments on this? Or maybe others who
are familiar with the tree code (@mengxr )
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-71724940
ping @jkbradley . Would be great if you could have a look.
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4231
[SPARK-3381] [MLlib] Eliminate bins for unordered features
For unordered features, it is sufficient to use splits, since the threshold
of the split corresponds to the threshold
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-71779796
It also returns empty bins, just to be compatible with the present API.
Hopefully that's not a problem.
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4231#discussion_r23665261
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/TreePoint.scala ---
@@ -96,14 +96,12 @@ private[tree] object TreePoint {
* Find bin
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71584396
@mengxr This can also be viewed as a bugfix which prevents overwriting of
the param `subSamplingRate`, which was hardcoded to 1.0
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71335512
ping @jkbradley Could you please have a final look?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71349920
@jkbradley Fixed. I can haz merge?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70982719
@jkbradley Thanks for the tip. Fixed. Anything more?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70983304
Repushed after fixing the style checks.
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4073#discussion_r23250726
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -196,6 +196,24 @@ class RandomForestSuite extends FunSuite
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4099#issuecomment-70788974
Thanks for the comment.
I meant that now udt is a ClassTag. If udt were to be an Object or an
instance of a class, then I'm not sure if it is possible
Github user MechCoder closed the pull request at:
https://github.com/apache/spark/pull/4099
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4099#issuecomment-70789550
Alright, closing. Could you please mark the issue as fixed?
Sad that I've had to close two pull requests for the same reasons.
Hopefully I'll find something
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4099#issuecomment-70717091
@jkbradley Thanks for the clarification. I gave a few attempts, but I don't
think I'm doing it the right way. This is what you mean right?
Presently
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4099#issuecomment-70458668
@jkbradley I'm also not sure how the `SQLUserDefinedType` is supposed to
work. Do you mind giving me a short note?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70487065
@mengxr The other PR that you refer to implements sorting; this, however,
implements an O(size) check for sparse vectors, where size is typically really
small (It does
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70538422
@mengxr @jkbradley Any more comments? Sorry for spamming, but I would like
to work on other issues related to GBRT and RandomForests as well.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70537775
Alright, closing. Please mark the issue as resolved.
Github user MechCoder closed the pull request at:
https://github.com/apache/spark/pull/4096
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4099
[SPARK-5022] [Sql] Change VectorUDT to object
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MechCoder/spark spark-5022
Alternatively you
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4099#issuecomment-70450637
cc @rxin I am unable to understand how to change this line
`@SQLUserDefinedType(udt = classOf[VectorUDT])` . I tried doing
`@SQLUserDefinedType(udt
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70441989
Alright, but maybe the documentation can be updated to state that the indices
should be non-negative?
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4096
[SPARK-5257] SparseVector indices must be non-negative
Scala, AFAIK, does not support negative indexing, so we fail early if negative
indices are used.
You can merge this pull request into a Git
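The fail-early check being proposed might be sketched like this in plain Python (mirroring the intent, not MLlib's Scala code):

```python
def validate_sparse_indices(size, indices):
    """Reject negative or out-of-range indices at construction time, so
    bad input fails immediately instead of surfacing later as a confusing
    error deep inside some linear-algebra routine."""
    for i in indices:
        if i < 0:
            raise ValueError(f"index {i} is negative; indices must be non-negative")
        if i >= size:
            raise ValueError(f"index {i} is out of range for vector size {size}")

validate_sparse_indices(4, [0, 2, 3])  # valid: no exception raised
```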
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70410393
@davies @mengxr This is a minor PR. Can you please review?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70420474
Sorry for the phony Jenkins comments. Is there a way I can turn it off
until I explicitly turn it on?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70421943
I am extremely sorry for the phony comments due to Jenkins. Is there any
way to test it only when asked?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70414610
Hmm.. If I do that, would it seem odd if `SparseVector` in the Python API
allows negative indexing, but not negative indices in the constructor?
I implemented
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70414854
Thinking again, negative indexing makes sense, but not supplying an array
of negative integers as indices. Let me know what you think about it.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4096#issuecomment-70415840
@davies Thanks. I've fixed it up. Anything more?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70369703
Could you please tell me the preferred way to generate random data
in Spark?
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70375368
@jkbradley I've added a test according to the other tests in the
`RandomForestSuite` . Let me know if there is anything left.
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4073
[SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0
I've added support for sampling_rate not equal to 1.0. I have two major
questions.
1. A Scala style test is failing, since
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70283608
@jkbradley @mengxr it would be great if you could have a look.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70297536
I've made changes such that this does not break anything, i.e. everything is
backward compatible.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70300939
@jkbradley, the issue is that the function `train` has more than 10 args.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70302451
@jkbradley Oops, the comments got deleted somehow. I meant that this is
because there are 10 arguments in `trainClassifier` and `trainRegressor`
Github user MechCoder closed the pull request at:
https://github.com/apache/spark/pull/4073
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70304290
Oh well, but still, if I'm not mistaken, the `subSamplingRate` is overridden
by the condition `numTrees 1`. This should not be the case, as having a lower
sampling
GitHub user MechCoder reopened a pull request:
https://github.com/apache/spark/pull/4073
[SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0
I've added support for sampling_rate not equal to 1.0. I have two major
questions.
1. A Scala style test is failing, since
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70306545
Thanks. Also, a design decision: is it worth adding this as an
option to `train`, given that it is now within the style limit?