Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-139660661
closing per discussion on parent Jira
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user sabhyankar closed the pull request at:
https://github.com/apache/spark/pull/8241
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-139096528
@sabhyankar Thanks for discussing these issues. Given the various
arguments, I'd vote for:
* eliminating the trait --- It's a nice idea but seems like it saves too
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135895566
@sabhyankar Yes, the first issue also applies to the current approach. But
I was expecting to solve it with this effort (broadcast less and only once). It
is awkward to d
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135881841
Hi @mengxr The current approach of rebroadcasting on every predict
broadcasts the entire model and so I suppose the first issue you identified
also applies to the cur
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135629255
Couple issues. Some were already mentioned by @feynmanliang :
1. Broadcasting the entire model. For example, we might just need some
pieces of it. For linear mode
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135482010
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135482014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135481837
[Test build #41697 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/console)
for PR 8241 at commit
[`965aaec`](https://github.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469703
[Test build #41697 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/consoleFull)
for PR 8241 at commit
[`965aaec`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469158
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469129
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not h
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135468036
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enab
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135254343
LGTM pending tests
@jkbradley @mengxr can you trigger tests?
---
If your project is set up for it, you can reply to this email and have your
reply appear o
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135241069
Sounds good @jkbradley . I dont think there is anything pending on my side
for this PR. Let me know if you see something otherwise. I will update the
other PRs once t
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135169071
I'm OK with it being private to spark (not mllib) as long as the docs warn
about usage.
---
If your project is set up for it, you can reply to this email and have you
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135121895
@feynmanliang - That sounds good - I can document the intent of the
Broadcastable trait in the scaladoc. The reason I made this trait
private[spark] is because we wan
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r38011826
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable => JIterable}
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135118506
@jkbradley and I talked offline, summary: lets make it `private[mllib]` and
clearly document in the trait's scaladoc that it's to prevent rebroadcasting
during pred
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r38010012
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable => JIterable}
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135046260
@jkbradley @feynmanliang I can certainly update the PR and change the
filename and trait name to be the same (Broadcastable).
I understand the concern regard
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37980863
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable => JIterable}
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37946459
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable => JIterable}
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134831829
OK, I understand why the ClassTag is necessary, thanks.
@jkbradley @sabhyankar I have some concerns with this PR:
1. In the other PRs, `lclBcModel`
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134826539
@jkbradley I have resolved the conflicts and updated the PR with the
changes identified. Let me know if you see any issues. Since I am not combining
PRs, this will ha
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134810307
I think the classtag is needed because Broadcast needs the classtag.
@feynmanliang This trait is for spark.mllib too, not just spark.ml. (Model
is only a conc
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134809744
@sabhyankar Actually, I'd recommend just keeping 1 PR per class. I think
that's easier for avoiding conflicts, even if the changes are all pretty
similar.
---
If yo
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134806330
@feynmanliang without the ClassTag, the compiler complains that no ClassTag
is available for our type T.
No ClassTag available for T
[error] case No
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134803105
@sabhyankar why is the ClassTag necessary? What's the error you're getting?
You can tag multiple JIRA issues in a single PR, see #7507 for example
---
If y
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134796319
@feynmanliang @jkbradley thank you for the review guys. Traits will not
allow context bound (to ClassTag) for type parameters and so I ended up using
implicits in the
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37938624
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37936395
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundat
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37934185
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771964
I like this approach. I'll check again after the conflicts have been
resolved. Thanks!
---
If your project is set up for it, you can reply to this email and have yo
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933321
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundat
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933318
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundat
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933322
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundat
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771502
I'll make some comments, but can you please resolved conflicts here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on Gi
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771542
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134412363
@sabhyankar looks good overall:
* Do you mind resolving merge conflicts?
* Can you put #8248 #8247 #8243 #8241 and #8249 all in one PR and close
the o
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132836131
Glad to help, maybe @jkbradley can say if this is ok to test to jenkins :)
---
If your project is set up for it, you can reply to this email and have your
reply appear o
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132805558
I forgot to mention, I am also only passing in the SparkContext now. I
should have done that to begin with :(
---
If your project is set up for it, you can reply to
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132805093
Thanks @holdenk ! That makes sense. I moved the private var to the trait.
Let me know if you see something else that is out of place. I believe the
generic types I am
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132746927
I know my own PRs get triggered by jenkins, lets see if I can tell jenkins
this is good to test or if we will need to get someone else.
---
If your project is set up fo
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132745648
Awesome, great work :) Looking at the trait, it seems like you could maybe
even move the var inside of the trait. Also passing the RDD around to get the
context out of i
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132631199
Hi @holdenk I have moved the common logic to a trait and am mixin it with
the model. Let me know if you see something that needs to be corrected. I am
still getting f
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37410650
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] (
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37365280
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] (
}
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37364912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -19,6 +19,8 @@ package org.apache.spark.mllib.classification
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132387819
I can take a look at some of the others too, I'm not a committer though so
will still need a follow up review, but I can take a first pass :)
---
If your project is set
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132374365
@holdenk Not sure if you are reviewing the other PRs, but the fix should
now be in all of them. Thx!
---
If your project is set up for it, you can reply to this emai
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132365264
Great :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabl
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132364417
@holdenk yep :) I am updating those as well!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your p
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132363721
Awesome, it also seemed to be in many of your related PRs, you might want
to update those as well.
---
If your project is set up for it, you can reply to this email and
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132362277
Thanks for pointing that out @holdenk ! I have pushed a change to the PR!
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37352728
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,12 @@ class NaiveBayesModel private[spark] (
}
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-131855133
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your pr
GitHub user sabhyankar opened a pull request:
https://github.com/apache/spark/pull/8241
[SPARK-10017] [MLlib]: ML model broadcasts should be stored in private
vars: mllib NaiveBayes
ML model broadcasts should be stored in private vars: mllib NaiveBayes
You can merge this pull reque
58 matches
Mail list logo