[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
Github user ntietz commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217547803
  
I'll dig deeper into the documentation and will update!

I've got the bandwidth to work on this. I'll add a JIRA for it tonight and 
work on it over the course of the week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217528845
  
Hm, then there are actually similar methods for key-value RDDs in 
`PairRDDFunctions` and `JavaPairRDD` -- basically the non "distinct" ones. 

I think we could also say a little bit more about what confidence means. I 
believe it's fair to say that, if called repeatedly, this is the fraction of 
results whose bounds are expected to contain the true count.
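To make the repeated-call reading of "confidence" concrete, here is a small self-contained Scala simulation. It does not call Spark. It assumes each approximate count is the true count plus normally distributed error with a known standard deviation, and it builds symmetric bounds from the two-sided z-score for 95% confidence (about 1.96). `ConfidenceSim` and `coverage` are hypothetical names; Spark's own count evaluator may model the error differently.

```scala
import scala.util.Random

// Hypothetical simulation, not Spark code: assumes each approximate count is
// the true count plus normally distributed error with a known std deviation.
object ConfidenceSim {
  // Two-sided z-score for 95% confidence (inverse normal CDF at 0.975).
  val Z95: Double = 1.959963984540054

  /** Fraction of simulated runs whose [low, high] bounds contain trueCount. */
  def coverage(trueCount: Double, stdDev: Double, trials: Int, seed: Long): Double = {
    val rng = new Random(seed)
    var hits = 0
    for (_ <- 1 to trials) {
      val estimate = trueCount + stdDev * rng.nextGaussian()
      val low = estimate - Z95 * stdDev
      val high = estimate + Z95 * stdDev
      if (low <= trueCount && trueCount <= high) hits += 1
    }
    hits.toDouble / trials
  }
}

// With enough trials, the observed coverage should land close to 0.95.
println(ConfidenceSim.coverage(trueCount = 1000000.0, stdDev = 500.0, trials = 20000, seed = 42L))
```

Across many calls, roughly 95% of the intervals contain the true count; any single interval either does or does not.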

If you want to keep going ... man, I see that this value isn't even checked 
to be in [0,1]. Really (0,1) is all that makes sense. None of the code that 
computes an inverse CDF checks this arg up front, though the inverse CDF call 
will already throw an exception if the value is out of range.
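A fail-fast check of the kind described could look like the sketch below. `ConfidenceCheck.validateConfidence` is a hypothetical helper, not part of Spark's API; it uses Scala's `require`, which throws an `IllegalArgumentException` with a clear message instead of leaving the error to the inverse-CDF routine. Note that `NaN` is also rejected, because every comparison with `NaN` is false.

```scala
// Hypothetical helper, not part of Spark's API: rejects confidence values
// outside the open interval (0, 1) before any inverse-CDF computation runs.
object ConfidenceCheck {
  def validateConfidence(confidence: Double): Double = {
    require(confidence > 0.0 && confidence < 1.0,
      s"confidence ($confidence) must be in the open interval (0, 1)")
    confidence
  }
}

// 0.95 passes through unchanged; 0.0, 1.0 or NaN would throw IllegalArgumentException.
println(ConfidenceCheck.validateConfidence(0.95))
```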

If you're up for it, make a JIRA to describe the expanded scope. Otherwise 
I'll do it at some point if you don't have bandwidth.

More points for describing in some more detail what "relativeSD" means for 
the "approx distinct" count methods.
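One way to explain `relativeSD` is through the HyperLogLog precision it selects. HyperLogLog with 2^p registers has a relative standard error of roughly 1.054 / sqrt(2^p), so a target `relativeSD` picks the smallest p meeting it. The sketch below is modeled on the mapping in `RDD.countApproxDistinct` (including the floor of 4 and the 0.000017 lower bound), but those constants are assumptions to verify against your Spark version; `HllPrecision`, `precisionFor` and `expectedRelativeSD` are hypothetical names.

```scala
// Sketch of how a relativeSD target maps to a HyperLogLog precision p.
// Constants (1.054, the floor of 4, the 0.000017 lower bound) are assumptions
// modeled on Spark's RDD.countApproxDistinct; verify against your Spark version.
object HllPrecision {
  /** Smallest p with 1.054 / sqrt(2^p) <= relativeSD, but never below 4. */
  def precisionFor(relativeSD: Double): Int = {
    require(relativeSD > 0.000017, s"accuracy ($relativeSD) must be greater than 0.000017")
    val p = math.ceil(2.0 * math.log(1.054 / relativeSD) / math.log(2)).toInt
    math.max(p, 4)
  }

  /** Approximate relative standard error achieved with 2^p registers. */
  def expectedRelativeSD(p: Int): Double = 1.054 / math.sqrt(math.pow(2, p))
}

// A relativeSD of 0.05 selects p = 9 (512 registers),
// for an expected relative error of about 0.047.
val p = HllPrecision.precisionFor(0.05)
println(s"p = $p, expected relative SD = ${HllPrecision.expectedRelativeSD(p)}")
```

Smaller `relativeSD` values mean more registers: each halving of the target error roughly quadruples memory.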





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
Github user ntietz commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217496563
  
I've added documentation for the Java version. Anything else that should go 
in with this PR?






[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
Github user ntietz commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217467133
  
Good call, I will add it to the Java version as well.





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217465951
  
How about doc'ing the Java version as well in JavaRDDLike.scala?
You're welcome to expand on the Javadoc/Scaladoc of lots of these methods. 
It'd be nicer to have more complete docs of method args and return types for such 
a central API.





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217452079
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
GitHub user ntietz opened a pull request:

https://github.com/apache/spark/pull/12955

[Docs] Added Scaladoc for countApprox and countByValueApprox parameters

This pull request simply adds Scaladoc documentation of the parameters for 
countApprox and countByValueApprox.

This is an important documentation change because it clarifies what should be 
passed in for the timeout. The units (milliseconds) were not documented, so this 
was previously unclear.
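Because `countApprox` takes its timeout as a plain `Long` number of milliseconds, converting from a `scala.concurrent.duration.FiniteDuration` at the call site keeps the unit explicit. This is a minimal sketch: the Spark call appears only in a comment because it needs a running `SparkContext`, and `rdd` there is a placeholder.

```scala
import scala.concurrent.duration._

// The timeout passed to countApprox is a Long in milliseconds, so convert
// from a Duration to make the unit explicit at the call site.
val timeout: FiniteDuration = 10.seconds
val timeoutMs: Long = timeout.toMillis

// With a live SparkContext (rdd is a placeholder RDD), the call would be:
//   val partial = rdd.countApprox(timeoutMs, confidence = 0.95)
//   val bounded = partial.initialValue  // BoundedDouble with low/mean/high
println(timeoutMs) // 10000
```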

I did not open a JIRA ticket per my understanding of the project 
contribution guidelines; as they state, the description in the ticket would be 
essentially just what is in the PR. If I should open one, let me know and I 
will do so.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ntietz/spark rdd-countapprox-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12955.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12955


commit a28014dc45981b79df6e6c18f473565eb740638c
Author: Nicholas Tietz 
Date:   2016-05-06T14:07:21Z

Added Scaladoc for countApprox and countByValueApprox



