[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135624869 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135624510 Merged build started.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135624503 Merged build triggered.
[GitHub] spark pull request: [SPARK-10049][SPARKR] Support collecting data ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135623885 [Test build #41723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41723/consoleFull) for PR 8458 at commit [`2bc97ad`](https://github.com/apache/spark/commit/2bc97adc8a081301e0bc7394d35dd617a9ae49a5).
[GitHub] spark pull request: [SPARK-5753] [SQL] add JDBCRDD support for pos...
Github user jakajancar commented on the pull request: https://github.com/apache/spark/pull/4549#issuecomment-135623856 @lepfhty Has any progress been made on this? Can you point me to a PR/branch/JIRA issue/... that I can subscribe to?
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8495#discussion_r38168825
--- Diff: R/pkg/R/generics.R ---
@@ -413,7 +413,7 @@ setGeneric("dropna",
 #' @rdname nafunctions
 #' @export
 setGeneric("na.omit",
--- End diff --
I'm just testing if we need this generic or not.
[GitHub] spark pull request: [SPARK-10049][SPARKR] Support collecting data ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135623380 Merged build started.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135623342 OK. I guess the main question here is whether we want `array_contains` to have different semantics from Hive regarding `null`.
[GitHub] spark pull request: [SPARK-10049][SPARKR] Support collecting data ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135623371 Merged build triggered.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135622466 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135622468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41721/ Test FAILed.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135621721 [Test build #41722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41722/consoleFull) for PR 8495 at commit [`4758a87`](https://github.com/apache/spark/commit/4758a87ea3b74914ffd2870e1a736472944c4a04).
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135621561 Merged build started.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135621549 postgresql's output regarding `in`...
```
yhuai=# select cast(null as char(10)) in ('1', cast(null as char(10)));
 ?column?
----------

(1 row)

yhuai=# select cast(null as char(10)) in ('1', cast(null as char(10))) is null;
 ?column?
----------
 t
(1 row)
```
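The psql session above follows SQL's three-valued logic: comparing NULL with anything is unknown, so the `IN` result is NULL (shown as an empty cell), and only `IS NULL` turns it into a definite `t`. A minimal Python sketch of that rule, assuming Python's `None` stands in for SQL NULL; `sql_in` and `sql_is_null` are hypothetical helpers for illustration, not Spark or PostgreSQL APIs:

```python
from typing import Any, Optional, Sequence


def sql_in(value: Any, candidates: Sequence[Any]) -> Optional[bool]:
    """Evaluate `value IN (c1, c2, ...)` with three-valued logic.

    None stands in for SQL NULL; the result is True, False, or None (unknown).
    """
    if value is None:
        # NULL compared with anything is unknown.
        return None
    saw_null = False
    for candidate in candidates:
        if candidate is None:
            # `value = NULL` is unknown: remember it, but keep looking for a match.
            saw_null = True
        elif candidate == value:
            return True
    # No match: unknown if any comparison was unknown, otherwise false.
    return None if saw_null else False


def sql_is_null(result: Optional[bool]) -> bool:
    """`expr IS NULL` always produces a definite boolean."""
    return result is None


# Same shape as the two psql queries: NULL IN ('1', NULL), then IS NULL on it.
print(sql_in(None, ["1", None]))               # None
print(sql_is_null(sql_in(None, ["1", None])))  # True
```

Note that a non-NULL value that is not in the list still yields NULL when the list contains a NULL, e.g. `sql_in("2", ["1", None])` is `None`.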
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135621545 Merged build triggered.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135621454 @shivaram All right. I am checking this on my local machine. Please give me a few minutes.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135620933 @yu-iskw I also found a minor bug in lint-r that I just fixed. Please let me know if that looks good. With this change, lint-r passes on my machine.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135620803 Merged build triggered.
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135620819 Merged build started.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135620750 Here is the output of some sample tests using hive 1.2.1
```
hive> select cast(null as string) in ('1', cast(null as string));
OK
NULL
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive> select array_contains(array('1', cast(null as string)), cast(null as string));
OK
false
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive>
```
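In the Hive 1.2.1 session above, `IN` returns NULL (like PostgreSQL), while `array_contains` returns `false` for the same NULL needle. A rough Python model of the observed `array_contains` result, useful for contrasting the two semantics. The helper name `hive_array_contains` is hypothetical, `None` stands in for NULL, and the rule that NULL elements are simply skipped is an assumption inferred from the output, not read from Hive's source:

```python
from typing import Any, Sequence


def hive_array_contains(array: Sequence[Any], value: Any) -> bool:
    """Model of the array_contains behaviour observed in the Hive output.

    Assumptions: a NULL (None) needle never matches, and NULL elements are
    skipped, so the result is always a definite boolean and never NULL.
    """
    if value is None:
        return False
    return any(element is not None and element == value for element in array)


# Same inputs as the Hive session, plus a non-NULL needle for comparison.
print(hive_array_contains(["1", None], None))  # False
print(hive_array_contains(["1", None], "1"))   # True
```

Under three-valued logic the first call would instead be unknown (NULL), which is the semantic difference raised in the thread.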
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/8495 [SPARKR] [SPARK-10328] Fix generic for na.omit
The S3 function is documented at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shivaram/spark-1 na-omit-fix
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8495.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8495
commit ff733d2aef7bc204a9361bcbb0415b97841a71b1
Author: Shivaram Venkataraman
Date: 2015-08-28T03:01:32Z
Fix generic for na.omit
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135620501 cc @yu-iskw
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135620315 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135620319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/ Test PASSed.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135620132 [Test build #41718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/console) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135620007 [Test build #41720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41720/consoleFull) for PR 8494 at commit [`bfd40d9`](https://github.com/apache/spark/commit/bfd40d999b6530bc04fc03ea6591c0093e10e534).
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135619890 [Test build #41719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41719/consoleFull) for PR 8362 at commit [`1ea722b`](https://github.com/apache/spark/commit/1ea722b44745036ef568447f9db93a7ebade8b12).
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135619546 Merged build triggered.
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135619597 Merged build started.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135619437 I think I found the problem. Our `setGeneric` for `na.omit` is wrong: it is too restrictive. We need a diff that looks like:
```
- function(x, how = c("any", "all"), minNonNulls = NULL, cols = NULL) {
+ function(object, ...) {
```
I'll send a PR in a minute.
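R's S4 dispatch has no direct Python counterpart, but `functools.singledispatch` gives a loose analogy for why the broader `function(object, ...)` signature matters: a generic whose default falls back to the base function keeps working on plain vectors, while a generic with no usable default fails for them. Everything below (`base_na_omit`, `DataFrame`, `na_omit`, `na_omit_no_default`) is hypothetical and only illustrates the idea; it is not SparkR code and does not reproduce R's actual dispatch rules.

```python
from functools import singledispatch
from typing import Any, List, Optional, Sequence, Tuple


def base_na_omit(obj: Sequence[Any], *args: Any, **kwargs: Any) -> List[Any]:
    """Hypothetical stand-in for stats::na.omit on a plain vector."""
    return [v for v in obj if v is not None]


class DataFrame:
    """Hypothetical minimal stand-in for a SparkR DataFrame (rows of tuples)."""

    def __init__(self, rows: List[Tuple[Optional[Any], ...]]) -> None:
        self.rows = rows


@singledispatch
def na_omit(obj: Any, *args: Any, **kwargs: Any) -> Any:
    # Broad generic, analogous to function(object, ...): types without a
    # registered method fall through to the base implementation.
    return base_na_omit(obj, *args, **kwargs)


@na_omit.register
def _(obj: DataFrame, how: str = "any", **kwargs: Any) -> DataFrame:
    # DataFrame-specific method that adds its own argument, like SparkR's `how`.
    if how == "any":
        keep = [r for r in obj.rows if all(v is not None for v in r)]
    elif how == "all":
        keep = [r for r in obj.rows if any(v is not None for v in r)]
    else:
        raise ValueError("how must be 'any' or 'all'")
    return DataFrame(keep)


@singledispatch
def na_omit_no_default(obj: Any, *args: Any, **kwargs: Any) -> Any:
    # Generic with no usable default: plain vectors fail, much like an
    # "unable to find an inherited method" error for an integer vector.
    raise TypeError(f"no na_omit method for type {type(obj).__name__!r}")


print(na_omit([1, None, 2]))                          # [1, 2]
print(na_omit(DataFrame([(1, None), (2, 3)])).rows)  # [(2, 3)]
```

In this analogy, the DataFrame method can still accept extra arguments such as `how`, while the generic itself stays as broad as the base function.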
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135619245 I implemented an initial non-sparse HLL++. I am going to take a look at the sparse version next week. In non-sparse mode, the results are still equal to those of the Clearspring HLL+ implementation. I also need to clean up the docs for the main HLL++ class a bit.
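As a rough illustration of what the dense (non-sparse) part of such a sketch does, here is a simplified HyperLogLog in Python. It is not the PR's HLL++ implementation: it omits HLL++'s empirical bias-correction tables and sparse encoding and keeps only the classic linear-counting switch for small cardinalities. The class name `DenseHLL`, the hash (blake2b truncated to 64 bits), and the default precision `p=14` are assumptions chosen for this example.

```python
import hashlib
import math
from typing import Hashable


def _hash64(item: Hashable) -> int:
    """64-bit hash of the item's string form (blake2b is an arbitrary choice)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class DenseHLL:
    """Simplified dense HyperLogLog with 2**p registers (not HLL++)."""

    def __init__(self, p: int = 14) -> None:
        if not 7 <= p <= 18:
            raise ValueError("p must be between 7 and 18")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: Hashable) -> None:
        x = _hash64(item)
        rest_bits = 64 - self.p
        idx = x >> rest_bits                    # top p bits pick a register
        w = x & ((1 << rest_bits) - 1)          # remaining 64 - p bits
        rho = rest_bits - w.bit_length() + 1    # 1-based position of leftmost 1-bit
        if rho > self.registers[idx]:
            self.registers[idx] = rho

    def merge(self, other: "DenseHLL") -> None:
        # The union of two sketches is the element-wise max of their registers.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)      # linear counting for small cardinalities
        return raw


sketch = DenseHLL(p=14)
for i in range(100_000):
    sketch.add(i)
# Expect roughly 100000; the standard error is about 1.04 / sqrt(2**14), i.e. ~0.8%.
print(round(sketch.estimate()))
```

Because `merge` is an element-wise max, partial sketches built on different partitions can be combined without revisiting the data, which is what makes this structure a natural fit for a distributed aggregate.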
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135618510 cc @marmbrus
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/8494 [SPARK-10327][SQL] Cache Table is not working while subquery has alias in its project list
```scala
import org.apache.spark.sql.hive.execution.HiveTableScan

sql("select key, value, key + 1 from src").registerTempTable("abc")
cacheTable("abc")

val sparkPlan = sql(
  """select a.key, b.key, c.key from
    |abc a join abc b on a.key=b.key
    |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan

assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
```
The actual plan is:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
 'Join Inner, Some(('a.key = 'c.key))
  'Join Inner, Some(('a.key = 'b.key))
   'UnresolvedRelation [abc], Some(a)
   'UnresolvedRelation [abc], Some(b)
  'UnresolvedRelation [abc], Some(c)

== Analyzed Logical Plan ==
key: int, key: int, key: int
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Join Inner, Some((key#14 = key#61))
   Subquery a
    Subquery abc
     Project [key#14,value#15,(key#14 + 1) AS _c2#16]
      MetastoreRelation default, src, None
   Subquery b
    Subquery abc
     Project [key#61,value#62,(key#61 + 1) AS _c2#58]
      MetastoreRelation default, src, None
  Subquery c
   Subquery abc
    Project [key#66,value#67,(key#66 + 1) AS _c2#63]
     MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Project [key#14,key#61]
   Join Inner, Some((key#14 = key#61))
    Project [key#14]
     InMemoryRelation [key#14,value#15,_c2#16], true, 1, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
    Project [key#61]
     MetastoreRelation default, src, None
  Project [key#66]
   MetastoreRelation default, src, None

== Physical Plan ==
TungstenProject [key#14,key#61,key#66]
 BroadcastHashJoin [key#14], [key#66], BuildRight
  TungstenProject [key#14,key#61]
   BroadcastHashJoin [key#14], [key#61], BuildRight
    ConvertToUnsafe
     InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 1, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
    ConvertToUnsafe
     HiveTableScan [key#61], (MetastoreRelation default, src, None)
  ConvertToUnsafe
   HiveTableScan [key#66], (MetastoreRelation default, src, None)
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark weird_cache
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8494.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8494
commit bfd40d999b6530bc04fc03ea6591c0093e10e534
Author: Cheng Hao
Date: 2015-08-28T02:41:56Z
Cache Table is not working while subquery has alias in its project list
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135618198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41714/ Test PASSed.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135618193 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135618028 [Test build #41714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41714/console) for PR 8484 at commit [`35371fb`](https://github.com/apache/spark/commit/35371fb629217ee27ccda451c931d04137c05f93).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135617993 Merged build started.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135617969 @shivaram I see. I'll investigate the cause. Thank you for letting me know.
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135617983 Merged build triggered.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135617728 @yu-iskw I actually see the following error in Jenkins. I guess this is from the PR that added na.omit to the NAMESPACE yesterday.
```
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'na.omit' for signature '"integer"'
Calls: lint_package ... ends -> as.igraph.es -> inherits -> na.omit ->
```
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135617611 @shivaram thank you for merging it. I'll keep watching Jenkins for the next couple of hours. If it goes well, I will inform the community about this lint script.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135617402 [Test build #41718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/consoleFull) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616536 Merged build triggered.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616551 Merged build started.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616422 @lresende I'm just retesting this because we merged an R style checker. I'm sure this PR is fine, but I want to run it through to make sure things are working. FYI @yu-iskw
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7883
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616345 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616119 Thanks @lresende -- LGTM
[GitHub] spark pull request: [SPARK-8813][SQL] Combine files when there're ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8125#discussion_r38167285
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/CombineSmallFile.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+
+object CombineSmallFile {
+  def combineWithFiles[T](rdd: RDD[T], sqlContext: SQLContext, inputFiles: Array[FileStatus])
+    : RDD[T] = {
+    if (sqlContext.conf.combineSmallFile) {
+      val totalLen = inputFiles.map { file =>
+        if (file.isDir) 0L else file.getLen
+      }.sum
+      val numPartitions = (totalLen / sqlContext.conf.splitSize + 1).toInt
+      rdd.coalesce(numPartitions)
--- End diff --
I think this is a very hacky way to solve this problem. We cannot tell how the data source will be split; even for Hadoop, the split size is just a hint, so using it to compute the number of partitions is probably too risky for a general-purpose data processing framework.
And `RDD.coalesce` actually combines the splits in an arbitrary way, which will probably cause data skew, because we will most likely end up combining the large partitions into a single task. IMO, I'd like to investigate more deeply how Hive combines small partitions using `CombineHiveInputFormat` or `HiveInputFormat`, which seem to have a strategy for selecting partitions based on the input format while also keeping the load balanced.
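The concern about the diff's estimate (`totalLen / splitSize + 1`) and about size-blind grouping can be illustrated without Spark. The plain-Scala sketch below is hypothetical: `SizeAwareCombine` is an invented name, and the greedy first-fit-decreasing grouping is an assumed stand-in chosen for illustration. It is not what `CombineHiveInputFormat` actually does. It contrasts the diff's count-only estimate with a grouping that caps each combined split at a target size.

```scala
import scala.collection.mutable.ArrayBuffer

object SizeAwareCombine {
  // Partition estimate as written in the quoted diff: totalLen / splitSize + 1.
  // The caller is assumed to pass 0 for directories, as the diff does with file.isDir.
  def estimatePartitions(fileSizes: Seq[Long], splitSize: Long): Int = {
    require(splitSize > 0, "splitSize must be positive")
    (fileSizes.sum / splitSize + 1).toInt
  }

  // First-fit decreasing: visit files largest-first and put each one into the
  // first group whose running total stays within targetSize; otherwise open a
  // new group. A file larger than targetSize ends up alone in its own group.
  def packBySize(fileSizes: Seq[Long], targetSize: Long): Vector[Vector[Long]] = {
    require(targetSize > 0, "targetSize must be positive")
    val groups = ArrayBuffer.empty[Vector[Long]]
    val totals = ArrayBuffer.empty[Long]
    for (size <- fileSizes.sorted(Ordering[Long].reverse)) {
      val idx = totals.indexWhere(t => t + size <= targetSize)
      if (idx >= 0) {
        groups(idx) = groups(idx) :+ size
        totals(idx) = totals(idx) + size
      } else {
        groups += Vector(size)
        totals += size
      }
    }
    groups.toVector
  }
}
```

The estimate depends only on the total byte count. For example, 60 bytes with a split size of 25 always yields 3 partitions, however the bytes are distributed. The packing bounds the size of each group instead. Whether a greedy packing like this approximates Hive's strategy is not established here.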
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135615906 Alright, I'm going to merge this, as it's better to do so before more breaking style changes get in. I'll watch Jenkins for the next couple of hours to make sure things are fine.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38166499
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
--- End diff --
I think we still need `filter` or `map` for these iterator trees. @rxin, is there anything I'm misunderstanding about the `LocalNode` design?
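The following plain-Scala sketch shows how filter, map, and limit operators compose in a pull-based iterator tree like the one discussed. It is only loosely modeled on the quoted `LimitNode` diff. The `open`/`fetch`/`close` names follow the diff, where `get` was later renamed to `fetch`. The `next()` advance method, the `Row = Seq[Any]` alias, and all class names are assumptions made for illustration and are not the PR's `LocalNode` API.

```scala
object IteratorTreeSketch {
  // A row is modeled as a plain sequence of values (an assumption; the diff uses InternalRow).
  type Row = Seq[Any]

  // Pull-based operator interface. open/fetch/close follow the quoted diff;
  // next() is an assumed "advance" method.
  trait Node {
    def open(): Unit
    def next(): Boolean // advances to the next row; false once exhausted
    def fetch(): Row    // current row; valid only after next() returned true
    def close(): Unit
  }

  // Leaf that scans an in-memory sequence of rows.
  final class SeqScan(rows: Seq[Row]) extends Node {
    private var it: Iterator[Row] = Iterator.empty
    private var current: Row = Nil
    def open(): Unit = { it = rows.iterator }
    def next(): Boolean =
      if (it.hasNext) { current = it.next(); true } else false
    def fetch(): Row = current
    def close(): Unit = { it = Iterator.empty }
  }

  // Filter: keeps pulling from the child until a row satisfies the predicate.
  final class Filter(pred: Row => Boolean, child: Node) extends Node {
    def open(): Unit = child.open()
    def next(): Boolean = {
      var found = false
      while (!found && child.next()) found = pred(child.fetch())
      found
    }
    def fetch(): Row = child.fetch()
    def close(): Unit = child.close()
  }

  // Map/project: transforms each row produced by the child.
  final class Project(f: Row => Row, child: Node) extends Node {
    def open(): Unit = child.open()
    def next(): Boolean = child.next()
    def fetch(): Row = f(child.fetch())
    def close(): Unit = child.close()
  }

  // Limit: stops after `limit` rows, tracking progress in a counter like the diff's `count`.
  final class Limit(limit: Int, child: Node) extends Node {
    private var count = 0
    def open(): Unit = { count = 0; child.open() }
    def next(): Boolean =
      if (count < limit && child.next()) { count += 1; true } else false
    def fetch(): Row = child.fetch()
    def close(): Unit = child.close()
  }

  // Drains a node tree into a list, always closing it afterwards.
  def collect(node: Node): List[Row] = {
    node.open()
    try {
      val buf = List.newBuilder[Row]
      while (node.next()) buf += node.fetch()
      buf.result()
    } finally node.close()
  }
}
```

For example, `Limit(2, Filter(isEven, SeqScan(1 to 10)))` yields only the rows 2 and 4. The limit stops pulling as soon as its counter reaches 2, so the scan is never drained.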
[GitHub] spark pull request: [SPARK-10188] [Pyspark] Pyspark CrossValidator...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8399#issuecomment-135605403 [Test build #1700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1700/console) for PR 8399 at commit [`bada453`](https://github.com/apache/spark/commit/bada4539227a3705337beea7e08bdc45183e2903). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604744 [Test build #41711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/console) for PR 7878 at commit [`cf58c49`](https://github.com/apache/spark/commit/cf58c49c3be31c8e33639ba68eca16398f98c7f6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604771 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/ Test FAILed.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604767 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135603859 [Test build #41717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41717/consoleFull) for PR 8464 at commit [`7dcd502`](https://github.com/apache/spark/commit/7dcd502fc7278978fab5a233f4a81fefcca8bf72).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135603152 Merged build started.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38165750
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
+
+  override def close(): Unit = child.close()
+
+  override def get(): InternalRow = child.get()
--- End diff --
Renamed to `fetch`
[GitHub] spark pull request: [SPARK-10188] [Pyspark] Pyspark CrossValidator...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8399#issuecomment-135603184 [Test build #1700 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1700/consoleFull) for PR 8399 at commit [`bada453`](https://github.com/apache/spark/commit/bada4539227a3705337beea7e08bdc45183e2903).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135603138 Merged build triggered.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38165742
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala ---
@@ -0,0 +1,189 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
+      left: DataFrame,
+      right: DataFrame,
+      nodeFunction: (LocalNode, LocalNode) => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      left :: right :: Nil,
+      nodes => nodeFunction(nodes(0), nodes(1)),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the `LocalNode`s and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts a sequence of input `LocalNode`s and uses them to
+   *                     instantiate the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def doCheckAnswer(
+      input: Seq[DataFrame],
+      nodeFunction: Seq[LocalNode] => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    LocalNodeTest.checkAnswer(
+      input.map(dataFrameToSeqScanNode), nodeFunction, expectedAnswer, sortAnswers) match {
+      case Some(errorMessage) => fail(errorMessage)
+      case None =>
+    }
+  }
+
+  protected def dataFrameToSeqScanNode(df: DataFrame): SeqScanNode = {
+    new SeqScanNode(
+      df.queryExecution.sparkPlan.output,
+      df.queryExecution.toRdd.map(_.copy()).collect())
+  }
+
+}
+
+/**
+ * Helper methods for writing tests of individual local physical operators.
+ */
+object LocalNodeTest {
+
+  /**
+   * Runs the `LocalNode`s and makes
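The `sortAnswers` contract documented in these test helpers (sort both sides by their toString representations, then compare) and the `Option[String]` result that `doCheckAnswer` matches on can be sketched without Spark. `AnswerCheckSketch`, `compareAnswers`, and modeling rows as `Seq[Any]` are illustrative assumptions. This is not the PR's `LocalNodeTest.checkAnswer`.

```scala
object AnswerCheckSketch {
  // Returns None when the rows match and Some(message) otherwise, mirroring
  // the Some/None result that doCheckAnswer pattern-matches on in the diff.
  def compareAnswers(
      actual: Seq[Seq[Any]],
      expected: Seq[Seq[Any]],
      sortAnswers: Boolean = true): Option[String] = {
    // Normalize every row to a List so toString is uniform, then optionally
    // sort by that string so that row order does not affect the comparison.
    def prepare(rows: Seq[Seq[Any]]): List[List[Any]] = {
      val normalized = rows.map(_.toList).toList
      if (sortAnswers) normalized.sortBy(_.toString) else normalized
    }
    val a = prepare(actual)
    val e = prepare(expected)
    if (a == e) None
    else Some(s"Results do not match.\nExpected: ${e.mkString(", ")}\nActual: ${a.mkString(", ")}")
  }
}
```

Sorting by string is lexicographic (for example, `List(10)` sorts before `List(2)`), but both sides are sorted the same way. So the comparison is order-insensitive without requiring an `Ordering` for arbitrary column types.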
[GitHub] spark pull request: [SPARK-10188] [Pyspark] Pyspark CrossValidator...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8399#issuecomment-135603112 Ping @mengxr In case I can't check this soon, it would be great to get this into 1.5 if there is an RC3.
[GitHub] spark pull request: [SPARK-10311][Streaming]Reload appId and attem...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135602834 Sorry, I didn't describe the problem clearly. When an app is restarted from a checkpoint file using the [getOrCreate method](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala#L829), the new AM process creates a new SparkContext object, but with the [old SparkConf](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala#L140). As a result, the new attemptId set by the new AM process has no effect. The same goes for the appId.
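The failure described above can be modeled with plain maps. Recovering from the checkpoint reuses the checkpointed conf wholesale, so attempt-specific values written by the new AM are lost. The minimal sketch below instead reloads just those keys from the current process. It assumes the relevant keys are `spark.yarn.app.id` and `spark.yarn.app.attemptId`. Treat those key names and the object `CheckpointConfSketch` as assumptions, not as the PR's actual change.

```scala
object CheckpointConfSketch {
  // Keys assumed to identify the current YARN application attempt (illustrative).
  val keysToReload: Seq[String] = Seq("spark.yarn.app.id", "spark.yarn.app.attemptId")

  // Behaviour described in the comment: the recovered context is built from
  // the checkpointed conf alone, so anything set by the new AM is ignored.
  def recoverNaive(checkpointed: Map[String, String], current: Map[String, String]): Map[String, String] =
    checkpointed

  // Sketch of a fix: start from the checkpointed conf but overwrite the
  // attempt-specific keys with the values set by the new process, if present.
  def recoverWithReload(checkpointed: Map[String, String], current: Map[String, String]): Map[String, String] =
    keysToReload.foldLeft(checkpointed) { (conf, key) =>
      current.get(key) match {
        case Some(value) => conf.updated(key, value)
        case None        => conf
      }
    }
}
```

Only the listed keys are taken from the new process. Everything else, such as the application name, still comes from the checkpoint, which keeps the recovered context consistent with the original job.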
[GitHub] spark pull request: [CORE][MINOR] Whitespace fixes in RangePartiti...
Github user ihainan closed the pull request at: https://github.com/apache/spark/pull/8480
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135602276 [Test build #41716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41716/consoleFull) for PR 8493 at commit [`a14dba5`](https://github.com/apache/spark/commit/a14dba5233526f844a68d77c5d765d98b0534e2a).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38165165
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala ---
@@ -0,0 +1,189 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
--- End diff --
It needs to be `checkAnswer2` because both methods have a default parameter `sortAnswers`, and Scala does not allow more than one overloaded alternative of a method to define default arguments.
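The naming constraint explained above can be reproduced in a few lines. In this hypothetical sketch, `OverloadDefaultsSketch`, `check`, `check2`, and the string inputs are invented stand-ins for `checkAnswer`/`checkAnswer2`. The rejected overload pair is kept in a comment because it would not compile, and the version that compiles gives the two-input variant its own name, as the PR does.

```scala
object OverloadDefaultsSketch {
  // Rejected by the Scala compiler, because more than one overloaded
  // alternative of `check` would define default arguments:
  //
  //   def check(input: String, sort: Boolean = true): String = ...
  //   def check(left: String, right: String, sort: Boolean = true): String = ...
  //
  // Using a distinct name for the two-input variant keeps the default on both.
  def check(input: String, sort: Boolean = true): String =
    doCheck(Seq(input), sort)

  def check2(left: String, right: String, sort: Boolean = true): String =
    doCheck(Seq(left, right), sort)

  // Shared implementation, analogous to doCheckAnswer taking a Seq of inputs.
  private def doCheck(inputs: Seq[String], sort: Boolean): String =
    (if (sort) inputs.sorted else inputs).mkString(",")
}
```

Another option would be to drop the default from one overload, but callers of that variant would then always have to pass `sortAnswers` explicitly.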
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135601503 Merged build triggered.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135601511 Merged build started.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/8493

[SPARK-10326] [yarn] Fix app submission on windows.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-10326

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8493.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #8493

commit a14dba5233526f844a68d77c5d765d98b0534e2a
Author: Marcelo Vanzin
Date: 2015-08-28T01:38:41Z

[SPARK-10326] [yarn] Fix app submission on windows.
[GitHub] spark pull request: [SPARK-10049][SPARKR] Support collecting data ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135601195 @davies, @shivaram, could you help review it?
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38164639

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
--- End diff --

LocalNode cannot be reused, just like an Iterator, so there is no need to reset `count` in `open()`.
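To illustrate the single-use argument, here is a minimal sketch built on a hypothetical, simplified node interface (`SimpleNode`, `IteratorNode`, `LimitSketch`, and `Drain` are illustrative names, not Spark's `LocalNode` API). Because the leaf wraps an `Iterator`, its input is gone after the first pass, so resetting the limit counter would not make the node reusable; it would only let a second pass emit leftover rows:

```scala
// Hypothetical, simplified operator interface; NOT Spark's LocalNode API.
trait SimpleNode[A] {
  def open(): Unit
  def next(): Boolean // advances to the next element; false once exhausted
  def fetch(): A      // returns the current element
}

// Leaf node backed by an Iterator: once consumed, it stays consumed.
final class IteratorNode[A](it: Iterator[A]) extends SimpleNode[A] {
  private var current: Option[A] = None
  def open(): Unit = ()
  def next(): Boolean =
    if (it.hasNext) { current = Some(it.next()); true } else false
  def fetch(): A = current.get
}

// Limit operator: `count` is deliberately not reset in open(), because the
// node, like an Iterator, is meant to be consumed exactly once.
final class LimitSketch[A](limit: Int, child: SimpleNode[A]) extends SimpleNode[A] {
  private var count = 0
  def open(): Unit = child.open()
  def next(): Boolean =
    if (count < limit && child.next()) { count += 1; true } else false
  def fetch(): A = child.fetch()
}

object Drain {
  // Opens a node and collects every element it produces.
  def drain[A](node: SimpleNode[A]): List[A] = {
    node.open()
    val buf = List.newBuilder[A]
    while (node.next()) buf += node.fetch()
    buf.result()
  }
}
```

Draining a `LimitSketch(2, ...)` over `Iterator(10, 20, 30)` yields `List(10, 20)` the first time and `Nil` the second time.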
[GitHub] spark pull request: [SPARK-8057][Core]Call TaskAttemptContext.getT...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6599#issuecomment-135598700

> I think that we should also backport this to branch-1.4.

+1, since we fixed it in 1.5.0. I just confirmed that this one has no conflicts with branch-1.4.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135598617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41715/ Test PASSed.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135598611 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135598213

[Test build #41715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41715/console) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135596210 [Test build #41715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41715/consoleFull) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135595941 Merged build started.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135595930 Merged build triggered.
[GitHub] spark pull request: [SPARK-4980] [MLlib] Add decay factors to stre...
Github user rotationsymmetry commented on a diff in the pull request:

https://github.com/apache/spark/pull/8022#discussion_r38163035

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingDecay.scala ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+
+/**
+ * :: Experimental ::
+ * Supplies an interface for the discount value in
+ * the forgetful update rule in StreamingLinearAlgorithm.
+ * Actual implementation is provided in StreamingDecaySetter[T].
+ */
+@Experimental
+trait StreamingDecay {
+  /**
+   * Derive the discount factor.
+   *
+   * @param numNewDataPoints number of data points for the RDD arriving at time t.
+   * @return Discount factor
+   */
+  def getDiscount(numNewDataPoints: Long): Double
+}
+
+/**
+ * :: Experimental ::
+ * StreamingDecaySetter provides the concrete implementation
+ * of getDiscount in StreamingDecay and setters for decay factor
+ * and half-life.
+ */
+@Experimental
+private[mllib] trait StreamingDecaySetter[T] extends Logging {
+  self: T =>
+  private var decayFactor: Double = 0
+  private var timeUnit: String = StreamingDecay.BATCHES
+
+  /**
+   * Set the decay factor for the forgetful algorithms.
+   * The decay factor should be between 0 and 1, inclusive.
+   * decayFactor = 0: only the data from the most recent RDD will be used.
+   * decayFactor = 1: all data since the beginning of the DStream will be used.
+   * decayFactor is default to zero.
+   *
+   * @param decayFactor the decay factor
+   */
+  def setDecayFactor(decayFactor: Double): T = {
+    this.decayFactor = decayFactor
+    this
+  }
+
+
+  /**
+   * Set the half life and time unit ("batches" or "points") for the forgetful algorithm.
+   * The half life along with the time unit provides an alternative way to specify decay factor.
+   * The decay factor is calculated such that, for data acquired at time t,
+   * its contribution by time t + halfLife will have dropped to 0.5.
+   * The unit of time can be specified either as batches or points;
+   * see StreamingDecay companion object.
+   *
+   * @param halfLife the half life
+   * @param timeUnit the time unit
+   */
+  def setHalfLife(halfLife: Double, timeUnit: String): T = {
+    if (timeUnit != StreamingDecay.BATCHES && timeUnit != StreamingDecay.POINTS) {
+      throw new IllegalArgumentException("Invalid time unit for decay: " + timeUnit)
+    }
+    this.decayFactor = math.exp(math.log(0.5) / halfLife)
+    logInfo("Setting decay factor to: %g ".format (this.decayFactor))
+    this.timeUnit = timeUnit
+    this
+  }
+
+  /**
+   * Derive the discount factor.
+   *
+   * @param numNewDataPoints number of data points for the RDD arriving at time t.
+   * @return Discount factor
+   */
+  def getDiscount(numNewDataPoints: Long): Double = timeUnit match {
+    case StreamingDecay.BATCHES => decayFactor
+    case StreamingDecay.POINTS => math.pow(decayFactor, numNewDataPoints)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Provides the String constants for allowed time unit in the forgetful algorithm.
+ */
+@Experimental
+object StreamingDecay {
+  /**
+   * Each RDD in the DStream will be treated as 1 time unit.
+   *
+   */
+  final val BATCHES = "batches"
--- End diff --

I am all for this approach because it offers much stronger type safety and better IDE support. The reason I used the `String` implementation is to follow [StreamingKMeans](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.StreamingKMeans). Shall we also convert StreamingKMeans to use case objects? If so, I will open a JIRA for that.
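A minimal sketch of the case-object alternative discussed above, assuming hypothetical names (`DecaySketch`, `DecayTimeUnit`, `Batches`, `Points`) in place of the `StreamingDecay.BATCHES` / `StreamingDecay.POINTS` strings. It also checks the half-life formula used in `setHalfLife`: with decayFactor = exp(log(0.5) / halfLife), raising decayFactor to the power halfLife gives 0.5.

```scala
object DecaySketch {
  // Hypothetical type-safe replacement for the String time-unit constants.
  sealed trait DecayTimeUnit
  case object Batches extends DecayTimeUnit
  case object Points extends DecayTimeUnit

  // Same formula as setHalfLife: decayFactor^halfLife == 0.5.
  def decayFromHalfLife(halfLife: Double): Double =
    math.exp(math.log(0.5) / halfLife)

  // DecayTimeUnit is sealed, so the compiler can warn if a case is missing here.
  def discount(decayFactor: Double, unit: DecayTimeUnit, numNewDataPoints: Long): Double =
    unit match {
      case Batches => decayFactor
      case Points  => math.pow(decayFactor, numNewDataPoints.toDouble)
    }
}
```

With a sealed type, an invalid unit such as `"minutes"` cannot be passed at all, so the runtime `IllegalArgumentException` check in `setHalfLife` would no longer be needed.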
[GitHub] spark pull request: [SPARK-4980] [MLlib] Add decay factors to stre...
Github user rotationsymmetry commented on a diff in the pull request:

https://github.com/apache/spark/pull/8022#discussion_r38162690

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala ---
@@ -47,6 +52,7 @@ class StreamingLinearRegressionWithSGD private[mllib] (
     private var numIterations: Int,
     private var miniBatchFraction: Double)
   extends StreamingLinearAlgorithm[LinearRegressionModel, LinearRegressionWithSGD]
+  with StreamingDecaySetter[StreamingLinearRegressionWithSGD]
--- End diff --

Thanks for the suggestions. I agree we can consolidate `StreamingDecay` and `StreamingDecaySetter` into one trait.

Regarding the implementation, suppose we do the following:

```scala
trait StreamingDecay[T] {
  self: T =>
  def setX: T = this
}
class StreamingLinearAlgorithm extends StreamingDecay[StreamingLinearAlgorithm]
class StreamingLinearRegressionWithSGD extends StreamingLinearAlgorithm
val s = new StreamingLinearRegressionWithSGD()
```

Then the return type of `s.setX` will be `StreamingLinearAlgorithm`, since that is what the type parameter `T` refers to. So I propose we return `this.type` and override the `setX` method to get the correct type:

```scala
trait StreamingDecay {
  def setX: this.type = this
}
class StreamingLinearAlgorithm extends StreamingDecay
class StreamingLinearRegressionWithSGD extends StreamingLinearAlgorithm {
  override def setX: this.type = {
    super.setX
    this
  }
}
val s = new StreamingLinearRegressionWithSGD()
```

I have tried this implementation and it works. Shall we proceed?
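A minimal sketch of the `this.type` approach, using hypothetical class names (`ThisTypeSketch.Decay`, `LinearAlgorithm`, `LinearRegressionWithSGD`) rather than the real MLlib classes. Because `this.type` is the singleton type of the receiver, calling an inherited setter on a subclass instance already returns the subclass type, so chaining into a subclass-only setter type-checks even without an override:

```scala
object ThisTypeSketch {
  // Hypothetical stand-in for the consolidated decay trait.
  trait Decay {
    protected var decayFactor: Double = 0.0
    // this.type: the result has the static type of whatever object it is called on.
    def setDecayFactor(a: Double): this.type = { decayFactor = a; this }
  }

  class LinearAlgorithm extends Decay

  class LinearRegressionWithSGD extends LinearAlgorithm {
    private var numIterations = 50
    def setNumIterations(n: Int): this.type = { numIterations = n; this }
    def iterations: Int = numIterations
    def decay: Double = decayFactor
  }
}
```

Under this sketch, an override of the setter is only needed if the subclass wants to add behavior to it, not to recover the return type.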
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135592006 [Test build #41714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41714/consoleFull) for PR 8484 at commit [`35371fb`](https://github.com/apache/spark/commit/35371fb629217ee27ccda451c931d04137c05f93).
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135591739 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135591740 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41712/ Test FAILed.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135591715

[Test build #41712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41712/console) for PR 8492 at commit [`c3c65f8`](https://github.com/apache/spark/commit/c3c65f864d2c39dc9bebd652cc009cfe56790c90).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135591495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41713/ Test PASSed.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135591494 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135591410 Merged build started.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135591430

[Test build #41713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41713/console) for PR 8491 at commit [`4e5aaeb`](https://github.com/apache/spark/commit/4e5aaebcbef92287887906071637cf65407c85c9).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class LinearRegressionWithElasticNetExample `
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135591401 Merged build triggered.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135591299 retest this please.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135589491 [Test build #41713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41713/consoleFull) for PR 8491 at commit [`4e5aaeb`](https://github.com/apache/spark/commit/4e5aaebcbef92287887906071637cf65407c85c9).
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135589035 [Test build #41712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41712/consoleFull) for PR 8492 at commit [`c3c65f8`](https://github.com/apache/spark/commit/c3c65f864d2c39dc9bebd652cc009cfe56790c90).
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135588705 Merged build triggered.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135588718 Merged build started.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135588731 Merged build started.
[GitHub] spark pull request: [SPARK-9905][ML][Doc] Adds LinearRegressionSum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8491#issuecomment-135588711 Merged build triggered.
[GitHub] spark pull request: [SPARK-9905] Adds LinearRegressionSummary user...
GitHub user feynmanliang opened a pull request:

https://github.com/apache/spark/pull/8491

[SPARK-9905] Adds LinearRegressionSummary user guide

* Adds user guide for LinearRegressionSummary
* Fixes unresolved issues in #8197

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/feynmanliang/spark SPARK-9905

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8491.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #8491

commit c5d731e5c5c5da24840ddd7394e075ea4c17128c
Author: Feynman Liang
Date: 2015-08-27T23:29:11Z

Cleans up Manoj's work

commit 4e5aaebcbef92287887906071637cf65407c85c9
Author: Feynman Liang
Date: 2015-08-28T00:03:04Z

Adds LinearRegressionSummary docs
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/8492

[SPARK-10323] [SQL] fix nullability of In/InSet/ArrayContain

After this PR, In/InSet/ArrayContains will return null instead of false if the value is null. They will also return null (instead of false) if the value is not found and the set/array contains a null.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark fix_in

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8492.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #8492

commit c3c65f864d2c39dc9bebd652cc009cfe56790c90
Author: Davies Liu
Date: 2015-08-28T00:02:16Z

fix nullability of In/InSet/ArrayContain
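The behavior described in this PR follows SQL's three-valued logic for `IN`. A minimal sketch, assuming `None` models SQL NULL; `SqlInSketch.sqlIn` is a hypothetical helper, not Spark's `In` expression:

```scala
object SqlInSketch {
  // None models SQL NULL; the result is Some(true), Some(false), or None (NULL).
  def sqlIn[A](value: Option[A], list: Seq[Option[A]]): Option[Boolean] =
    value match {
      case None => None                            // NULL IN (...) is NULL
      case Some(v) =>
        if (list.contains(Some(v))) Some(true)     // a match wins, even if NULLs are present
        else if (list.contains(None)) None         // no match, but a NULL is present
        else Some(false)                           // no match and no NULLs
    }
}
```

The key design point is the order of checks: a definite match returns true before any NULL in the list is considered, and only an unmatched value next to a NULL becomes NULL.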
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135588372 cc @yhuai
[GitHub] spark pull request: [SPARK-10321] sizeInBytes in HadoopFsRelation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8490