[GitHub] spark pull request: [SPARK-5604] remove checkpointDir from LDA
GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/4390

    [SPARK-5604] remove checkpointDir from LDA

    `checkpointDir` is a Spark global configuration. Users should set it outside LDA. This PR also hides some methods under `private[clustering] object LDA`, so they don't show up in the generated Java doc (SPARK-5610).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-5604

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4390.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4390

commit a34bb3995101875601bac89bc3f22c43ff4b2af2
Author: Xiangrui Meng m...@databricks.com
Date: 2015-02-05T08:15:37Z

    remove checkpointDir from LDA

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
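For readers following along, here is a minimal sketch of what "set it outside LDA" could look like on the caller's side. It assumes Spark core and MLlib (roughly the 1.3-era API) are on the classpath; the checkpoint path, the toy corpus, and the parameter values are placeholders made up for illustration, not anything taken from the PR.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LdaCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("lda-checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // The checkpoint directory is configured once on the SparkContext,
      // not on the LDA instance. The path below is a placeholder.
      sc.setCheckpointDir("/tmp/spark-checkpoints")

      // Made-up corpus of (documentId, termCountVector) pairs.
      val corpus = sc.parallelize(Seq(
        (0L, Vectors.dense(1.0, 2.0, 0.0)),
        (1L, Vectors.dense(0.0, 1.0, 3.0))))

      // LDA only decides how often to checkpoint; where to write is global.
      val model = new LDA()
        .setK(2)
        .setMaxIterations(5)
        .setCheckpointInterval(2)
        .run(corpus)

      println(s"topics: ${model.k}, vocabulary size: ${model.vocabSize}")
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the directory on the context means every algorithm that checkpoints shares one setting, which appears to be the motivation stated in the PR description.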
[GitHub] spark pull request: [SPARK-5278][SQL] Introduce UnresolvedGetField...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-73009841 [Test build #26833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26833/consoleFull) for PR 4068 at commit [`085619c`](https://github.com/apache/spark/commit/085619cdaed55fdaac8ad0d55a077c074d3a656b). * This patch merges cleanly.
[GitHub] spark pull request: [SQL][HiveConsole][DOC] HiveConsole `correct h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4389#issuecomment-73010186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26830/ Test FAILed.
[GitHub] spark pull request: [SQL][HiveConsole][DOC] HiveConsole `correct h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4389#issuecomment-73010179 [Test build #26830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26830/consoleFull) for PR 4389 at commit [`843eed9`](https://github.com/apache/spark/commit/843eed951569d6745c2bad549587011bbc08173d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5068] [SQL] Fix bug query data when pat...
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4356#discussion_r24148995

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -207,13 +217,22 @@ class HadoopTableReader(
        * If `filterOpt` is defined, then it will be used to filter files from `path`. These files are
        * returned in a single, comma-separated string.
        */
    -  private def applyFilterIfNeeded(path: Path, filterOpt: Option[PathFilter]): String = {
    -    filterOpt match {
    -      case Some(filter) =>
    -        val fs = path.getFileSystem(sc.hiveconf)
    -        val filteredFiles = fs.listStatus(path, filter).map(_.getPath.toString)
    -        filteredFiles.mkString(",")
    -      case None => path.toString
    +  private def applyFilterIfNeeded(path: Path, filterOpt: Option[PathFilter]): Option[String] = {
    +    val fs = path.getFileSystem(sc.hiveconf)
    +    if (fs.exists(path)) {
    --- End diff --

    My concern is similar to what @marmbrus mentioned in #3981. It's pretty expensive to check each path in serial for tables with lots of partitions, especially when the data reside on S3. Can we use `listStatus` or `globStatus` to retrieve all `FileStatus` objects under some path(s), and then do the filtering locally?
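To make the trade-off in this review comment concrete, the following is a rough sketch of "list once, filter locally" versus one existence check per path. It deliberately uses `java.nio.file` on a local directory as a stand-in for the Hadoop `FileSystem` API, so every name here (`PartitionListingSketch`, `existingSerial`, `existingViaListing`) is hypothetical and not part of Spark or the PR.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable

object PartitionListingSketch {
  /** One existence check per candidate: N file-system calls for N partitions. */
  def existingSerial(candidates: Seq[Path]): Seq[Path] =
    candidates.filter(p => Files.exists(p))

  /**
   * One listing of the table directory, then a local set lookup per candidate.
   * Assumes every candidate is a direct child built with tableDir.resolve(name),
   * so that it compares equal to the entries returned by the listing.
   */
  def existingViaListing(tableDir: Path, candidates: Seq[Path]): Seq[Path] = {
    val stream = Files.newDirectoryStream(tableDir)
    try {
      val present = mutable.HashSet.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) present += it.next()
      candidates.filter(present.contains)
    } finally {
      stream.close()
    }
  }
}
```

On a local disk the difference is negligible; the point of the sketch is only the call pattern, which matters when each call is a remote request, as with the S3 case mentioned above.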
[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4347#issuecomment-73008387 [Test build #26832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26832/consoleFull) for PR 4347 at commit [`cdd3fa2`](https://github.com/apache/spark/commit/cdd3fa2d02bde1ebaccc5543c00eab393a5f178b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4347#issuecomment-73009123 [Test build #26832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26832/consoleFull) for PR 4347 at commit [`cdd3fa2`](https://github.com/apache/spark/commit/cdd3fa2d02bde1ebaccc5543c00eab393a5f178b). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4347#issuecomment-73009125 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26832/ Test FAILed.
[GitHub] spark pull request: [SPARK-5604][MLLIB] remove checkpointDir from L...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4390#issuecomment-73010318 [Test build #26834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26834/consoleFull) for PR 4390 at commit [`a34bb39`](https://github.com/apache/spark/commit/a34bb3995101875601bac89bc3f22c43ff4b2af2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73180459 [Test build #26888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26888/consoleFull) for PR 4384 at commit [`bb65232`](https://github.com/apache/spark/commit/bb65232c008d66c7895e83e9736353881b5d719e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73180462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26888/ Test PASSed.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73182193
[Test build #26892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26892/consoleFull) for PR 4216 at commit [`d2b1ef8`](https://github.com/apache/spark/commit/d2b1ef84edf1ccc869acba8394bd91286a38d5fc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * ` case class MasterStateResponse(`
   * `class LocalSparkCluster(`
   * ` * (4) the main class for the child`
   * ` case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])`
   * ` throw new SubmitRestMissingFieldException("Main class is missing.")`
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user ksakellis commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73182058 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73182394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26891/ Test PASSed.
[GitHub] spark pull request: [SPARK-5643][SQL] Add a show method to print t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4416#issuecomment-73182404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26890/ Test PASSed.
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73182388 [Test build #26891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26891/consoleFull) for PR 4384 at commit [`e4abf69`](https://github.com/apache/spark/commit/e4abf69b63bb6bfa94823bcefd27bcbe821b1f2e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5643][SQL] Add a show method to print t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4416#issuecomment-73182398 [Test build #26890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26890/consoleFull) for PR 4416 at commit [`1a04d8b`](https://github.com/apache/spark/commit/1a04d8bb41532f30303ad12c6610793a0ffd994f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73183285 @andrewor14 okay I think this time you are causing the test failure :)
[GitHub] spark pull request: [SPARK-5595][SPARK-5603][SQL] Add a rule to do...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4373#issuecomment-73183514 [Test build #26893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26893/consoleFull) for PR 4373 at commit [`08237a7`](https://github.com/apache/spark/commit/08237a7bb6703580645db000fb29a69a72531dc5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5595][SPARK-5603][SQL] Add a rule to do...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4373#issuecomment-73183518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26893/ Test PASSed.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4067#discussion_r24223212

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
    @@ -193,12 +194,11 @@ private[spark] class DiskBlockObjectWriter(
         }
         objOut.writeObject(value)
    +    numRecordsWritten += 1
    --- End diff --

    What about just adding a class level comment that it can't be used after it is closed?
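As an illustration of the suggestion, here is a small sketch of a writer that counts records per write and documents the "do not use after close" contract at class level. `CountingObjectWriter` is a hypothetical stand-in, not Spark's `DiskBlockObjectWriter`; the runtime guard in `write` is an extra added by this sketch to make the contract observable, whereas the comment above asks only for documentation.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

/**
 * Hypothetical stand-in for a block writer.
 *
 * This class is not thread-safe, and an instance must not be used after
 * close() has been called.
 */
class CountingObjectWriter(out: ByteArrayOutputStream) {
  private val objOut = new ObjectOutputStream(out)
  private var closed = false
  private var numRecordsWritten = 0L

  /** Number of records written so far. */
  def recordsWritten: Long = numRecordsWritten

  def write(value: AnyRef): Unit = {
    // Guard added by this sketch only; the review suggests a comment instead.
    if (closed) throw new IllegalStateException("writer is closed")
    objOut.writeObject(value)
    numRecordsWritten += 1
  }

  /** Closes the underlying stream; calling close() twice is a no-op. */
  def close(): Unit = if (!closed) {
    objOut.close()
    closed = true
  }
}
```

A comment-only contract costs nothing at runtime, which may be why it is proposed for a hot path that runs once per record.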
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4067#discussion_r24223198

    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -358,5 +374,12 @@ class ShuffleWriteMetrics extends Serializable {
       private[spark] def incShuffleWriteTime(value: Long) = _shuffleWriteTime += value
       private[spark] def decShuffleWriteTime(value: Long) = _shuffleWriteTime -= value
    -
    +  /**
    +   * Total number of records written to the shuffle by this task
    +   */
    +  @volatile private var _recordsWritten: Long = _
    --- End diff --

    @ksakellis any thoughts on this one?
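For context, the pattern in this hunk (a private `@volatile` counter with a public reader and restricted mutators) can be sketched in isolation as below. All names are hypothetical; `private[MetricsSketch]` stands in for `private[spark]`, and the sketch assumes a single writer thread, since `@volatile` gives visibility to readers but does not make `+=` atomic.

```scala
object MetricsSketch {
  class ShuffleWriteMetricsSketch extends Serializable {
    // `= _` initialises the field to the type's default value, 0L for Long.
    @volatile private var _recordsWritten: Long = _

    /** Public read access, e.g. for a reporting thread. */
    def recordsWritten: Long = _recordsWritten

    // Mutators are visible only inside MetricsSketch, mirroring private[spark].
    private[MetricsSketch] def incRecordsWritten(value: Long): Unit =
      _recordsWritten += value
    private[MetricsSketch] def setRecordsWritten(value: Long): Unit =
      _recordsWritten = value
  }

  /** Hypothetical caller inside the enclosing scope that is allowed to mutate. */
  def recordBatch(metrics: ShuffleWriteMetricsSketch, batchSize: Int): Unit =
    metrics.incRecordsWritten(batchSize.toLong)
}
```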
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-73187723 LGTM.
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73188654 [Test build #26901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26901/consoleFull) for PR 4419 at commit [`f41c5ca`](https://github.com/apache/spark/commit/f41c5ca2d2bb11394882d4212fd4138ae9a972a1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73188670 [Test build #26902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26902/consoleFull) for PR 4216 at commit [`b9e2a08`](https://github.com/apache/spark/commit/b9e2a08e2665ef710a7dd47dd61c7744d548b54d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4067#discussion_r24225336

    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -334,6 +342,14 @@ class ShuffleReadMetrics extends Serializable {
        * Number of blocks fetched in this shuffle by this task (remote or local)
        */
       def totalBlocksFetched = _remoteBlocksFetched + _localBlocksFetched
    +
    +  /**
    +   * Total number of records read from the shuffle by this task
    +   */
    +  private var _recordsRead: Long = _
    --- End diff --

    @ksakellis mind fixing this one?
[GitHub] spark pull request: Remove unused function
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/4418#issuecomment-73190743

    @srowen OK. If it is useful later, we should change it like this:

    def hasShutdownDeleteTachyonDir(file: TachyonFile): Boolean = {
      val absolutePath = file.getPath()
      shutdownDeleteTachyonPaths.synchronized {
        shutdownDeleteTachyonPaths.contains(absolutePath)
      }
    }
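The proposed method only makes sense if writers lock the same monitor as the reader. A self-contained sketch of that pairing, with plain strings in place of `TachyonFile` and with hypothetical names throughout (`ShutdownPathsSketch`, `register`, `hasShutdownDeleteDir`), might look like this:

```scala
import scala.collection.mutable

object ShutdownPathsSketch {
  // Plain mutable set; every access goes through the set's own monitor.
  private val shutdownDeletePaths = new mutable.HashSet[String]()

  /** Registers a path for deletion at shutdown. */
  def register(path: String): Unit = shutdownDeletePaths.synchronized {
    shutdownDeletePaths += path
  }

  /** Checks membership under the same lock used by register. */
  def hasShutdownDeleteDir(path: String): Boolean = shutdownDeletePaths.synchronized {
    shutdownDeletePaths.contains(path)
  }
}
```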
[GitHub] spark pull request: [SPARK-5013] [MLlib] [WIP] Added documentation...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4401#issuecomment-73197096 Did you follow the `docs/README.md`?
[GitHub] spark pull request: [SPARK-5604][MLLIB] remove checkpointDir from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4407
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73198179
[Test build #26903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26903/consoleFull) for PR 4216 at commit [`dfe4bd7`](https://github.com/apache/spark/commit/dfe4bd77714f27fc15b8ccbe1f7316df3e11661e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * ` case class MasterStateResponse(`
   * `class LocalSparkCluster(`
   * ` * (4) the main class for the child`
   * ` case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])`
   * ` throw new SubmitRestMissingFieldException("Main class is missing.")`
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73198185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26903/ Test PASSed.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3637
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73178748 [Test build #26892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26892/consoleFull) for PR 4216 at commit [`d2b1ef8`](https://github.com/apache/spark/commit/d2b1ef84edf1ccc869acba8394bd91286a38d5fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24221789

    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    --- End diff --

    I'll do the former so that it's consistent with the existing --spark-git-repo arg
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4289#discussion_r24222130

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -315,9 +335,17 @@ private[hive] object HadoopTableReader extends HiveInspectors {
           }
         }
    +    /**
    +     * when the soi and deserializer.getObjectInspector is equal,
    +     * we will get `IdentityConverter`,which mean it won't convert the
    +     * value when schema match
    +     */
    +    val partTblObjectInspectorConverter = ObjectInspectorConverters.getConverter(
    +      deserializer.getObjectInspector, soi)
    +
         // Map each tuple to a row object
         iterator.map { value =>
    -      val raw = deserializer.deserialize(value)
    +      val raw = partTblObjectInspectorConverter.convert(deserializer.deserialize(value))
    --- End diff --

    The logic here is a little confusing to me. As we already have the `converter` here, we probably don't need to call `getConvertedOI`, and `soi` should be the expected output `ObjectInspector`, which is supposed to be the output object inspector of the table deserializer, right?
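The behaviour described in the diff's doc comment (an identity converter when the input and output inspectors are equal, a real conversion otherwise) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `Schema`, `Row`, `Converter`, `CastingConverter`, and this `getConverter` are hypothetical stand-ins written for the example and are not Hive's `ObjectInspectorConverters` API.

```scala
object ConverterSketch {
  // Hypothetical stand-ins: a schema is an ordered list of column type names.
  type Schema = Seq[String]
  type Row = Seq[Any]

  trait Converter { def convert(row: Row): Row }

  /** Returns the row untouched; chosen when source and target schemas match. */
  object IdentityConverter extends Converter {
    def convert(row: Row): Row = row
  }

  /** Converts column by column for the few type pairs this sketch knows about. */
  final class CastingConverter(from: Schema, to: Schema) extends Converter {
    def convert(row: Row): Row = row.zip(from.zip(to)).map {
      case (v: Int, ("int", "bigint")) => v.toLong
      case (v, (f, t)) if f == t => v
      case (v, (_, "string")) => v.toString
      case (v, (f, t)) =>
        throw new IllegalArgumentException(s"cannot convert $v from $f to $t")
    }
  }

  /** Picks the identity converter when no conversion is needed. */
  def getConverter(from: Schema, to: Schema): Converter =
    if (from == to) IdentityConverter else new CastingConverter(from, to)
}
```

In this model, a partition whose schema matches the table's pays no per-row conversion cost, while a partition written with an older schema (say `int` where the table now has `bigint`) is converted row by row.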
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4289#discussion_r24222165

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -315,9 +335,17 @@ private[hive] object HadoopTableReader extends HiveInspectors {
           }
         }
    +    /**
    +     * when the soi and deserializer.getObjectInspector is equal,
    +     * we will get `IdentityConverter`,which mean it won't convert the
    +     * value when schema match
    +     */
    +    val partTblObjectInspectorConverter = ObjectInspectorConverters.getConverter(
    +      deserializer.getObjectInspector, soi)
    +
         // Map each tuple to a row object
         iterator.map { value =>
    -      val raw = deserializer.deserialize(value)
    +      val raw = partTblObjectInspectorConverter.convert(deserializer.deserialize(value))
    --- End diff --

    We can discuss that offline if you find it confusing.
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4385#issuecomment-73185820 Thanks @florianverhein for the change. This is a pretty useful change, as I often modify these variables inline for my experiments. @nchammas @JoshRosen could you take a look at the Python style changes and make sure they are okay?
[GitHub] spark pull request: [SPARK-5642] [SQL] Apply column pruning on unu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4415#issuecomment-73185836 [Test build #26898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26898/consoleFull) for PR 4415 at commit [`b6420cb`](https://github.com/apache/spark/commit/b6420cb1342c7d52abb4bfed5376c0cfd52bd9a2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73185880 [Test build #26899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26899/consoleFull) for PR 4419 at commit [`26dca1b`](https://github.com/apache/spark/commit/26dca1bddd98203e90e3cb36de4f3d16fbfbf6cc). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73185833 [Test build #26899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26899/consoleFull) for PR 4419 at commit [`26dca1b`](https://github.com/apache/spark/commit/26dca1bddd98203e90e3cb36de4f3d16fbfbf6cc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-73187137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26897/ Test PASSed.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-73187135 [Test build #26897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26897/consoleFull) for PR 3249 at commit [`eb0a13b`](https://github.com/apache/spark/commit/eb0a13b2d2fb87b04899c05b62ce82c237dff750). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubqueryExpression(subquery: LogicalPlan) extends Expression `
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4067#discussion_r24223586

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
    @@ -193,12 +194,11 @@ private[spark] class DiskBlockObjectWriter(
         }
         objOut.writeObject(value)
    +    numRecordsWritten += 1
    --- End diff --

    Yes, I guess that is the least we can do. Having an explicit check would be better, I think. So if we are okay with it, I can add a boolean that tracks whether the block writer has been opened before and, if so, doesn't allow it to be reopened. Thoughts?
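The proposal in this comment (a boolean that remembers whether the writer was ever opened, so a second open is rejected) can be sketched in a few lines. This is a language-neutral illustration written in Python, not the `DiskBlockObjectWriter` code; the class `OneShotWriter`, its method names, and the assumption that `write` opens the writer lazily are all hypothetical.

```python
class OneShotWriter:
    """Toy writer that may be opened at most once over its lifetime."""

    def __init__(self):
        self._has_been_opened = False  # the explicit check proposed above
        self._is_open = False
        self.num_records_written = 0

    def open(self):
        if self._has_been_opened:
            raise RuntimeError("writer has already been opened and cannot be reopened")
        self._has_been_opened = True
        self._is_open = True
        return self

    def write(self, value):
        # Assumed behavior: open lazily on first write.
        if not self._is_open:
            self.open()
        self.num_records_written += 1

    def close(self):
        self._is_open = False


if __name__ == "__main__":
    w = OneShotWriter()
    w.write("a")
    w.write("b")
    w.close()
    print(w.num_records_written)
```

With this guard, a write after `close()` fails loudly instead of silently reopening the writer and restarting the record count.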
[GitHub] spark pull request: SPARK-2450 Adds executor log links to Web UI
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3486#issuecomment-73185071 [Test build #26894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26894/consoleFull) for PR 3486 at commit [`d190936`](https://github.com/apache/spark/commit/d190936f753ff66586f9aa3cc522fd9c5ba4a321). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class RegisterExecutor(`
[GitHub] spark pull request: [SPARK-2087] [SQL] [WIP] Multiple thriftserver...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73186905 @mallman this PR aims to fix exactly the bug you mentioned, and it passed the tests on my local machine. However, I am still figuring out some of the unit test failures; hopefully I can update the title and remove the WIP soon.
[GitHub] spark pull request: [SPARK-5636] Ramp up faster in dynamic allocat...
Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4409#discussion_r24224468

    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -78,7 +78,7 @@ private[spark] class ExecutorAllocationManager(
       // How long there must be backlogged tasks for before an addition is triggered
    --- End diff --

    nit: Maybe mention in the comment that this is measured in seconds - not clear from the config name.
[GitHub] spark pull request: Remove unused function
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73187803 These do appear unused, at the moment, but what's the need to delete them? They could plausibly be useful later; it's not completely useless code. (Normally changes need a JIRA too, although this is borderline)
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73189047 [Test build #26903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26903/consoleFull) for PR 4216 at commit [`dfe4bd7`](https://github.com/apache/spark/commit/dfe4bd77714f27fc15b8ccbe1f7316df3e11661e). * This patch merges cleanly.
[GitHub] spark pull request: [HOTFIX] [SQL] Disables Metastore Parquet tabl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4413#issuecomment-73177683 [Test build #26882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26882/consoleFull) for PR 4413 at commit [`5291289`](https://github.com/apache/spark/commit/5291289ecc28016015f678f89091bfd0f4c38e49). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `protected class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]` * `trait CreatableRelationProvider `
[GitHub] spark pull request: [SPARK-5155] [PySpark]
GitHub user lazyman500 opened a pull request: https://github.com/apache/spark/pull/4417 [SPARK-5155] [PySpark] add examples for PySpark You can merge this pull request into a Git repository by running: $ git pull https://github.com/lazyman500/spark SPARK-5616 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4417 commit f7f7f2249b8b0adf0a3671c5ee94a609c80d5cb0 Author: lazyman lazyman...@gmail.com Date: 2015-02-06T03:30:32Z 1.add boardcast example for PySpark 2.add module example for PySpark
[GitHub] spark pull request: remove unused function
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4418 remove unused function `hasShutdownDeleteTachyonDir(file: TachyonFile)` should use `shutdownDeleteTachyonPaths` (not `shutdownDeletePaths`) to determine whether it contains the file. To solve this, delete the two unused functions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark deleteunusedfun Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4418 commit 2bc397edf3793cc8f3e13b1d7a3f31efb7e8f9c2 Author: xukun 00228947 xukun...@huawei.com Date: 2015-02-06T03:26:34Z deleteunusedfun
[GitHub] spark pull request: [SPARK-5639][SQL] Support DataFrame.renameColu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4410#issuecomment-73178269 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26886/ Test PASSed.
[GitHub] spark pull request: SPARK-2450 Adds executor log links to Web UI
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3486#issuecomment-73178283 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26883/ Test FAILed.
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4289#discussion_r24221959

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -264,15 +268,31 @@ private[hive] object HadoopTableReader extends HiveInspectors {
        * @param nonPartitionKeyAttrs Attributes that should be filled together with their corresponding
        *                             positions in the output schema
        * @param mutableRow A reusable `MutableRow` that should be filled
    +   * @param convertdeserializer The `Deserializer` covert the `deserializer`
        * @return An `Iterator[Row]` transformed from `iterator`
        */
       def fillObject(
           iterator: Iterator[Writable],
           deserializer: Deserializer,
           nonPartitionKeyAttrs: Seq[(Attribute, Int)],
    -      mutableRow: MutableRow): Iterator[Row] = {
    +      mutableRow: MutableRow,
    +      convertdeserializer: Option[Deserializer] = None): Iterator[Row] = {
    --- End diff --

    Change the `convertdeserializer` to `outputStructObjectInspector`?
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24224925

    --- Diff: ec2/spark_ec2.py ---
    @@ -1007,6 +1022,14 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
    +    # Limit naming to avoid breaking things, as we rely on the repo
    --- End diff --

    Sorry, I misread/misunderstood. Will change and test. May as well keep the CLI check, but use it to ensure there's no trailing / or .git.
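The CLI check described here (reject a repo URL that ends in `/` or `.git`) might look roughly like the following. This is a sketch only: `validate_repo_url` and the error messages are hypothetical and are not taken from `spark_ec2.py`.

```python
def validate_repo_url(url):
    """Return url unchanged if acceptable, otherwise raise ValueError.

    Rejects a trailing '/' or a trailing '.git', the two cases named in
    the review comment.
    """
    if url.endswith("/"):
        raise ValueError("repo URL must not end with '/': %s" % url)
    if url.endswith(".git"):
        raise ValueError("repo URL must not end with '.git': %s" % url)
    return url


if __name__ == "__main__":
    for candidate in ["https://github.com/example/spark-ec2",
                      "https://github.com/example/spark-ec2/",
                      "https://github.com/example/spark-ec2.git"]:
        try:
            print("ok:", validate_repo_url(candidate))
        except ValueError as err:
            print("rejected:", err)
```

In a script such as the one under review, a failed check would presumably print the message to stderr and exit with a non-zero status, as the neighboring `ebs-vol-num` check does.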
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73196678 [Test build #26906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26906/consoleFull) for PR 4067 at commit [`bd919be`](https://github.com/apache/spark/commit/bd919be5817e29dad476213a0b3b407d28ee0f24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-73198344 LGTM. Merged into master and branch-1.3. Thanks everyone for the discussion! @jkbradley We can remove mima excludes in another PR.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-73177838 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26885/ Test PASSed.
[GitHub] spark pull request: [SPARK-5643][SQL] Add a show method to print t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4416#issuecomment-73177847 [Test build #26890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26890/consoleFull) for PR 4416 at commit [`1a04d8b`](https://github.com/apache/spark/commit/1a04d8bb41532f30303ad12c6610793a0ffd994f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-73177831 [Test build #26885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26885/consoleFull) for PR 3637 at commit [`405bfb8`](https://github.com/apache/spark/commit/405bfb8e4e54f1dd2619c1d9d35698aa9ab8efc3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaDeveloperApiExample ` * `// where index i corresponds to class i (i = 0, 1).` * ` * Here, we have a trait to be mixed in with the Estimator and Model (MyLogisticRegression` * ` * class since the maxIter parameter is only used during training (not in the Model).` * `// where index i corresponds to class i (i = 0, 1).` * `class DoubleParam(parent: Params, name: String, doc: String, defaultValue: Option[Double])` * `class IntParam(parent: Params, name: String, doc: String, defaultValue: Option[Int])` * `class FloatParam(parent: Params, name: String, doc: String, defaultValue: Option[Float])` * `class LongParam(parent: Params, name: String, doc: String, defaultValue: Option[Long])` * `class BooleanParam(parent: Params, name: String, doc: String, defaultValue: Option[Boolean])` * `new Param(this, probabilityCol, column name for predicted class conditional probabilities,` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5616] [PySpark]
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4417#issuecomment-73178429 Can one of the admins verify this patch?
[GitHub] spark pull request: remove unused function
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73178427 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73181007 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26895/ Test FAILed.
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73181005 [Test build #26895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26895/consoleFull) for PR 4419 at commit [`d640d9c`](https://github.com/apache/spark/commit/d640d9c58cd4f3caa6eac462b947b3a891dabbda). * This patch **fails Scala style tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * ` class OnlineLDAOptimizer(`
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user deeppradhan commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-73182475 Is this for undirected graphs or directed graphs? I ran this on directed graphs and my answers do not match.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73186608 [Test build #26896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26896/consoleFull) for PR 4067 at commit [`e156560`](https://github.com/apache/spark/commit/e1565607622a118cf7da2f00379749141e927a73). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73186614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26896/ Test PASSed.
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user mingyukim commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-73187536 This follows up on @andrewor14's comments on #3656. It makes the threshold and frequency configurable rather than removing them completely. Please let me know if I should add documentation for these configurations as well!
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-73192145 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26900/ Test PASSed.
[GitHub] spark pull request: [SPARK-5604][MLLIB] remove checkpointDir from ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4407#issuecomment-73197053 Merged into master and branch-1.3.
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73179443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26887/ Test PASSed.
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73179437 [Test build #26887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26887/consoleFull) for PR 4414 at commit [`914cab5`](https://github.com/apache/spark/commit/914cab57940cd11c4885ee29cb3f92f9ea2ee27f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
GitHub user mingyukim opened a pull request: https://github.com/apache/spark/pull/4420 [SPARK-4808] Configurable spillable memory threshold + sampling rate In the general case, Spillable's heuristic of checking for memory stress on every 32nd item after 1000 items are read is good enough. In general, we do not want to be enacting the spilling checks until later on in the job; checking for disk-spilling too early can produce unacceptable performance impact in trivial cases. However, there are non-trivial cases, particularly if each serialized object is large, where checking for the necessity to spill too late would allow the memory to overflow. Consider if every item is 1.5 MB in size, and the heap size is 1000 MB. Then clearly if we only try to spill the in-memory contents to disk after 1000 items are read, we would have already accumulated 1500 MB of RAM and overflowed the heap. Patch #3656 attempted to circumvent this by checking the need to spill on every single item read, but that would cause unacceptable performance in the general case. However, the convoluted cases above should not be forced to be refactored to shrink the data items. Therefore it makes sense that the memory spilling thresholds be configurable.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/mccheah/spark memory-spill-configurable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4420.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4420 commit 84afd105a0f27ef92c7034cbea97b74ea286232c Author: mcheah mch...@palantir.com Date: 2015-02-05T15:02:11Z [SPARK-4808] Configurable spillable memory threshold + sampling rate
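The arithmetic in the description above can be checked with a small sketch. It models only the heuristic as the description states it (no memory check until more than a fixed number of items has been read, then a check on every N-th item) and is not Spark's Spillable code. The function and parameter names (`items_read_before_first_check`, `initial_threshold`, `sample_period`) are hypothetical and are not the configuration keys introduced by the PR.

```python
def items_read_before_first_check(initial_threshold: int, sample_period: int) -> int:
    """Return the item count at which the first memory check can happen.

    Assumed model: no check until more than `initial_threshold` items
    have been read, and after that only on every `sample_period`-th item.
    """
    if sample_period < 1:
        raise ValueError("sample_period must be at least 1")
    n = initial_threshold + 1
    while n % sample_period != 0:
        n += 1
    return n


def memory_at_first_check(item_mb: float, initial_threshold: int, sample_period: int) -> float:
    """Memory (in MB) accumulated by the time the first check runs."""
    return item_mb * items_read_before_first_check(initial_threshold, sample_period)


if __name__ == "__main__":
    heap_mb = 1000
    # Defaults from the description: every 32nd item after 1000 items.
    default_mb = memory_at_first_check(1.5, 1000, 32)
    print(default_mb, default_mb > heap_mb)
    # A hypothetical tuned setting for large items: check much earlier.
    tuned_mb = memory_at_first_check(1.5, 100, 8)
    print(tuned_mb, tuned_mb > heap_mb)
```

Under this model the default settings reach the first check only after the 1.5 MB items already exceed the 1000 MB heap, while a lower threshold and a shorter sampling period bring the first check well inside it. That is the case the PR argues should be tunable rather than hard-coded.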
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r24224638 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -358,5 +374,12 @@ class ShuffleWriteMetrics extends Serializable { private[spark] def incShuffleWriteTime(value: Long) = _shuffleWriteTime += value private[spark] def decShuffleWriteTime(value: Long) = _shuffleWriteTime -= value - + /** + * Total number of records written to the shuffle by this task + */ + @volatile private var _recordsWritten: Long = _ --- End diff -- it is a bit redundant, but many other fields are already named as such, so figured for consistency it was best.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r24224673 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -472,12 +512,12 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage(stage) { }} {if (hasInput) { td sorttable_customkey={inputSortable} -{inputReadable} +{s$inputReadable / $inputRecords} --- End diff -- Yeah, that was my thought.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r24224655 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -193,12 +194,11 @@ private[spark] class DiskBlockObjectWriter( } objOut.writeObject(value) +numRecordsWritten += 1 --- End diff -- Sure, you can enforce that it is never re-opened if you want.
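The idea under discussion in this diff comment (count each record as it is written, and optionally refuse to re-open a writer once it is closed) can be illustrated with a toy writer. This is a minimal sketch, not DiskBlockObjectWriter: the class `CountingWriter`, its `pickle`-based serialization, and the choice to raise on a write after close are all assumptions made for illustration.

```python
import io
import pickle


class CountingWriter:
    """Toy writer that counts records and cannot be re-opened once closed."""

    def __init__(self, sink):
        self._sink = sink
        self._closed = False
        self.num_records_written = 0

    def write(self, value):
        # Enforce the "never re-opened" rule: a closed writer stays closed.
        if self._closed:
            raise RuntimeError("writer was closed and cannot be re-opened")
        pickle.dump(value, self._sink)
        # Increment only after the record was serialized successfully.
        self.num_records_written += 1

    def close(self):
        self._closed = True


if __name__ == "__main__":
    writer = CountingWriter(io.BytesIO())
    for record in ("a", "b", "c"):
        writer.write(record)
    writer.close()
    print(writer.num_records_written)
```

Forbidding re-opening keeps the counter unambiguous: it always describes exactly one open-to-close lifetime of the writer.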
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r24225268 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -25,7 +25,7 @@ import org.apache.spark._ import org.apache.spark.serializer.Serializer import org.apache.spark.shuffle.FetchFailedException import org.apache.spark.storage.{BlockId, BlockManagerId, ShuffleBlockFetcherIterator, ShuffleBlockId} -import org.apache.spark.util.CompletionIterator +import org.apache.spark.util.{CompletionIterator} --- End diff -- you don't need braces here if it is a single import.
[GitHub] spark pull request: Delete tmp dir when sc is stop
Github user Sephiroth-Lin commented on the pull request: https://github.com/apache/spark/pull/4412#issuecomment-73190453 @srowen we run a long-lived service process that never exits. Inside this process we create a SparkContext, run a job, and then stop the context. Because we only call sc.stop and the service process itself does not exit, the tmp dirs created by HttpFileServer and SparkEnv are not deleted after the SparkContext is stopped. If the service creates many SparkContexts over its lifetime, these tmp dirs keep accumulating.
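The leak described in this comment, and the fix the PR title suggests (delete the temp dirs when the context is stopped instead of at process exit), can be sketched as follows. `ToyContext` is a hypothetical stand-in and does not reflect how SparkContext, SparkEnv, or HttpFileServer are implemented; it only shows tying temp-dir cleanup to `stop()`.

```python
import os
import shutil
import tempfile


class ToyContext:
    """Hypothetical context that owns a temp directory for its lifetime."""

    def __init__(self):
        # Each context creates its own temp dir, as each SparkContext
        # is described as doing in the comment above.
        self._tmp_dirs = [tempfile.mkdtemp(prefix="toy-ctx-")]
        self._stopped = False

    @property
    def tmp_dirs(self):
        return list(self._tmp_dirs)

    def stop(self):
        # Clean up on stop, so a long-lived process does not have to
        # exit before the directories are removed. Safe to call twice.
        if self._stopped:
            return
        for path in self._tmp_dirs:
            shutil.rmtree(path, ignore_errors=True)
        self._stopped = True


if __name__ == "__main__":
    leftovers = []
    for _ in range(3):  # a service creating one context per job
        ctx = ToyContext()
        leftovers.extend(ctx.tmp_dirs)
        ctx.stop()
    print([os.path.exists(p) for p in leftovers])
```

If cleanup were registered only as a process-exit hook, every directory in `leftovers` would still exist while the service keeps running.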
[GitHub] spark pull request: [SPARK-5642] [SQL] Apply column pruning on unu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4415#issuecomment-73190540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26898/ Test PASSed.
[GitHub] spark pull request: [SPARK-5642] [SQL] Apply column pruning on unu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4415#issuecomment-73190533 [Test build #26898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26898/consoleFull) for PR 4415 at commit [`b6420cb`](https://github.com/apache/spark/commit/b6420cb1342c7d52abb4bfed5376c0cfd52bd9a2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5642] [SQL] Apply column pruning on unu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4415#issuecomment-73192625 Is it possible to add a unit test?
[GitHub] spark pull request: [SPARK-5642] [SQL] Apply column pruning on unu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4415#issuecomment-73192723 (I understand unit test coverage for the optimizer is pretty low - but it would be great to increase it)
[GitHub] spark pull request: [SPARK-5324][SQL] Results of describe can't be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4249#issuecomment-73195127 [Test build #26905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26905/consoleFull) for PR 4249 at commit [`6fee13d`](https://github.com/apache/spark/commit/6fee13d3ecd70bab34c13514145a93d680947c09). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-73195076 @marmbrus can you review the code for me?
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73178600 Thanks @JoshRosen good to know I'm not causing them.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73182187 [Test build #26896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26896/consoleFull) for PR 4067 at commit [`e156560`](https://github.com/apache/spark/commit/e1565607622a118cf7da2f00379749141e927a73). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73182199 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26892/ Test FAILed.
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user florianverhein commented on a diff in the pull request: https://github.com/apache/spark/pull/4385#discussion_r24222907 --- Diff: ec2/spark_ec2.py --- @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key): # NOTE: We should clone the repository before running deploy_files to # prevent ec2-variables.sh from being overwritten +repo_branch={r} -b {b}.format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch) +print Cloning spark-ec2 scripts from {rb} on masterformat(rb=repo_branch) ssh( host=master, opts=opts, command=rm -rf spark-ec2 + -+ git clone https://github.com/mesos/spark-ec2.git -b {b}.format(b=MESOS_SPARK_EC2_BRANCH) ++ git clone {rb}.format(rb=repo_branch) --- End diff -- Good point. I'll enforce this by checking the CLI option.
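The diff above lost its string quotes in the archive, so the following sketch shows one plausible reading of it: build the `git clone` command from a configurable repository and branch, and validate the repository option before use. The helper name `build_clone_command`, the `&&` separator, and the specific check (the repository must be an HTTPS GitHub URL) are assumptions for illustration. The thread does not say what the CLI check enforces. The sketch uses Python 3, whereas the script in the diff is Python 2.

```python
import shlex


def build_clone_command(repo: str, branch: str) -> str:
    """Build the remote shell command that clones the given repo and branch.

    Hypothetical validation: reject anything that is not an HTTPS
    GitHub URL before it is interpolated into a shell command.
    """
    if not repo.startswith("https://github.com/"):
        raise ValueError("repo must be an https://github.com/ URL: %r" % repo)
    # Quote both values so they are passed to the shell as single words.
    repo_branch = "{r} -b {b}".format(r=shlex.quote(repo), b=shlex.quote(branch))
    return "rm -rf spark-ec2 && git clone {rb}".format(rb=repo_branch)


if __name__ == "__main__":
    print(build_clone_command("https://github.com/mesos/spark-ec2", "branch-1.3"))
```

Validating the option once, where it is parsed, keeps the command-building code free of special cases and avoids passing an unchecked user string to a remote shell.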
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-73184662 I did a quick pass, this is looking good, but there are some comments on the JIRA worth addressing.
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-73187302 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5639][SQL] Support DataFrame.renameColu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4410
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-73192132 [Test build #26900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26900/consoleFull) for PR 4015 at commit [`7656776`](https://github.com/apache/spark/commit/76567768bcbd7bc27cd771f79d9975a6cf027cf9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class SQLDialect ` * `class DefaultSQLDialect extends SQLDialect ` * `class HiveQLDialect extends SQLDialect `
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-73197604 [Test build #26901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26901/consoleFull) for PR 4419 at commit [`f41c5ca`](https://github.com/apache/spark/commit/f41c5ca2d2bb11394882d4212fd4138ae9a972a1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-73182749 [Test build #26897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26897/consoleFull) for PR 3249 at commit [`eb0a13b`](https://github.com/apache/spark/commit/eb0a13b2d2fb87b04899c05b62ce82c237dff750). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4216#discussion_r24222835 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -56,8 +55,16 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St var verbose: Boolean = false var isPython: Boolean = false var pyFiles: String = null + var action: SparkSubmitAction = null val sparkProperties: HashMap[String, String] = new HashMap[String, String]() + // Standalone cluster mode only + var useRest: Boolean = true --- End diff -- this is not actually settable by the user, so for this one it might be good to indicate it's used only internally
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user florianverhein commented on a diff in the pull request: https://github.com/apache/spark/pull/4385#discussion_r24222849 --- Diff: ec2/spark_ec2.py --- @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key): # NOTE: We should clone the repository before running deploy_files to # prevent ec2-variables.sh from being overwritten +repo_branch={r} -b {b}.format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch) +print Cloning spark-ec2 scripts from {rb} on masterformat(rb=repo_branch) --- End diff -- missed that, thanks!
[GitHub] spark pull request: SPARK-2450 Adds executor log links to Web UI
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3486#issuecomment-73185077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26894/ Test PASSed.
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user florianverhein commented on the pull request: https://github.com/apache/spark/pull/4385#issuecomment-73185215 Thanks for prompt feedback @nchammas. Much appreciated.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-73186446 [Test build #26900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26900/consoleFull) for PR 4015 at commit [`7656776`](https://github.com/apache/spark/commit/76567768bcbd7bc27cd771f79d9975a6cf027cf9). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5557: Explicitly include servlet API in ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4411#issuecomment-73186449 Ok but isn't it more straightforward to at last depend on the real servlet API artifact? This is just Jetty's random copy. Maybe just fine or necessary for a reason I miss.