[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy closed the pull request at: https://github.com/apache/spark/pull/4207
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71794484 @OopsOutOfMemory Since you have gone deep into this issue, and I agree your PR is more mature, I will close this one.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670603

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Oh, sure...
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670522

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

This line should be reverted since you changed ScalaReflection.convertRowToScala, right?
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-71794073 [Test build #26213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26213/consoleFull) for PR 3732 at commit [`c37832b`](https://github.com/apache/spark/commit/c37832bc3a48493639b7a74d3277c11349942526). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71793953 @lianhuiwang In PR https://github.com/apache/spark/pull/3948, CommandStrategy has been removed and the commands have been refactored.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71793808 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26209/ Test FAILed.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670356

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Thanks, code updated.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71793802 [Test build #26209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26209/consoleFull) for PR 3715 at commit [`23b039a`](https://github.com/apache/spark/commit/23b039a896497c8f4cae1bf963274ff295841c37).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class KafkaUtils(object):`
[GitHub] spark pull request: [SPARK-5444][Network]Add a retry to deal with ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4240#issuecomment-71793668 [Test build #26212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26212/consoleFull) for PR 4240 at commit [`cc926d2`](https://github.com/apache/spark/commit/cc926d2d4f737dd76a9fa593c0f93b183d2ca21f). * This patch merges cleanly.
[GitHub] spark pull request: Add a retry to deal with the conflict port in ...
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/4240

Add a retry to deal with the conflict port in netty server.

If `spark.blockManager.port` conflicts with a port that is already in use, Spark throws an exception and exits, so this adds a retry to avoid that situation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark NettyPortConflict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4240.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4240

commit cc926d2d4f737dd76a9fa593c0f93b183d2ca21f
Author: huangzhaowei
Date: 2015-01-28T06:21:27Z

    Add a retry to deal with the conflict port in netty server.
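For illustration, the retry strategy described above can be sketched like this (a minimal standalone sketch using a plain `ServerSocket` in place of the actual Netty server; the object name and the `retriesLeft` default are assumptions, not the PR's code):

```scala
import java.net.{BindException, ServerSocket}

object BindRetrySketch {
  // Try the requested port first; on a conflict, step to the next port
  // until a bind succeeds or the retry budget is exhausted.
  def bindWithRetry(port: Int, retriesLeft: Int = 16): ServerSocket =
    try new ServerSocket(port)
    catch {
      case _: BindException if retriesLeft > 0 =>
        println(s"Port $port already in use, trying port ${port + 1}")
        bindWithRetry(port + 1, retriesLeft - 1)
    }
}
```

When the budget runs out, the final `BindException` simply propagates, which preserves the previous fail-fast behavior as a last resort.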
[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4239#issuecomment-71792811 Can one of the admins verify this patch?
[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...
GitHub user catap opened a pull request:

https://github.com/apache/spark/pull/4239

Don't return `ERROR 500` when have missing args

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/catap/spark ui_500

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4239.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4239

commit 4faba92526e93f44c11962724180e8e201015e7a
Author: Kirill A. Korinskiy
Date: 2015-01-28T07:26:55Z

    Don't return `ERROR 500` when have missing args

    Spark web UI return `HTTP ERROR 500` when GET arguments is missing.
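The general fix pattern for this class of bug is to validate GET parameters up front and answer with a 400-class status instead of letting a missing argument surface as an uncaught exception (HTTP 500). A hedged sketch of that pattern (hypothetical helper; not necessarily what the PR itself does):

```scala
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

object RequiredParamSketch {
  // Fetch a required GET parameter; send 400 Bad Request when it is absent,
  // rather than failing later with a NullPointerException (HTTP 500).
  def requiredParam(req: HttpServletRequest,
                    resp: HttpServletResponse,
                    name: String): Option[String] =
    Option(req.getParameter(name)).orElse {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST,
        s"Missing required request parameter: $name")
      None
    }
}
```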
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23669844

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

You can change the one in ScalaReflection.convertRowToScala to make it work for both Scala and Java.
[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4238#issuecomment-71792089 [Test build #26211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26211/consoleFull) for PR 4238 at commit [`24ed322`](https://github.com/apache/spark/commit/24ed3223f96ec8a2c93fe01f51e846b3e8d92c54). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23669693

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Now I added the conversion in DataTypeConversion, which only applies to the Java classes. For Scala, we need to write `Row(DateUtils.fromJavaDate(...))`. Is this OK with you?
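For readers skimming the thread: SPARK-4508 stores dates internally as an `Int` counting days since the Unix epoch, and `DateUtils` converts to and from `java.sql.Date` at the API boundary. A minimal sketch of that encoding (hypothetical object name; the real `DateUtils` also accounts for the local time zone, which this UTC-only toy ignores):

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit

object DateCodecSketch {
  private val MillisPerDay = TimeUnit.DAYS.toMillis(1)

  // Decode the internal Int (days since 1970-01-01) into a java.sql.Date,
  // which is what the new Row.getDate in the diff above must do.
  def toJavaDate(daysSinceEpoch: Int): Date = new Date(daysSinceEpoch * MillisPerDay)

  // Encode a java.sql.Date back into days since the epoch, as when
  // constructing a Row from external values.
  def fromJavaDate(date: Date): Int = (date.getTime / MillisPerDay).toInt
}
```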
[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/4238

[SPARK-5395] [PySpark] fix python process leak while coalesce()

Currently, the Python worker process is released back into the pool only after the task has finished, which causes many processes to be forked when coalesce() is called. This PR changes it to release the process as soon as all the data has been read from it (that is, when the partition is finished), so that one process can be reused to process multiple partitions in a single task.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark py_leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4238.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4238

commit 24ed3223f96ec8a2c93fe01f51e846b3e8d92c54
Author: Davies Liu
Date: 2015-01-28T07:21:55Z

    fix python process leak while coalesce()
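Conceptually, the change moves the pool release from the end of the task to the end of the partition iterator. A simplified sketch of that idea (hypothetical `WorkerPool` and method names, not the actual PythonRDD code):

```scala
import scala.collection.mutable

// Hypothetical pool of reusable worker handles.
class WorkerPool[W](create: () => W) {
  private val idle = mutable.Queue.empty[W]
  def borrow(): W = if (idle.nonEmpty) idle.dequeue() else create()
  def release(worker: W): Unit = idle.enqueue(worker)
}

object ReleaseEarlySketch {
  // Wrap a partition iterator so the worker returns to the pool as soon as
  // the partition is exhausted, instead of waiting for the task to finish.
  def withEarlyRelease[W, T](pool: WorkerPool[W], worker: W,
                             underlying: Iterator[T]): Iterator[T] =
    new Iterator[T] {
      private var released = false
      def hasNext: Boolean = {
        val more = underlying.hasNext
        if (!more && !released) { pool.release(worker); released = true }
        more
      }
      def next(): T = underlying.next()
    }
}
```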
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-71791681 [Test build #26210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26210/consoleFull) for PR 3732 at commit [`f0005b1`](https://github.com/apache/spark/commit/f0005b166a705f7b1c52960b72c4ff29d010e5ff). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3901#issuecomment-71789766 @JoshRosen Should we include this in 1.3?
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71789728 [Test build #26209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26209/consoleFull) for PR 3715 at commit [`23b039a`](https://github.com/apache/spark/commit/23b039a896497c8f4cae1bf963274ff295841c37). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23668817

```diff
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -374,49 +375,63 @@ private[spark] object PythonRDD extends Logging {
     // The right way to implement this would be to use TypeTags to get the full
     // type of T. Since I don't want to introduce breaking changes throughout the
     // entire Spark API, I have to use this hacky approach:
--- End diff --
```

fixed
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23668764

```diff
--- Diff: make-distribution.sh ---
@@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/"
 cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/"
+cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/"
--- End diff --
```

The motivation for the assembly jars is to simplify the process for Python programmers; can #4215 help in this case?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26208/ Test PASSed.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788946 [Test build #26208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit [`6e4ead8`](https://github.com/apache/spark/commit/6e4ead88855170a13806274d3103cbc4bc2a8563).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class TreeEnsembleModel(JavaModelWrapper):`
  * `class DecisionTreeModel(JavaModelWrapper):`
  * `class RandomForestModel(TreeEnsembleModel):`
  * `class GradientBoostedTreesModel(TreeEnsembleModel):`
  * `class GradientBoostedTrees(object):`
[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/4232#issuecomment-71787984 I am curious: what is the benefit of this change?
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71787785 [Test build #26207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26207/consoleFull) for PR 3222 at commit [`de47aaf`](https://github.com/apache/spark/commit/de47aafc5f721167d64ebc7b987b43375ef26798).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class AdaGradUpdater(`
  * `class DBN(val stackedRBM: StackedRBM)`
  * `class MLP(`
  * `class MomentumUpdater(val momentum: Double) extends Updater`
  * `class RBM(`
  * `class StackedRBM(val innerRBMs: Array[RBM])`
  * `case class MinstItem(label: Int, data: Array[Int])`
  * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71787790 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26207/ Test PASSed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71786983 [Test build #26206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26206/consoleFull) for PR 3798 at commit [`19406cc`](https://github.com/apache/spark/commit/19406cce66672d74bd0b9c1d98cd8486c186f8ee).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable`
  * `case class LeaderOffset(host: String, port: Int, offset: Long)`
  * `class KafkaRDDPartition(`
  * `trait OffsetRange`
  * `trait HasOffsetRanges`
  * `class DeterministicKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this)`
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71786989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26206/ Test PASSed.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71786315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26204/ Test PASSed.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71786313 [Test build #26204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26204/consoleFull) for PR 4082 at commit [`a026ff2`](https://github.com/apache/spark/commit/a026ff236510c1ab242e71981102c7d0590c8dd6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SparkListenerExecutorAdded(time: Long, executorId: String, executorInfo: ExecutorInfo)`
  * `case class SparkListenerExecutorRemoved(time: Long, executorId: String, reason: String)`
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71786254 [Test build #26205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26205/consoleFull) for PR 3988 at commit [`0f546e0`](https://github.com/apache/spark/commit/0f546e06fb8e5d4e5cf762fbc8d8cc7d11e1935f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71786261 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26205/ Test PASSed.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23667563

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -285,11 +285,22 @@ class Analyzer(catalog: Catalog,
         result
 
       // Resolve field names using the resolver.
-      case f @ GetField(child, fieldName) if !f.resolved && child.resolved =>
+      case f @ GetField(child, fieldName) if child.resolved =>
         child.dataType match {
           case StructType(fields) =>
-            val resolvedFieldName = fields.map(_.name).find(resolver(_, fieldName))
-            resolvedFieldName.map(n => f.copy(fieldName = n)).getOrElse(f)
+            val actualField = fields.filter(f => resolver(f.name, fieldName))
+            if (actualField.length == 0) {
+              sys.error(
+                s"No such struct field $fieldName in ${fields.map(_.name).mkString(", ")}")
--- End diff --
```

If we leave it unchanged, `CheckResolution` can't catch it. The reason is that we need a `Resolver` to check whether a `GetField` is resolved, but we can't get a `Resolver` inside `GetField`. Fortunately, we can catch it at runtime, as `GetField` will report an error if it can't find the required field. Which way should we prefer: leaving it unchanged, or reporting the error right away?
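To make the ambiguity concrete: with a case-insensitive resolver, a struct with fields `b` and `B` produces two matches for the reference `b`, which is exactly what SPARK-5278 wants to reject. A toy sketch of the filter-then-count logic (simplified to plain strings; the real Analyzer works on `StructField`s):

```scala
object AmbiguousFieldSketch {
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = _.equalsIgnoreCase(_)

  // Resolve a requested field name against a struct's field names, failing
  // on both zero matches and ambiguous (multiple) matches.
  def resolveField(fieldNames: Seq[String], requested: String,
                   resolver: Resolver): String =
    fieldNames.filter(resolver(_, requested)) match {
      case Seq(single) => single
      case Seq() =>
        sys.error(s"No such struct field $requested in ${fieldNames.mkString(", ")}")
      case multiple =>
        sys.error(s"Ambiguous reference to fields ${multiple.mkString(", ")}")
    }

  def main(args: Array[String]): Unit =
    resolveField(Seq("b", "B"), "b", caseInsensitive) // throws: ambiguous reference
}
```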
[GitHub] spark pull request: [SPARK-5097][SQL] DataFrame
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23667454

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -0,0 +1,606 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.language.implicitConversions
+import scala.reflect.ClassTag
+import scala.collection.JavaConversions._
+
+import java.util.{ArrayList, List => JList}
+
+import com.fasterxml.jackson.core.JsonFactory
+import net.razorvine.pickle.Pickler
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.api.python.SerDeUtil
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.sql.catalyst.ScalaReflection
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.{Literal => LiteralExpr}
+import org.apache.spark.sql.catalyst.plans.{JoinType, Inner}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.execution.{LogicalRDD, EvaluatePython}
+import org.apache.spark.sql.json.JsonRDD
+import org.apache.spark.sql.types.{NumericType, StructType}
+import org.apache.spark.util.Utils
+
+
+/**
+ * A collection of rows that have the same columns.
+ *
+ * A [[DataFrame]] is equivalent to a relational table in Spark SQL, and can be created using
+ * various functions in [[SQLContext]].
+ * {{{
+ *   val people = sqlContext.parquetFile("...")
+ * }}}
+ *
+ * Once created, it can be manipulated using the various domain-specific-language (DSL) functions
+ * defined in: [[DataFrame]] (this class), [[Column]], and [[dsl]] for Scala DSL.
+ *
+ * To select a column from the data frame, use the apply method:
+ * {{{
+ *   val ageCol = people("age")  // in Scala
+ *   Column ageCol = people.apply("age")  // in Java
+ * }}}
+ *
+ * Note that the [[Column]] type can also be manipulated through its various functions.
+ * {{
+ *   // The following creates a new column that increases everybody's age by 10.
+ *   people("age") + 10  // in Scala
+ * }}
+ *
+ * A more concrete example:
+ * {{{
+ *   // To create DataFrame using SQLContext
+ *   val people = sqlContext.parquetFile("...")
+ *   val department = sqlContext.parquetFile("...")
+ *
+ *   people.filter("age" > 30)
+ *     .join(department, people("deptId") === department("id"))
+ *     .groupBy(department("name"), "gender")
+ *     .agg(avg(people("salary")), max(people("age")))
+ * }}}
+ */
+// TODO: Improve documentation.
+class DataFrame protected[sql](
+    val sqlContext: SQLContext,
+    private val baseLogicalPlan: LogicalPlan,
+    operatorsEnabled: Boolean)
+  extends DataFrameSpecificApi with RDDApi[Row] {
+
+  protected[sql] def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) =
+    this(sqlContext.orNull, plan.orNull, sqlContext.isDefined && plan.isDefined)
+
+  protected[sql] def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true)
+
+  @transient protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan)
+
+  @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match {
+    // For various commands (like DDL) and queries with side effects, we force query optimization to
+    // happen right away to let these side effects take place eagerly.
+    case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] | _: WriteToFile =>
+      LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext)
+    case _ =>
+      baseLogicalPlan
+  }
+
+  /**
+   * An implicit conversion function internal to this class for us to avoid doing
+   * "n
```
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71785338 Not a very strong preference, but my take would be to keep them separate, as you only want users to use `treeReduce` when they know they want an aggregation tree. Also, the way `reduce` works right now is very familiar to existing users, and it'll be better not to touch that or add extra options to it. Also, thanks @mengxr for pulling this out to core; I've definitely found this useful in many other places.
[GitHub] spark pull request: [Spark-5406][MLlib] LocalLAPACK mode in RowMat...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4200#issuecomment-71783950 1x1 is definitely doable with multi-threaded native BLAS on a single machine. But usually the full SVD is not necessary for the application. This is why I want to put a soft limit and throw a warning message, which might help users re-consider whether they need a full SVD. That's interesting. Did GitHub show "added some commits 20 hours in the future"?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71783971 [Test build #26208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit [`6e4ead8`](https://github.com/apache/spark/commit/6e4ead88855170a13806274d3103cbc4bc2a8563). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71783095 I don't have strong preference. But note that `treeReduce` and `reduce` are quite different. `treeReduce` works better when there are large task results returned at around the same time (which is common for ML tasks), while `reduce` works better when there are many small task results returned in batches. If we put them together, users may think that better `depth` gives better scalability, which is not true. Again, no strong preference.
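For readers following along, a quick usage sketch of the two code paths being compared (spark-shell style, assuming a `SparkContext` named `sc` and that the PR has moved `treeReduce` into core; before the move it lives in MLlib's `RDDFunctions`):

```scala
// Sketch: compare reduce and treeReduce on the same RDD.
val rdd = sc.parallelize(1 to 1000000, numSlices = 100)

// Single round: all 100 task results come back to the driver at once.
val total = rdd.reduce(_ + _)

// depth = 2: partial sums are first combined on executors, so the driver
// only merges a handful of results; useful when task results are large.
val treeTotal = rdd.treeReduce(_ + _, depth = 2)

assert(total == treeTotal)
```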
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71782811 [Test build #26207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26207/consoleFull) for PR 3222 at commit [`de47aaf`](https://github.com/apache/spark/commit/de47aafc5f721167d64ebc7b987b43375ef26798). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71782133 [Test build #26206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26206/consoleFull) for PR 3798 at commit [`19406cc`](https://github.com/apache/spark/commit/19406cce66672d74bd0b9c1d98cd8486c186f8ee). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71781548 [Test build #26205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26205/consoleFull) for PR 3988 at commit [`0f546e0`](https://github.com/apache/spark/commit/0f546e06fb8e5d4e5cf762fbc8d8cc7d11e1935f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71781550 [Test build #26204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26204/consoleFull) for PR 4082 at commit [`a026ff2`](https://github.com/apache/spark/commit/a026ff236510c1ab242e71981102c7d0590c8dd6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23665528

```diff
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
@@ -38,6 +38,15 @@ class HiveResolutionSuite extends HiveComparisonTest {
     sql("SELECT a[0].A.A from nested").queryExecution.analyzed
   }
 
+  test("SPARK-5278: check ambiguous reference to fields") {
+    jsonRDD(sparkContext.makeRDD(
+      """{"a": [{"b": 1, "B": 2}]}""" :: Nil)).registerTempTable("nested")
+    val exception = intercept[RuntimeException] {
+      println(sql("SELECT a[0].b from nested").queryExecution.analyzed)
--- End diff --
```

Can you add a comment here to explain what we are expecting? Also, you can remove the `println`.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23665510

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -285,11 +285,22 @@ class Analyzer(catalog: Catalog,
         result
 
       // Resolve field names using the resolver.
-      case f @ GetField(child, fieldName) if !f.resolved && child.resolved =>
+      case f @ GetField(child, fieldName) if child.resolved =>
         child.dataType match {
           case StructType(fields) =>
-            val resolvedFieldName = fields.map(_.name).find(resolver(_, fieldName))
-            resolvedFieldName.map(n => f.copy(fieldName = n)).getOrElse(f)
+            val actualField = fields.filter(f => resolver(f.name, fieldName))
+            if (actualField.length == 0) {
+              sys.error(
+                s"No such struct field $fieldName in ${fields.map(_.name).mkString(", ")}")
--- End diff --
```

I think `CheckResolution` should catch it. If we cannot resolve it, just leave it unchanged. Can you see if there is a unit test for this? If not, can you add one? Maybe we can also log it like what `LogicalPlan.resolve` does.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-71779796 It also returns empty bins, just to be compatible with the present API. Hopefully that's not a problem.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r23665261

```diff
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/TreePoint.scala ---
@@ -96,14 +96,12 @@ private[tree] object TreePoint {
    * Find bin for one (labeledPoint, feature).
    *
    * @param featureArity  0 for continuous features; number of categories for categorical features.
-   * @param isUnorderedFeature  (only applies if feature is categorical)
--- End diff --
```

@jkbradley I removed this param as it is unused. I don't think it is a problem since all tests pass.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user nightwolfzor commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71779075 Any chance this one will make it into the 1.3 release? We'd really like to see this one!
[GitHub] spark pull request: [SPARK-5196][SQL] Support `comment` in Create ...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3999#issuecomment-71778526 ping @marmbrus @yhuai I think this is ready to go.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71777208 [Test build #26203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26203/consoleFull) for PR 4237 at commit [`0cdc8f8`](https://github.com/apache/spark/commit/0cdc8f87a02c5bf20f4f61a4dbd83d16431a1af9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71777210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26203/ Test FAILed.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-71776175 I'll also close this PR. I misunderstood Mesos; see #4170.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul closed the pull request at: https://github.com/apache/spark/pull/3994
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul closed the pull request at: https://github.com/apache/spark/pull/4170
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71776099 I'll close this PR. It's the wrong approach.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71775404 @tnachen @mateiz Sorry for taking up so much of your time. I've found that only one executor process runs at any time, and I now understand that an executor can run multiple tasks at the same time. I had believed each executor was launched separately whenever the driver called launchTasks.
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71774770 [Test build #26202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26202/consoleFull) for PR 4151 at commit [`fc59a02`](https://github.com/apache/spark/commit/fc59a022f767750e0b4796b83fa7f1da1e28fb5e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71774775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26202/ Test PASSed.
[GitHub] spark pull request: [Spark-5406][MLlib] LocalLAPACK mode in RowMat...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4200#issuecomment-71774727 @mengxr Sorry, I was on something else yesterday. Are you suggesting putting a soft limit in the `auto` mode for local computation and keeping the hard limit for the LocalLAPACK case? I agree with the general idea. When I tried it on my local machine, it took only about 2 hours to compute the full SVD of a 10K * 10K matrix, and I haven't even installed NativeSystemBLAS. So I guess the limit for a single machine will be quite near the hard limit (17515). I'll try distributed mode today.
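To make the experiment above easy to reproduce, here is a rough Scala sketch of a full SVD on a dense square RowMatrix (the parameters and random data are illustrative, an existing SparkContext `sc` is assumed, and this is not the exact benchmark described):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val n = 10000
    // Random dense rows; in practice the matrix would come from real data.
    val rows = sc.parallelize(0 until n, 100).map { _ =>
      Vectors.dense(Array.fill(n)(scala.util.Random.nextDouble()))
    }
    val mat = new RowMatrix(rows, n, n)
    // k = n requests the full decomposition; computeSVD internally chooses
    // between local (LAPACK) and distributed computation.
    val svd = mat.computeSVD(n, computeU = false)
    println(s"computed ${svd.s.size} singular values")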
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4220#issuecomment-71774621 LGTM. I was worried about `System.getProperty()`'s thread-safety, but I assume it's ultimately synchronized since the underlying store is a `Properties` object.
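To make the thread-safety point concrete, here is a minimal standalone Scala sketch (hypothetical names, not from the PR): the store returned by `System.getProperties` is a `java.util.Properties`, which extends the synchronized `java.util.Hashtable`, so individual get/put calls are atomic:

    object SysPropDemo {
      def main(args: Array[String]): Unit = {
        val threads = (1 to 4).map { i =>
          new Thread(new Runnable {
            override def run(): Unit = {
              // Individual get/put calls are synchronized; compound
              // check-then-act sequences still need external locking.
              System.setProperty("demo.key." + i, i.toString)
              assert(System.getProperty("demo.key." + i) == i.toString)
            }
          })
        }
        threads.foreach(_.start())
        threads.foreach(_.join())
        println("total properties: " + System.getProperties.stringPropertyNames().size())
      }
    }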
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4220#discussion_r23663471 --- Diff: core/src/test/scala/org/apache/spark/SparkConfSuite.scala --- @@ -17,6 +17,10 @@ package org.apache.spark +import java.util.concurrent.{TimeUnit, Executors} --- End diff -- ultra nit: sort imports
[GitHub] spark pull request: SPARK-1934 [CORE] "this" reference escape to "...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/4225#issuecomment-71774422 LGTM. > @zsxwing also reported a similar problem in BlockManager in the JIRA, but I can't find a similar pattern there. Maybe it was subsequently fixed? I checked the history. It's already fixed in #3087
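For readers following SPARK-1934, here is a standalone Scala sketch of the general "this"-escape pattern being fixed (hypothetical classes, not the Spark code in question): publishing `this` from a constructor, for example by starting a thread there, lets another thread observe a partially constructed object:

    // BUG: the thread can run before `capacity` is assigned and print 0.
    class LeakyCache {
      new Thread(new Runnable {
        override def run(): Unit = println(s"capacity = $capacity")
      }).start()
      val capacity: Int = 64
    }

    // Fix: finish construction first, then start background work.
    class SafeCache {
      val capacity: Int = 64
      def startLogger(): Unit = new Thread(new Runnable {
        override def run(): Unit = println(s"capacity = $capacity")
      }).start()
    }

    object SafeCache {
      def apply(): SafeCache = { val c = new SafeCache; c.startLogger(); c }
    }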
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71774383 @jongyoul So an executor can only "launch" one task at a time, but it can have multiple tasks running simultaneously, as you mentioned. It doesn't matter whether they're all part of the same launchTasks message or separate ones; as long as the framework and executor id are the same, the task will be launched in the same executor.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773877 @tnachen Yes, I fully understand reusing an executor while a framework is alive. However, do we launch two tasks on the same executor? What you've answered is that they are launched at the same time, isn't it?
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71773844 No, I'm done with it. Thanks for taking a look.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773647 If you read the fine-grained mode source code, you'll notice that Spark is using the slave id as the executor id, which is what we discussed on the Mesos mailing list: the executor will be re-used as long as all tasks reuse the same executor id. Therefore, it's only launching one executor per slave, and if the executor dies, Mesos will relaunch it when a task asks for it again.
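As a rough illustration of that reuse rule (assuming the Mesos Java protobuf API; this is not Spark's actual scheduler code), tasks on the same slave that carry the same ExecutorID run inside one executor process:

    import org.apache.mesos.Protos._

    def taskForSlave(slaveId: SlaveID, taskNum: Int): TaskInfo = {
      val executor = ExecutorInfo.newBuilder()
        // Reuse the slave id as the executor id, as fine-grained mode does.
        .setExecutorId(ExecutorID.newBuilder().setValue(slaveId.getValue))
        .setCommand(CommandInfo.newBuilder().setValue("bin/spark-class ...")) // hypothetical command
        .build()
      TaskInfo.newBuilder()
        .setName(s"task-$taskNum")
        .setTaskId(TaskID.newBuilder().setValue(taskNum.toString))
        .setSlaveId(slaveId)
        .setExecutor(executor) // same executor id => same executor process
        .build()
    }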
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23663046 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -313,6 +313,7 @@ private object SpecialLengths { val PYTHON_EXCEPTION_THROWN = -2 val TIMING_DATA = -3 val END_OF_STREAM = -4 + val NULL = -5 --- End diff -- @tdas I think that this same null-handling change has been proposed before but until now I don't think we had a great reason to pull it in, since none of our internal APIs relied on it and we were worried that it might mask the presence of bugs. Now that we have a need for it, though, it might be okay to pull in here.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773477 I believed that when the Mesos driver calls launchTasks, the container runs the command `bin/spark-class` every time a task runs. In my Q&A email to the Mesos list, @tnachen answered that one container can run multiple commands simultaneously, and some of my tests show two tasks running simultaneously, because they write to the same log file at the same time. Digging through the code, I found no limit on launching tasks on a Mesos container. However, @mateiz told me that one executor only runs a single JVM and launches a single task at any time.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71773354 [Test build #26203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26203/consoleFull) for PR 4237 at commit [`0cdc8f8`](https://github.com/apache/spark/commit/0cdc8f87a02c5bf20f4f61a4dbd83d16431a1af9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71773281 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-5361]python tuple not supported while c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4146#issuecomment-71773058 @wingchen Actually, just to be clear here, is this problem related to tuple handling, or is the actual issue related to multiple Java <-> Python conversions not working correctly? If there's nothing tuple-specific about this, do you mind editing the PR title, description, and JIRA to reflect this?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772711 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26201/ Test FAILed.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772703 [Test build #26201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26201/consoleFull) for PR 3951 at commit [`7dc1aab`](https://github.com/apache/spark/commit/7dc1aab286d47565b734b472623626b79417b442). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TreeEnsembleModel(JavaModelWrapper):` * `class DecisionTreeModel(JavaModelWrapper):` * `class RandomForestModel(TreeEnsembleModel):` * `class GradientBoostedTreesModel(TreeEnsembleModel):` * `class GradientBoostedTrees(object):`
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71772544 I don't know the behaviour in coarse-grained mode, but in fine-grained mode we use multiple JVMs for running tasks: we run spark-class via the launcher, which means we launch a JVM per task. Am I wrong?
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71772475 hi, @yanbohappy Thanks for working on this. By the way, this JIRA (SPARK-5324), `Results of describe can't be queried`, is mainly focused on making the describe command queryable like a table, not on adding a describe command to SQLContext; shall we keep this PR focused on its own JIRA issue? Also, there are no test suites here to demonstrate the bug fix. Would you mind closing this PR? If you have good advice on adding describe table, you can refer to SPARK-5135 / #4227 and comment on my PR. I'd be very pleased : )
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662537 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- I'm not inclined to block this PR on #4215. We can make the doc fix separately later.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662499 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -313,6 +313,7 @@ private object SpecialLengths { val PYTHON_EXCEPTION_THROWN = -2 val TIMING_DATA = -3 val END_OF_STREAM = -4 + val NULL = -5 --- End diff -- So this patch tries to fix a bug in Python regarding null values? If so, that probably should be a different patch from this Kafka patch.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662457
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -374,49 +375,63 @@ private[spark] object PythonRDD extends Logging {
     // The right way to implement this would be to use TypeTags to get the full
     // type of T. Since I don't want to introduce breaking changes throughout the
     // entire Spark API, I have to use this hacky approach:
+    def write(bytes: Array[Byte]) {
+      if (bytes == null) {
+        dataOut.writeInt(SpecialLengths.NULL)
+      } else {
+        dataOut.writeInt(bytes.length)
+        dataOut.write(bytes)
+      }
+    }
+
+    def writeS(str: String) {
+      if (str == null) {
+        dataOut.writeInt(SpecialLengths.NULL)
+      } else {
+        writeUTF(str, dataOut)
+      }
+    }
+
     if (iter.hasNext) {
       val first = iter.next()
       val newIter = Seq(first).iterator ++ iter
       first match {
         case arr: Array[Byte] =>
-          newIter.asInstanceOf[Iterator[Array[Byte]]].foreach { bytes =>
-            dataOut.writeInt(bytes.length)
-            dataOut.write(bytes)
-          }
+          newIter.asInstanceOf[Iterator[Array[Byte]]].foreach(write)
         case string: String =>
-          newIter.asInstanceOf[Iterator[String]].foreach { str =>
-            writeUTF(str, dataOut)
-          }
+          newIter.asInstanceOf[Iterator[String]].foreach(writeS)
         case stream: PortableDataStream =>
           newIter.asInstanceOf[Iterator[PortableDataStream]].foreach { stream =>
-            val bytes = stream.toArray()
-            dataOut.writeInt(bytes.length)
-            dataOut.write(bytes)
+            write(stream.toArray())
          }
        case (key: String, stream: PortableDataStream) =>
          newIter.asInstanceOf[Iterator[(String, PortableDataStream)]].foreach {
            case (key, stream) =>
-              writeUTF(key, dataOut)
-              val bytes = stream.toArray()
-              dataOut.writeInt(bytes.length)
-              dataOut.write(bytes)
+              writeS(key)
+              write(stream.toArray())
          }
        case (key: String, value: String) =>
          newIter.asInstanceOf[Iterator[(String, String)]].foreach {
            case (key, value) =>
-              writeUTF(key, dataOut)
-              writeUTF(value, dataOut)
+              writeS(key)
+              writeS(value)
          }
        case (key: Array[Byte], value: Array[Byte]) =>
          newIter.asInstanceOf[Iterator[(Array[Byte], Array[Byte])]].foreach {
            case (key, value) =>
-              dataOut.writeInt(key.length)
-              dataOut.write(key)
-              dataOut.writeInt(value.length)
-              dataOut.write(value)
+              write(key)
+              write(value)
          }
+        // key is null
+        case (null, v: Array[Byte]) =>
--- End diff --
nit: Also, for consistency with other "cases", v --> value.
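To see the framing convention end to end, here is a standalone Scala sketch (a simplified stand-in, not the Spark code itself) of a length-prefixed frame writer with the NULL sentinel and the matching read side:

    import java.io._

    object NullFraming {
      val NULL = -5 // mirrors SpecialLengths.NULL

      def write(out: DataOutputStream, bytes: Array[Byte]): Unit =
        if (bytes == null) out.writeInt(NULL)
        else { out.writeInt(bytes.length); out.write(bytes) }

      def read(in: DataInputStream): Array[Byte] = in.readInt() match {
        case NULL => null
        case n => val buf = new Array[Byte](n); in.readFully(buf); buf
      }

      def main(args: Array[String]): Unit = {
        val bos = new ByteArrayOutputStream()
        write(new DataOutputStream(bos), null) // a null payload survives the round trip
        val in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray))
        assert(read(in) == null)
      }
    }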
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71771833 @jongyoul Sorry, I didn't get to finish reviewing the PR. I agree with Matei that in Spark's usage of Mesos it doesn't make sense to give tasks memory, since we share the same executor, which is kept running.
[GitHub] spark pull request: [SPARK-5361]python tuple not supported while c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4146#issuecomment-71771191 Let me take one final look to see if I can pull this in for 1.2.1 (since we're cutting a new RC tonight). In general, this looks safe, since it only adds new code paths in cases where we'd otherwise throw an exception, as opposed to changing the behavior of existing code paths. If things check out, I'll pull it in for both 1.3.0 and 1.2.1.
[GitHub] spark pull request: [SPARK-5135][SQL] Add support for describe tab...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4227#issuecomment-71770856 yeah, @rxin, would you like to talk with @marmbrus about what we'd like to show in `describe extended table` in SQLContext and then file a JIRA issue, so that we can do it separately rather than in this PR?
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71770662 Right, as I said, it doesn't make sense to offer the task memory twice. Each executor is a *single* JVM, and JVMs cannot scale their memory up and down. The executor's memory is set to the same value that we configure that JVM with via `-Xmx`. There's no way to make tasks use more memory than that, no matter how many tasks are running there.
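A tiny Scala illustration of that point (a hypothetical demo, not Spark code): a JVM's heap ceiling is fixed at launch by `-Xmx`, so an executor JVM cannot grow later to absorb extra task memory:

    object HeapCeiling {
      def main(args: Array[String]): Unit = {
        // Run with e.g. `scala -J-Xmx5g HeapCeiling`: prints roughly 5 GB,
        // no matter how many tasks later run inside this process.
        println(s"max heap = ${Runtime.getRuntime.maxMemory() / (1024 * 1024)} MB")
      }
    }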
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/4207#discussion_r23661855 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1034,4 +1034,11 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { rdd.registerTempTable("distinctData") checkAnswer(sql("SELECT COUNT(DISTINCT key,value) FROM distinctData"), Row(2)) } + + test("describe table") { +checkAnswer(sql("DESCRIBE EXTENDED testData"),Seq( --- End diff -- EXTENDED? This seems no different from plain describe. I think we'd better do this later, after the discussion.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/4207#discussion_r23661781 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1034,4 +1034,11 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { rdd.registerTempTable("distinctData") checkAnswer(sql("SELECT COUNT(DISTINCT key,value) FROM distinctData"), Row(2)) } + + test("describe table") { +checkAnswer(sql("DESCRIBE EXTENDED testData"),Seq( +Row("key","IntegerType",null), Row("value","StringType",null) --- End diff -- `IntegerType` and `StringType`? These should be `int` and `string`.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71770183 hi, @yanbohappy, I've already been working on this. I took a look at your PR, but it seems a little hacky to me. Would you like to review mine?
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71770002 Sorry, I've now shown you my configuration: 5G for SPARK_EXECUTOR_MEMORY and 5 for spark.task.cpus. In my screenshot, we launch two tasks on the same machine. Don't you think it's good to offer the task memory twice? My PR gives correct resource management information to the Mesos master. For CPUs, I don't know the proper value for executor cpus, only that it should not be CPUS_TASK_CPUS. Please recommend a value.
[GitHub] spark pull request: [WIP][SPARK-5341] Use maven coordinates as dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4215#issuecomment-71769759 [Test build #26200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26200/consoleFull) for PR 4215 at commit [`3705907`](https://github.com/apache/spark/commit/3705907dc2f61fa68f64df14a23622cc40aff9d8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` * (4) the main class for the child`
[GitHub] spark pull request: [WIP][SPARK-5341] Use maven coordinates as dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4215#issuecomment-71769764 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26200/ Test FAILed.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661437
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from py4j.java_collections import MapConverter
+from py4j.java_gateway import java_import, Py4JError
+
+from pyspark.storagelevel import StorageLevel
+from pyspark.serializers import PairDeserializer, NoOpSerializer
+from pyspark.streaming import DStream
+
+__all__ = ['KafkaUtils', 'utf8_decoder']
+
+
+def utf8_decoder(s):
+    """ Decode the unicode as UTF-8 """
+    return s and s.decode('utf-8')
+
+
+class KafkaUtils(object):
+
+    @staticmethod
+    def createStream(ssc, zkQuorum, groupId, topics,
+                     storageLevel=StorageLevel.MEMORY_AND_DISK_SER_2,
+                     keyDecoder=utf8_decoder, valueDecoder=utf8_decoder):
+        """
+        Create an input stream that pulls messages from a Kafka Broker.
+
+        :param ssc: StreamingContext object
+        :param zkQuorum: Zookeeper quorum (hostname:port,hostname:port,..).
+        :param groupId: The group id for this consumer.
+        :param topics: Dict of (topic_name -> numPartitions) to consume.
+                       Each partition is consumed in its own thread.
+        :param storageLevel: RDD storage level.
+        :param keyDecoder: A function used to decode key
+        :param valueDecoder: A function used to decode value
+        :return: A DStream object
+        """
+        java_import(ssc._jvm, "org.apache.spark.streaming.kafka.KafkaUtils")
+
+        param = {
+            "zookeeper.connect": zkQuorum,
+            "group.id": groupId,
+            "zookeeper.connection.timeout.ms": "1",
+        }
+        if not isinstance(topics, dict):
+            raise TypeError("topics should be dict")
+        jtopics = MapConverter().convert(topics, ssc.sparkContext._gateway._gateway_client)
+        jparam = MapConverter().convert(param, ssc.sparkContext._gateway._gateway_client)
+        jlevel = ssc._sc._getJavaStorageLevel(storageLevel)
+
+        def getClassByName(name):
+            return ssc._jvm.org.apache.spark.util.Utils.classForName(name)
+
+        try:
+            array = getClassByName("[B")
+            decoder = getClassByName("kafka.serializer.DefaultDecoder")
+            jstream = ssc._jvm.KafkaUtils.createStream(ssc._jssc, array, array, decoder, decoder,
+                                                       jparam, jtopics, jlevel)
+        except Py4JError, e:
+            # TODO: use --jar once it also work on driver
+            if not e.message or 'call a package' in e.message:
--- End diff --
This is clever; the 'call a package' errors are _really_ confusing to users, so this message is pretty helpful.
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71769526 [Test build #26199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26199/consoleFull) for PR 3658 at commit [`3c93e42`](https://github.com/apache/spark/commit/3c93e42a5e9474b33aa53f7fd6f22998d44a8c52). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5135][SQL] Add support for describe tab...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4227#issuecomment-71769478 Thanks for submitting the new version. Are these two PRs working on the same thing? https://github.com/apache/spark/pull/4207 Would be great if you two can chime in on each other's PR.
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71769535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26199/ Test PASSed.
[GitHub] spark pull request: [SPARK-5441][pyspark] Make SerDeUtil PairRDD t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4236#issuecomment-71769501 Hey thanks for this - mind adding a regression test that fails on the old code?
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71769449 Thanks for submitting the pull request. Are these two PRs working on the same thing? https://github.com/apache/spark/pull/4227 Would be great if you two can chime in on each other's PR.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71769379 Should we have tests for this? Do we have tests for the other Python streaming sources?
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661298 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- Rather than packaging this with the release, can we just ask users to add the Maven coordinates when launching it? This will add a fairly large amount to the binary size of Spark (especially if we add other ones in the future).
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661317 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- I'm assuming that #4215 gets merged in all this.
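For concreteness, if the `--packages` option proposed in #4215 lands, the suggestion above would look roughly like this on the command line (the coordinates and script name here are illustrative, not final):

    bin/spark-submit \
      --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.0 \
      my_kafka_wordcount.py localhost:2181 my-consumer-group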
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71769162 [Test build #26202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26202/consoleFull) for PR 4151 at commit [`fc59a02`](https://github.com/apache/spark/commit/fc59a022f767750e0b4796b83fa7f1da1e28fb5e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4228#discussion_r23661228
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -901,6 +901,38 @@ abstract class RDD[T: ClassTag](
   }
 
+  /**
+   * Reduces the elements of this RDD in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   * @see [[org.apache.spark.rdd.RDD#reduce]]
+   */
+  def treeReduce(f: (T, T) => T, depth: Int = 2): T = {
--- End diff --
Even in Scala we should avoid default arguments if possible.
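A minimal Scala sketch of the overload alternative (a simplified local-collection stand-in, not the actual RDD method under review): Java callers cannot see Scala default values, so an explicit no-depth overload keeps the API usable from both languages:

    class TreeOps[T](data: Seq[T]) {
      def treeReduce(f: (T, T) => T): T = treeReduce(f, 2) // replaces the default argument

      def treeReduce(f: (T, T) => T, depth: Int): T = {
        require(depth >= 1, s"depth must be >= 1 but got $depth")
        require(data.nonEmpty, "cannot reduce an empty collection")
        // Shrink by a constant fan-in per level until one element remains.
        val fanIn = math.max(2, math.ceil(math.pow(data.size.toDouble, 1.0 / depth)).toInt)
        var level: Seq[T] = data
        while (level.size > 1) {
          level = level.grouped(fanIn).map(_.reduce(f)).toVector
        }
        level.head
      }
    }

    // Usage: new TreeOps(1 to 100).treeReduce(_ + _) == 5050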
[GitHub] spark pull request: [SPARK-5097][SQL] Test cases for DataFrame exp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4235