[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2843

[SPARK-3791][SQL][WIP] Provides Spark version and Hive version in HiveThriftServer2

This PR overrides the `GetInfo` Hive Thrift API to provide correct version information. Another property, `spark.sql.hive.version`, is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. The Spark version information is extracted from the jar manifest. Also took the chance to remove the `SET -v` hack, which was a workaround for Simba ODBC driver connectivity.

TODO

- [ ] Find a general way to figure out the Hive (or even any dependency) version. For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar, but this doesn't work for SBT builds. Some other possible approaches can be found in this [blog post](http://blog.soebes.de/blog/2014/01/02/version-information-into-your-appas-with-maven/).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark get-info

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2843.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2843

commit dc9438b68fca834c5ddce2918f8a1474f67d33d9
Author: Cheng Lian l...@databricks.com
Date: 2014-10-18T09:09:06Z

    Overrides Hive GetInfo Thrift API and adds Hive version property

commit 9799b505e63793beced7ed79793739c011ee4547
Author: Cheng Lian l...@databricks.com
Date: 2014-10-19T05:52:26Z

    Removes the Simba ODBC SET -v hack

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
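The PR description mentions extracting the Spark version from the jar manifest. As a rough illustration of that mechanism (in Java rather than the project's Scala, with a hypothetical class name not taken from the PR), the `Implementation-Version` attribute can be read with `java.util.jar.Manifest`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.jar.Manifest;

// Hypothetical sketch: reading a version string from MANIFEST.MF bytes,
// the kind of lookup the PR relies on for the Spark version.
public class ManifestVersion {
    static String implementationVersion(byte[] manifestBytes) throws IOException {
        Manifest mf = new Manifest(new ByteArrayInputStream(manifestBytes));
        // getValue returns null if the attribute is absent from the manifest.
        return mf.getMainAttributes().getValue("Implementation-Version");
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = ("Manifest-Version: 1.0\r\n"
                + "Implementation-Version: 1.2.0-SNAPSHOT\r\n\r\n").getBytes("UTF-8");
        System.out.println(implementationVersion(sample));
    }
}
```

At runtime the same attribute is also exposed via `Package#getImplementationVersion()` when a class is loaded from a jar whose manifest carries it; when the build (e.g. SBT, per the TODO above) doesn't write the attribute, both lookups come back null.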
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2843#discussion_r19058163

--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala ---

    @@ -33,8 +33,10 @@ private[hive] object SparkSQLEnv extends Logging {
       def init() {
         if (hiveContext == null) {
    -      sparkContext = new SparkContext(new SparkConf()
    -        .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}"))
    +      val sparkConf = new SparkConf()
    +        .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}")
    +        .set("spark.sql.hive.version", "0.12.0-protobuf-2.5")

--- End diff --

This needs to be generalized.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2843#discussion_r19058178

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---

    @@ -306,7 +306,9 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
             driver.destroy()
             results
           case _ =>
    -        sessionState.out.println(tokens(0) + " " + cmd_1)
    +        if (sessionState.out != null) {
    +          sessionState.out.println(tokens(0) + " " + cmd_1)
    +        }

--- End diff --

`SessionState` life-cycle control is rather broken and error-prone in the current code base. I'm working on a separate PR to fix it.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59640634 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21881/consoleFull) for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59640641 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21881/ Test FAILed.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2843#discussion_r19058181

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---

    @@ -37,35 +43,81 @@ import org.apache.spark.sql.catalyst.util.getTempFilePath
    /**
     * Tests for the HiveThriftServer2 using JDBC.
    + *
    + * NOTE: SPARK_PREPEND_CLASSES is explicitly disabled in this test suite. The assembly jar must
    + * be rebuilt after changing HiveThriftServer2-related code.

--- End diff --

This requirement should be OK for Jenkins, since Jenkins always builds the assembly jar before executing any test suites.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59640639 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21881/consoleFull) for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-59640730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21880/ Test PASSed.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59641162 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/399/consoleFull) for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59641199 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/400/consoleFull) for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/2844

[SPARK-3958] TorrentBroadcast cleanup / debugging improvements.

This PR makes several changes to TorrentBroadcast in order to make it easier to reason about, which should help when debugging SPARK-3958. The key changes:

- Remove all state from the global TorrentBroadcast object. This state consisted mainly of configuration options, like the block size and compression codec, and was read by the blockify / unblockify methods. Unfortunately, the use of `lazy val` for `BLOCK_SIZE` meant that the block size was always determined by the first SparkConf that TorrentBroadcast was initialized with; as a result, unit tests could not properly test TorrentBroadcast with different block sizes. Instead, blockifyObject and unBlockifyObject now accept compression codecs and block sizes as arguments. These arguments are supplied at the call sites inside of TorrentBroadcast instances. Each TorrentBroadcast instance determines these values from SparkEnv's SparkConf. I was careful to ensure that we do not accidentally serialize CompressionCodec or SparkConf objects as part of the TorrentBroadcast object.
- Remove special-case handling of local mode in TorrentBroadcast. I don't think that broadcast implementations should know about whether we're running in local mode. If we want to optimize the performance of broadcast in local mode, then we should detect this at a higher level and use a dummy LocalBroadcastFactory implementation instead. Removing this code fixes a subtle error condition: in the old local-mode code, a failure to find the broadcast in the local BlockManager would lead to an attempt to deblockify zero blocks, which could lead to confusing deserialization or decompression errors when we attempted to decompress an empty byte array. This should never have happened, though: a failure to find the block in local mode is evidence of some other error. The changes here will make it easier to debug those errors if they ever happen.
- Add a check that throws an exception when attempting to deblockify an empty array.
- Use ScalaCheck to add a test to check that TorrentBroadcast's blockifyObject and unBlockifyObject methods are inverses.
- Misc. cleanup and logging improvements.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark torrentbroadcast-bugfix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2844.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2844

commit 48c98c1996c87cebbd0669924f57527b8e81c35e
Author: Josh Rosen joshro...@databricks.com
Date: 2014-10-19T06:36:49Z

    [SPARK-3958] TorrentBroadcast cleanup / debugging improvements.
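The blockify / unblockify round trip and the new zero-blocks guard described above can be sketched roughly as follows. This is an illustrative Java sketch with hypothetical names, not the actual TorrentBroadcast code (which also serializes and optionally compresses the object, and takes the block size from SparkConf at each call site):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockifySketch {
    // Split a payload into fixed-size blocks; the block size is passed
    // explicitly as an argument rather than read from global state.
    static List<byte[]> blockify(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < data.length; i += blockSize) {
            blocks.add(Arrays.copyOfRange(data, i, Math.min(i + blockSize, data.length)));
        }
        return blocks;
    }

    // Reassemble the blocks. Zero blocks indicates a failed lookup rather
    // than an empty payload, so fail fast instead of handing an empty byte
    // array to deserialization or decompression.
    static byte[] unBlockify(List<byte[]> blocks) {
        if (blocks.isEmpty()) {
            throw new IllegalArgumentException("cannot unblockify zero blocks");
        }
        int total = 0;
        for (byte[] b : blocks) total += b.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] b : blocks) {
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7};
        byte[] back = unBlockify(blockify(data, 3));
        System.out.println(Arrays.equals(back, data));
    }
}
```

The inverse property the PR tests with ScalaCheck is exactly the round trip above: for any payload and any positive block size, `unBlockify(blockify(data, n))` returns the original bytes.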
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59641699 /cc @rxin for review. I'd like to apply this to `branch-1.1` as well, since I believe that it's also affected by current TorrentBroadcast bugs.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59641733 Also, /cc @davies, who helped me spot the bug where local mode might deblockify an empty array, and who's been working on TorrentBroadcast optimizations.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59641757 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21882/consoleFull) for PR 2844 at commit [`618a872`](https://github.com/apache/spark/commit/618a87260faaebf353c1d9b4abc17af9f0cfa472). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-59641916 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21883/consoleFull) for PR 2760 at commit [`0d45fbc`](https://github.com/apache/spark/commit/0d45fbc9e41c8dc2fffd58a0a48c19a6d9dafdd8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-59641956 @rxin Did you have any other feedback here? If not, I'd like to merge this.
[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59642012 I'm going to merge this and cherry-pick it into all maintenance branches. We'll probably turn on cloning by default in 1.2, and we'll be sure to clearly document this configuration option in the 1.0.3 and 1.1.1 release notes. Thanks to everyone who helped test this!
[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2684
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2845

[SPARK-4000][BUILD] Sends archived unit tests logs to Jenkins master

This PR sends archived unit test logs to the build history directory on the Jenkins master, so that we can serve them via HTTP later to help debug Jenkins build failures.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark log-archive

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2845.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2845

commit 4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d
Author: Cheng Lian l...@databricks.com
Date: 2014-10-19T07:39:11Z

    Sends archived unit tests logs to Jenkins master
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59642454 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21884/consoleFull) for PR 2845 at commit [`4b912f7`](https://github.com/apache/spark/commit/4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59642510 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/399/consoleFull) for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59642634 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21882/consoleFull) for PR 2844 at commit [`618a872`](https://github.com/apache/spark/commit/618a87260faaebf353c1d9b4abc17af9f0cfa472).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59642636 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21882/ Test FAILed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59643009 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21885/consoleFull) for PR 2844 at commit [`33fc754`](https://github.com/apache/spark/commit/33fc75447c676a5fca1f6f7e7095562f3a1583d5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/2846

[SPARK-3997][Build]scalastyle should output the error location

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-3997

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2846.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2846

commit 82c38ecdf56fa087606bc9c12df2b9602b5c91a7
Author: GuoQiang Li wi...@qq.com
Date: 2014-10-19T08:19:34Z

    scalastyle should output the error location
[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-59643139 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21886/consoleFull) for PR 2846 at commit [`82c38ec`](https://github.com/apache/spark/commit/82c38ecdf56fa087606bc9c12df2b9602b5c91a7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59643215 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21887/consoleFull) for PR 2843 at commit [`da5e716`](https://github.com/apache/spark/commit/da5e716fd1b8cc48c43f37373641bbabbb91a11f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59643247 Hm, 3 consecutive build failures, embarrassing... For the first one, the unit tests were not started at all; it seems the build process was somehow interrupted. The second failure is a bit weird: although we're already using a random port to avoid port conflicts, it still failed to open the listening port. I checked the TCP port range on the Jenkins master node, which should be valid, but I don't have access to the Jenkins slave node that executed this build. The cause of the third failure is a known bug already fixed in the master branch; I've just rebased onto the most recent master.
[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-59643261 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21883/consoleFull) for PR 2760 at commit [`0d45fbc`](https://github.com/apache/spark/commit/0d45fbc9e41c8dc2fffd58a0a48c19a6d9dafdd8).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)`
[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-59643262 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21883/ Test PASSed.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59643445 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/400/consoleFull)** for PR 2843 at commit [`9799b50`](https://github.com/apache/spark/commit/9799b505e63793beced7ed79793739c011ee4547) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59643839 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21884/consoleFull) for PR 2845 at commit [`4b912f7`](https://github.com/apache/spark/commit/4b912f78adbc9e6a1a3ca66bf32b5560d642ad5d). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59643838 It looks like this build is going to fail a ReplSuite test:

```scala
test("broadcast vars") {
  // Test that the value that a broadcast var had when it was created is used,
  // even if that variable is then modified in the driver program
  // TODO: This doesn't actually work for arrays when we run in local mode!
  val output = runInterpreter("local",
    """
      |var array = new Array[Int](5)
      |val broadcastArray = sc.broadcast(array)
      |sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect
      |array(0) = 5
      |sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect
    """.stripMargin)
  assertDoesNotContain("error:", output)
  assertDoesNotContain("Exception", output)
  assertContains("res0: Array[Int] = Array(0, 0, 0, 0, 0)", output)
  assertContains("res2: Array[Int] = Array(5, 0, 0, 0, 0)", output)
}
```

I see now that my change to remove the special local-mode handling inadvertently leads to duplication of the variable in the driver program. This could be a performance issue, since we would now use 2x the memory in the driver for each broadcast variable. I'll restore the line that stores a local copy of the broadcast variable when it's created.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59643840 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21884/ Test FAILed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59643971 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21885/consoleFull) for PR 2844 at commit [`33fc754`](https://github.com/apache/spark/commit/33fc75447c676a5fca1f6f7e7095562f3a1583d5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59643974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21885/ Test FAILed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59644064 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21888/consoleFull) for PR 2844 at commit [`5c22782`](https://github.com/apache/spark/commit/5c227825b3cf0bbe3826e20fe66370229bfc43a2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-59644562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21886/ Test PASSed.
[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-59644561 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21886/consoleFull) for PR 2846 at commit [`82c38ec`](https://github.com/apache/spark/commit/82c38ecdf56fa087606bc9c12df2b9602b5c91a7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori algorithm f...
GitHub user jackylk opened a pull request: https://github.com/apache/spark/pull/2847 [SPARK-4001][MLlib] adding apriori algorithm for frequent item set mining in Spark Apriori is the classic algorithm for frequent item set mining in a transactional data set. It would be useful to have the Apriori algorithm in MLlib, and this PR adds an implementation of it. There is one point I am not sure about: in order to filter out the eligible frequent item sets, I am currently using a cartesian operation on two RDDs to calculate the support of each item set, and I am not sure whether it would be more efficient to use a broadcast variable to achieve the same. I will add an example of using this algorithm if required. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/spark apriori Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2847.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2847 commit da2cba7e063745aacef74ff555e7bd7c55a24f56 Author: Jacky Li jacky.li...@huawei.com Date: 2014-10-19T09:19:27Z adding apriori algorithm for frequent item set mining in Spark commit 889b33fdfabcc222c82e3bce619aeb6c7031fc58 Author: Jacky Li jacky.li...@huawei.com Date: 2014-10-19T09:31:04Z modify per scalastyle check
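For readers unfamiliar with the algorithm being proposed, here is a minimal, self-contained Apriori sketch in plain Python. It is not the PR's Spark/RDD implementation (and omits the classic candidate-pruning optimization); `min_support` and the variable names are illustrative. Each pass counts the support of candidate k-itemsets, keeps the frequent ones, and joins survivors into (k+1)-itemset candidates:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (as frozensets) mapped to their support counts."""
    transactions = [frozenset(t) for t in transactions]
    # Start with all 1-itemsets that appear in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Support of a candidate = number of transactions containing it.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-itemset candidates.
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

freq = apriori([["a", "b"], ["a", "c"], ["a", "b", "c"]], min_support=2)
```

The cartesian-vs-broadcast question in the PR description is exactly about the counting step: with small candidate sets, shipping the candidates to every partition (a broadcast) avoids the shuffle that a cartesian of two RDDs incurs.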
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori algorithm f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-59644841 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59645000 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21887/ Test PASSed.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2843#issuecomment-59644997 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21887/consoleFull) for PR 2843 at commit [`da5e716`](https://github.com/apache/spark/commit/da5e716fd1b8cc48c43f37373641bbabbb91a11f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SerializableMapWrapper[A, B](underlying: collection.Map[A, B])` * `class Predict(` * `case class EvaluatePython(`
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59645141 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21888/ Test FAILed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59645137 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21888/consoleFull) for PR 2844 at commit [`5c22782`](https://github.com/apache/spark/commit/5c227825b3cf0bbe3826e20fe66370229bfc43a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3926 [CORE] Result of JavaRDD.collectAsM...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2805#issuecomment-59645175 Yes, all SGTM.
[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...
Github user ziky90 commented on the pull request: https://github.com/apache/spark/pull/2836#issuecomment-59645599 OK, currently I'm using EMR instead of the spark-ec2 script, because it seems more convenient to me than connecting to an EC2 cluster from my own bash script. But you're right, that is a possible way to go, and this functionality is not necessarily needed in spark-ec2.
[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...
Github user ziky90 closed the pull request at: https://github.com/apache/spark/pull/2836
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19059750

```diff
--- Diff: dev/run-tests-jenkins ---
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq "201" ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name "unit-tests.log")
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp "${log_archive}" amp-jenkins-master:"${jenkins_build_dir}/${log_archive}")
```

--- End diff -- It's not good to hardcode the Jenkins master hostname here. We should inject an extra environment variable `$MASTER_NODE_NAME` in the Jenkins configuration.
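The find-and-tar step of `send_archived_logs` above can be sketched in a self-contained way. The following Python illustration (function and file names are my own; this is not Spark's build tooling) collects every `unit-tests.log` under a root directory into one gzipped tarball, mirroring the shell function's `find | xargs tar czf` pipeline:

```python
import tarfile
import tempfile
from pathlib import Path

def archive_logs(root, pattern="unit-tests.log",
                 archive_name="unit-tests-logs.tar.gz"):
    """Collect every file matching `pattern` under `root` into a gzipped tarball.

    Returns the number of files archived (0 means no archive was written).
    """
    root = Path(root)
    log_files = sorted(root.rglob(pattern))
    if not log_files:
        return 0
    with tarfile.open(root / archive_name, "w:gz") as tar:
        for f in log_files:
            # Store paths relative to the root, as "tar czf" from "." would.
            tar.add(f, arcname=str(f.relative_to(root)))
    return len(log_files)

# Demo on a throwaway directory holding two nested log files.
demo = Path(tempfile.mkdtemp())
(demo / "core" / "target").mkdir(parents=True)
(demo / "core" / "target" / "unit-tests.log").write_text("core logs")
(demo / "sql" / "target").mkdir(parents=True)
(demo / "sql" / "target" / "unit-tests.log").write_text("sql logs")
archived = archive_logs(demo)
```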
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59647627 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21889/consoleFull) for PR 2845 at commit [`68c7010`](https://github.com/apache/spark/commit/68c7010748fd275cd4e10ac09d994dc0e61a4e24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59648648 @vanzin, is it OK to go?
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59649466 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21889/consoleFull) for PR 2845 at commit [`68c7010`](https://github.com/apache/spark/commit/68c7010748fd275cd4e10ac09d994dc0e61a4e24). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59649470 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21889/ Test PASSed.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59649582 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21890/consoleFull) for PR 2816 at commit [`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59649597 Verified that the log archive was uploaded to the correct location in the Jenkins master node.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59649721 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21891/consoleFull) for PR 2520 at commit [`553d9e9`](https://github.com/apache/spark/commit/553d9e9536e2e939278d238a0a34a3b9024590b5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59650495 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21890/consoleFull) for PR 2816 at commit [`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59650498 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21890/ Test FAILed.
[GitHub] spark pull request: [SPARK-3791][SQL][WIP] Provides Spark version ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2843#discussion_r19060362

```diff
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -753,44 +753,19 @@ class HiveQuerySuite extends HiveComparisonTest {
     }
     assert(hiveconf.get(testKey, "") == testVal)
-    assertResult(Set(testKey -> testVal)) {
-      collectResults(sql("SET"))
-    }
+    assertResult(Set(testKey -> testVal))(collectResults(sql("SET")))
+    assertResult(Set(testKey -> testVal))(collectResults(sql("SET -v")))

     sql(s"SET ${testKey + testKey}=${testVal + testVal}")
     assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
     assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
       collectResults(sql("SET"))
     }
-
-    // set key
-    assertResult(Set(testKey -> testVal)) {
-      collectResults(sql(s"SET $testKey"))
-    }
-
-    assertResult(Set(nonexistentKey -> "undefined")) {
-      collectResults(sql(s"SET $nonexistentKey"))
-    }
-
-    // Assert that sql() should have the same effects by repeating the above using sql().
-    clear()
-    assert(sql("SET").collect().size == 0)
-
-    assertResult(Set(testKey -> testVal)) {
-      collectResults(sql(s"SET $testKey=$testVal"))
-    }
-
-    assert(hiveconf.get(testKey, "") == testVal)
-    assertResult(Set(testKey -> testVal)) {
-      collectResults(sql("SET"))
-    }
-
-    sql(s"SET ${testKey + testKey}=${testVal + testVal}")
-    assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
     assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
```

--- End diff -- These lines are removed because they originally tested the deprecated `hql` call; at that time `sql` and `hql` had different code paths. Later those `hql` calls were changed to `sql` to avoid compile-time deprecation warnings, which made them exact duplicates.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59650806 retest this please.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59650907 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21892/consoleFull) for PR 2816 at commit [`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59651589 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21891/consoleFull) for PR 2520 at commit [`553d9e9`](https://github.com/apache/spark/commit/553d9e9536e2e939278d238a0a34a3b9024590b5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59651592 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21891/
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59652376 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21892/consoleFull) for PR 2816 at commit [`5c847ac`](https://github.com/apache/spark/commit/5c847aca4e7d618dee7b8c647bdca6f845d328e3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59652379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21892/
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59653126 This most recent test failure is another side-effect of removing TorrentBroadcast's optimizations for local mode:

```
[info] - Unpersisting TorrentBroadcast on executors only in local mode *** FAILED ***
[info]   1 did not equal 0 (BroadcastSuite.scala:219)
[info] - Unpersisting TorrentBroadcast on executors and driver in local mode *** FAILED ***
[info]   1 did not equal 0 (BroadcastSuite.scala:219)
```

This time, the error is because there's a check asserting that broadcast pieces are not stored in the driver's block manager when running in local mode. I don't think that this optimization necessarily makes sense, since we'll have to store those blocks anyway when running in distributed mode. Therefore, I'm going to change these tests to remove this local-mode special-casing.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59653640 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21893/consoleFull) for PR 2844 at commit [`c3b08f9`](https://github.com/apache/spark/commit/c3b08f93b61f0748b7c42fc32314bd92150e5b88). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59655954 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21893/consoleFull) for PR 2844 at commit [`c3b08f9`](https://github.com/apache/spark/commit/c3b08f93b61f0748b7c42fc32314bd92150e5b88). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59655958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21893/
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r19061946

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcTableOperations.scala ---

```
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hive.orc

import java.io.IOException
import java.text.SimpleDateFormat
import java.util.{Locale, Date}
import scala.collection.JavaConversions._

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, FileOutputCommitter}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.io.{Writable, NullWritable}
import org.apache.hadoop.mapreduce.{TaskID, TaskAttemptContext, Job}
import org.apache.hadoop.hive.ql.io.orc.{OrcSerde, OrcInputFormat, OrcOutputFormat}
import org.apache.hadoop.hive.serde2.objectinspector._
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
import org.apache.hadoop.mapred.{SparkHadoopMapRedUtil, Reporter, JobConf}

import org.apache.spark.sql.execution._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.parquet.FileSystemHelper
import org.apache.spark.{TaskContext, SerializableWritable}
import org.apache.spark.rdd.RDD
import org.apache.spark.util.Utils._

/**
 * ORC table scan operator. Imports the file that backs the given
 * [[org.apache.spark.sql.hive.orc.OrcRelation]] as an ``RDD[Row]``.
 */
case class OrcTableScan(
    output: Seq[Attribute],
    relation: OrcRelation,
    columnPruningPred: Option[Expression])
  extends LeafNode {

  @transient
  lazy val serde: OrcSerde = initSerde

  @transient
  lazy val getFieldValue: Seq[Product => Any] = {
    val inspector = serde.getObjectInspector.asInstanceOf[StructObjectInspector]
    output.map(attr => {
      val ref = inspector.getStructFieldRef(attr.name.toLowerCase(Locale.ENGLISH))
      (row: Product) => {
        val fieldData = row.productElement(1)
        val data = inspector.getStructFieldData(fieldData, ref)
        unwrapData(data, ref.getFieldObjectInspector)
      }
    })
  }

  private def initSerde(): OrcSerde = {
    val serde = new OrcSerde
    serde.initialize(null, relation.prop)
    serde
  }

  def unwrapData(data: Any, oi: ObjectInspector): Any = oi match {
    case pi: PrimitiveObjectInspector => pi.getPrimitiveJavaObject(data)
    case li: ListObjectInspector =>
      Option(li.getList(data))
        .map(_.map(unwrapData(_, li.getListElementObjectInspector)).toSeq)
        .orNull
    case mi: MapObjectInspector =>
      Option(mi.getMap(data)).map(
        _.map {
          case (k, v) =>
            (unwrapData(k, mi.getMapKeyObjectInspector),
             unwrapData(v, mi.getMapValueObjectInspector))
        }.toMap).orNull
    case si: StructObjectInspector =>
      val allRefs = si.getAllStructFieldRefs
      new GenericRow(
        allRefs.map(r =>
          unwrapData(si.getStructFieldData(data, r), r.getFieldObjectInspector)).toArray)
  }

  override def execute(): RDD[Row] = {
    val sc = sqlContext.sparkContext
    val job = new Job(sc.hadoopConfiguration)

    val conf: Configuration = job.getConfiguration
    val fileList = FileSystemHelper.listFiles(relation.path, conf)

    // add all paths in the directory but skip hidden ones such
    // as _SUCCESS
    for (path <- fileList if !path.getName.startsWith("_")) {
      FileInputFormat.addInputPath(job, path)
    }

    setColumnIds(output, relation, conf)
    val inputClass = classOf[OrcInputFormat].asInstanceOf[
      Class[_ <:
```
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-59660023 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21894/consoleFull) for PR 2576 at commit [`f680da0`](https://github.com/apache/spark/commit/f680da07742605e6a38bf4132477e063b2b22548). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59661579 @chouqin Thanks for the updates! The updates look good. One more small comment: Could you please add explicit checks in the unit tests to make sure the returned splits are distinct? I should have thought of that earlier. I'll try some timing tests to make sure the sampling does not take too long.
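The distinctness concern above can be illustrated with a small sketch (plain Scala, not MLlib's actual API; `findSplits` and its quantile-index scheme are hypothetical): split candidates taken at quantile positions of a sorted sample can collide when the feature has repeated values, so they must be deduplicated before use.

```scala
// Hedged sketch: derive split candidates for a continuous feature from a
// sorted sample by picking approximate quantile boundaries. When the sample
// contains many duplicates, several quantile indices can land on the same
// value, so `.distinct` is required to guarantee distinct splits.
object SplitSketch {
  def findSplits(sample: Seq[Double], numSplits: Int): Seq[Double] = {
    val sorted = sample.sorted
    // One candidate per quantile boundary (integer index arithmetic).
    val candidates = (1 to numSplits).map { i =>
      sorted(math.min(sorted.length - 1, i * sorted.length / (numSplits + 1)))
    }
    // Deduplicate so repeated feature values never yield duplicate splits.
    candidates.distinct
  }
}
```

With a heavily skewed sample such as `Seq(1.0, 1.0, 1.0, 2.0, 3.0)`, three requested splits collapse to two distinct ones, which is exactly the case a unit-test distinctness check would catch.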
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-59663743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21894/
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063222

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -76,23 +87,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
    * @return number of blocks this broadcast variable is divided into
    */
   private def writeBlocks(): Int = {
-    // For local mode, just put the object in the BlockManager so we can find it later.
-    SparkEnv.get.blockManager.putSingle(
-      broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
-
-    if (!isLocal) {
-      val blocks = TorrentBroadcast.blockifyObject(_value)
-      blocks.zipWithIndex.foreach { case (block, i) =>
-        SparkEnv.get.blockManager.putBytes(
-          BroadcastBlockId(id, "piece" + i),
-          block,
-          StorageLevel.MEMORY_AND_DISK_SER,
-          tellMaster = true)
-      }
-      blocks.length
-    } else {
-      0
+    // Store a copy of the broadcast variable in the driver so that tasks run on the driver
+    // do not create a duplicate copy of the broadcast variable's value.
+    SparkEnv.get.blockManager.putSingle(broadcastId, _value, StorageLevel.MEMORY_AND_DISK,
+      tellMaster = false)
```

--- End diff --

I wonder whether storing a serialized copy in local mode helps at all. If we fail to fetch the original copy of the value from the blockManager, we will not be able to fetch the serialized copy either.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063253

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -62,6 +59,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
    * blocks from the driver and/or other executors.
    */
   @transient private var _value: T = obj

+  /** The compression codec to use, or None if compression is disabled */
+  @transient private var compressionCodec: Option[CompressionCodec] = _
+  /** Size of each block. Default value is 4MB. This value is only read by the broadcaster. */
+  @transient private var blockSize: Int = _
```

--- End diff --

How about moving these two into the constructor, reading the conf in TorrentBroadcastFactory?
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59664449 @chouqin LGTM. :+1:
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063271

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -156,6 +158,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](
   private def readObject(in: ObjectInputStream) {
     in.defaultReadObject()
     TorrentBroadcast.synchronized {
+      setConf(SparkEnv.get.conf)
```

--- End diff --

This looks weird; how can we make sure that this conf is equal to the one used when the Broadcast was created?
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063287

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -156,6 +158,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](
   private def readObject(in: ObjectInputStream) {
     in.defaultReadObject()
     TorrentBroadcast.synchronized {
+      setConf(SparkEnv.get.conf)
```

--- End diff --

The conf is application-scoped. The same conf should be present on this application's executors, where this task will be deserialized. This assumption is used elsewhere, too.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063336

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -76,23 +87,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
    * @return number of blocks this broadcast variable is divided into
    */
   private def writeBlocks(): Int = {
-    // For local mode, just put the object in the BlockManager so we can find it later.
-    SparkEnv.get.blockManager.putSingle(
-      broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
-
-    if (!isLocal) {
-      val blocks = TorrentBroadcast.blockifyObject(_value)
-      blocks.zipWithIndex.foreach { case (block, i) =>
-        SparkEnv.get.blockManager.putBytes(
-          BroadcastBlockId(id, "piece" + i),
-          block,
-          StorageLevel.MEMORY_AND_DISK_SER,
-          tellMaster = true)
-      }
-      blocks.length
-    } else {
-      0
+    // Store a copy of the broadcast variable in the driver so that tasks run on the driver
+    // do not create a duplicate copy of the broadcast variable's value.
+    SparkEnv.get.blockManager.putSingle(broadcastId, _value, StorageLevel.MEMORY_AND_DISK,
+      tellMaster = false)
```

--- End diff --

The reason for this store is to avoid creating two copies of `_value` in the driver. If we serialize and deserialize a broadcast variable on the driver and then attempt to access its value, then without this code we will end up going through the regular de-chunking code path, which will cause us to deserialize the serialized copy of `_value` and waste memory. I believe that this serialization and deserialization can take place when tasks are run in local mode, since we still serialize tasks in order to help users be aware of serialization issues that would impact them if they moved to a cluster. This complexity is another reason why I'm in favor of just scrapping all local-mode special-casing and configuring Spark to use a dummy LocalBroadcastFactory for local mode instead of whichever setting the user specified. That would be a larger, more-invasive change, which is why I opted for the simpler fix here.
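The duplicate-copy problem described above can be sketched in plain Scala (hedged: `BroadcastLike`, `LocalCache`, and `roundTrip` are hypothetical stand-ins, not Spark's BlockManager API). The idea is that a broadcast-like handle registers its value in a process-local store keyed by id, so that a handle deserialized in the same JVM resolves to the original object instead of materializing a second copy of the payload.

```scala
import java.io._
import scala.collection.mutable

// Process-local store, analogous in spirit to the driver-side putSingle call
// in the quoted diff (hypothetical; not Spark's BlockManager).
object LocalCache {
  val store = mutable.Map.empty[Long, Any]
}

// The payload field is @transient, so serializing the handle ships only the id.
// Java deserialization does not re-run the constructor, so the store still
// holds the original object, and `value` resolves to that single copy.
class BroadcastLike[T](val id: Long, @transient private var obj: T) extends Serializable {
  LocalCache.store(id) = obj  // register the one and only copy
  def value: T = LocalCache.store(id).asInstanceOf[T]
}

object BroadcastLikeDemo {
  // Serialize and deserialize in the same JVM, as happens to task closures
  // in local mode.
  def roundTrip[T](b: BroadcastLike[T]): BroadcastLike[T] = {
    val bos = new ByteArrayOutputStream()
    new ObjectOutputStream(bos).writeObject(b)
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      .readObject().asInstanceOf[BroadcastLike[T]]
  }
}
```

After a round trip, the deserialized handle's `value` is reference-equal to the original, which is the memory saving the comment describes.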
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59664847 @jkbradley I read the paper by Manku et al. and other papers, but they required a custom implementation. The sort method has worked OK so far, but I was hoping somebody would implement a generic quantile approximation algorithm for Spark that is O(n) and requires limited memory. I think such methods exist in other libraries such as [Algebird](http://twitter.github.io/algebird/com/twitter/algebird/QTree$.html) and [Tdigest](https://github.com/tdunning/t-digest). We should also look at whether BlinkDB has attempted to tackle this problem.
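To make the "O(n) time, limited memory" idea above concrete, here is a hedged sketch (plain Scala; `ReservoirQuantiles` is hypothetical and much cruder than QTree or t-digest): a fixed-size reservoir sample is maintained over the stream, and quantiles are read off the sorted reservoir. Real sketches give provable error bounds; this only illustrates the memory/accuracy trade-off.

```scala
import scala.util.Random

// Reservoir sampling: O(n) inserts, O(capacity) memory, approximate quantiles.
class ReservoirQuantiles(capacity: Int, seed: Long = 42L) {
  private val rng = new Random(seed)
  private val reservoir = new Array[Double](capacity)
  private var seen = 0L

  def insert(x: Double): Unit = {
    if (seen < capacity) {
      reservoir(seen.toInt) = x            // fill phase
    } else {
      // Keep item i with probability capacity / i (classic reservoir rule).
      val j = (rng.nextDouble() * (seen + 1)).toLong
      if (j < capacity) reservoir(j.toInt) = x
    }
    seen += 1
  }

  def quantile(q: Double): Double = {
    val n = math.min(seen, capacity).toInt
    val sorted = reservoir.take(n).sorted
    sorted(math.min(n - 1, (q * n).toInt))
  }
}
```

With the reservoir no larger than the input the answer is exact; once the input exceeds the capacity, the quantile is only approximate, which is the accuracy loss the dedicated sketch structures are designed to control.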
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063363

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -62,6 +59,20 @@ private[spark] class TorrentBroadcast[T: ClassTag](
    * blocks from the driver and/or other executors.
    */
   @transient private var _value: T = obj

+  /** The compression codec to use, or None if compression is disabled */
+  @transient private var compressionCodec: Option[CompressionCodec] = _
+  /** Size of each block. Default value is 4MB. This value is only read by the broadcaster. */
+  @transient private var blockSize: Int = _
```

--- End diff --

I thought about this and agree that it might be cleaner, but this would require more refactoring of other code. One design goal here was to minimize the serialized size of TorrentBroadcast objects, so we can't serialize the SparkConf or CompressionCodec instances (which contain SparkConfs). SparkEnv.conf determines these values anyway.
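The pattern being discussed — keep environment-derived state out of the serialized form and re-derive it on deserialization — can be sketched as follows (hedged: `Handle` and `lookupBlockSize` are hypothetical; `lookupBlockSize` stands in for reading an application-scoped conf via `SparkEnv.get.conf`).

```scala
import java.io._

// Environment-derived state is @transient, so it never inflates the
// serialized form; readObject re-derives it on the receiving side,
// mirroring the setConf(SparkEnv.get.conf) call in the quoted diff.
class Handle(val id: Long) extends Serializable {
  @transient private var blockSize: Int = lookupBlockSize()

  // Stand-in for reading an application-scoped configuration value.
  private def lookupBlockSize(): Int = 4096 * 1024

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    // The transient field is 0 after default deserialization; re-derive it.
    blockSize = lookupBlockSize()
  }

  def size: Int = blockSize
}
```

This works because Java serialization invokes a `private def readObject(in: ObjectInputStream)` hook if one is defined, and it only makes sense when the re-derived value is the same on both ends — which is exactly the application-scoped-conf assumption defended in the comment above.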
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19063455

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---

```
@@ -179,43 +183,29 @@ private[spark] class TorrentBroadcast[T: ClassTag](
 private object TorrentBroadcast extends Logging {

-  /** Size of each block. Default value is 4MB. */
-  private lazy val BLOCK_SIZE = conf.getInt("spark.broadcast.blockSize", 4096) * 1024
-  private var initialized = false
-  private var conf: SparkConf = null
-  private var compress: Boolean = false
-  private var compressionCodec: CompressionCodec = null
-
-  def initialize(_isDriver: Boolean, conf: SparkConf) {
-    TorrentBroadcast.conf = conf // TODO: we might have to fix it in tests
-    synchronized {
-      if (!initialized) {
-        compress = conf.getBoolean("spark.broadcast.compress", true)
-        compressionCodec = CompressionCodec.createCodec(conf)
-        initialized = true
-      }
-    }
-  }
-
-  def stop() {
-    initialized = false
-  }
-
-  def blockifyObject[T: ClassTag](obj: T): Array[ByteBuffer] = {
-    val bos = new ByteArrayChunkOutputStream(BLOCK_SIZE)
-    val out: OutputStream = if (compress) compressionCodec.compressedOutputStream(bos) else bos
-    val ser = SparkEnv.get.serializer.newInstance()
+  def blockifyObject[T: ClassTag](
```

--- End diff --

The conf has been moved into `class Broadcast`; maybe `blockifyObject` and `unblockify` should also be moved.
[GitHub] spark pull request: Minor change in the comment of spark-defaults....
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2709#issuecomment-59667207 @andrewor14 Sorry for the late reply; I was on vacation in Europe last week. I can continue working on this after I finish my talk at the IOTA conf tomorrow.
[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2814#issuecomment-59669342 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21895/consoleFull) for PR 2814 at commit [`11e7d5d`](https://github.com/apache/spark/commit/11e7d5d6edf48fc386f8cf58c91fe2c4bdadc45e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/2848 [SPARK-3967] don't redundantly overwrite executor JAR deps You can merge this pull request into a Git repository by running: $ git pull https://github.com/ryan-williams/spark fetch-file Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2848.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2848 commit f3c80ae80be474ed9928b319f2b0d7808b028915 Author: Ryan Williams ryan.blake.willi...@gmail.com Date: 2014-10-17T22:21:23Z don't redundantly overwrite executor JAR deps see SPARK-3967
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-59669916 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2743#discussion_r19064542

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---

```
@@ -63,9 +64,12 @@ private[spark] class PythonRDD(
     val localdir = env.blockManager.diskBlockManager.localDirs.map(
       f => f.getPath()).mkString(",")
     envVars += ("SPARK_LOCAL_DIRS" -> localdir) // it's also used in monitor thread
-    if (reuse_worker) {
+    if (reuseWorker) {
       envVars += ("SPARK_REUSE_WORKER" -> "1")
     }
+    if (!memoryLimit.isEmpty) {
+      envVars += ("PYSPARK_WORKER_MEMORY_LIMIT" -> memoryLimit)
```

--- End diff --

@davies - the environment variable is only for internal use, correct? One thing is we could name this to make it more clear that it is only for internal use:

```
_PYSPARK_WORKER_MEMORY_LIMIT
```
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064621

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp ${log_archive} amp-jenkins-master:${jenkins_build_dir}/${log_archive})
```

--- End diff --

I'm confused actually - is amp-jenkins-master the current hostname of the master machine?
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064633

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
```

--- End diff --

Just wondering, will these appear in the tar file under the full path (e.g. streaming/target/unit-tests.log)? That's ideal.
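A quick sketch of the behavior being asked about: when log files are archived by the relative paths that `find . -name unit-tests.log` returns, the archive members keep those paths. The `core`/`streaming` directories below are illustrative stand-ins for modules in a Spark checkout.

```python
import os
import shutil
import tarfile
import tempfile

# Build a fake source tree, archive every unit-tests.log by its relative
# path (as the shell pipeline in the diff does), then list the members.

old_cwd = os.getcwd()
root = tempfile.mkdtemp()
try:
    os.chdir(root)
    for module in ("core", "streaming"):
        os.makedirs(os.path.join(module, "target"))
        with open(os.path.join(module, "target", "unit-tests.log"), "w") as f:
            f.write("fake log\n")

    # Mimic `find . -name unit-tests.log`.
    log_files = [
        os.path.join(dirpath, name)
        for dirpath, _, filenames in os.walk(".")
        for name in filenames
        if name == "unit-tests.log"
    ]

    with tarfile.open("unit-tests-logs.tar.gz", "w:gz") as tar:
        for path in log_files:
            tar.add(path, arcname=os.path.relpath(path))  # keep relative path

    with tarfile.open("unit-tests-logs.tar.gz") as tar:
        names = sorted(tar.getnames())
finally:
    os.chdir(old_cwd)
    shutil.rmtree(root)

print(names)  # ['core/target/unit-tests.log', 'streaming/target/unit-tests.log']
```

So each log lands under its module-qualified path, which is what makes the archive useful when several modules produce a `unit-tests.log`.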
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064648

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
```

--- End diff --

Should we add BUILD_NUMBER to the message that we post? Something like this:

```
[Test build #XXX has started/finished] for PR 2845 at commit 4b912f7 (build $XXX).
```
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2845#issuecomment-59671283

Overall this looks good; I had a few minor questions. One thing we could do next is scp the logs to a web server that we control (e.g. something under people.apache.org) and clean up the old ones every time we copy something over.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064710

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
```

--- End diff --

That could be useful. We may also be able to do away with the "for PR" part, since that's kind of redundant. Note that you can currently get the build number from the build URL in the existing messages posted to GitHub.
[GitHub] spark pull request: [SPARK-3902] [SPARK-3590] Stabilize AsynRDDAct...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-59672002

LGTM - we discussed some details of this offline last week.
[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2814#issuecomment-59672188

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21895/
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064925

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
+
+    local jenkins_build_dir=${JENKINS_HOME}/jobs/${JOB_NAME}/builds/${BUILD_NUMBER}
+    local scp_output=$(scp ${log_archive} amp-jenkins-master:${jenkins_build_dir}/${log_archive})
```

--- End diff --

This hostname is accessible from Jenkins slave nodes.
[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2814#issuecomment-59672186

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21895/consoleFull) for PR 2814 at commit [`11e7d5d`](https://github.com/apache/spark/commit/11e7d5d6edf48fc386f8cf58c91fe2c4bdadc45e).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4000][BUILD] Sends archived unit tests ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2845#discussion_r19064927

--- Diff: dev/run-tests-jenkins ---

```diff
@@ -92,12 +92,39 @@ function post_message () {
     echo "api_response: ${api_response}" >&2
     echo "data: ${data}" >&2
   fi
-
+
   if [ "$curl_status" -eq 0 ] && [ "$http_code" -eq 201 ]; then
     echo "Post successful."
   fi
 }

+function send_archived_logs () {
+  echo "Archiving unit tests logs..."
+
+  local log_files=$(find . -name unit-tests.log)
+
+  if [ -z "$log_files" ]; then
+    echo "No log files found." >&2
+  else
+    local log_archive="unit-tests-logs.tar.gz"
+    echo "$log_files" | xargs tar czf "${log_archive}"
```

--- End diff --

Yes.
[GitHub] spark pull request: [spark-3907][sql] add truncate table support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2770#issuecomment-59672228

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/402/consoleFull) for PR 2770 at commit [`f6e710e`](https://github.com/apache/spark/commit/f6e710e7d2c455d57065bd712789b7dd0bf357fb).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3904] [SQL] add constant objectinspecto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2762#issuecomment-59672239

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/403/consoleFull) for PR 2762 at commit [`49d442b`](https://github.com/apache/spark/commit/49d442bb97259b3a3a07456d65b27e9c2696b916).

* This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-59672538

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21896/consoleFull) for PR 2607 at commit [`6a11c02`](https://github.com/apache/spark/commit/6a11c0249268378b3319644f467daefa8807a899).

* This patch merges cleanly.