[GitHub] spark pull request: [SPARK-11102] [SQL] Uninformative exception wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9490#issuecomment-160063059 **[Test build #46809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46809/consoleFull)** for PR 9490 at commit [`9dd1801`](https://github.com/apache/spark/commit/9dd180146f4f66884c73c56e37decefa0d12b5df). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7857][MLLIB] Prevent IDFModel from retu...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9843#discussion_r46022674 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -211,14 +213,17 @@ private object IDFModel { val n = v.size v match { case SparseVector(size, indices, values) => +val newElements = new ArrayBuffer[(Int, Double)] val nnz = indices.size -val newValues = new Array[Double](nnz) var k = 0 while (k < nnz) { - newValues(k) = values(k) * idf(indices(k)) + val newValue = values(k) * idf(indices(k)) --- End diff -- The existing code is fine, since the idf vector is always dense and therefore has constant-time lookup. Your change will strongly impact performance, so I don't think it should belong to a separate PR.
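To illustrate the reviewer's point about constant-time lookup, here is a minimal plain-Scala sketch (not MLlib's code) of scaling the stored values of a sparse vector by a dense IDF array. It assumes the sparse vector is given as parallel `indices`/`values` arrays; `IdfSketch` and `scaleSparse` are hypothetical names. Because `idf` is a dense `Array[Double]`, each `idf(indices(k))` is a direct array access, so the loop runs in O(nnz).

```scala
object IdfSketch {
  /** Multiplies each stored value of a sparse vector (parallel `indices`/`values`
    * arrays) by the IDF weight at that index. `idf` is dense, so every lookup is an
    * O(1) array access and the whole loop is O(nnz). Hypothetical helper for illustration.
    */
  def scaleSparse(indices: Array[Int], values: Array[Double], idf: Array[Double]): Array[Double] = {
    val nnz = indices.length
    val newValues = new Array[Double](nnz)
    var k = 0
    while (k < nnz) {
      newValues(k) = values(k) * idf(indices(k))
      k += 1
    }
    newValues
  }
}
```

For example, `IdfSketch.scaleSparse(Array(0, 3), Array(2.0, 1.0), Array(0.5, 9.0, 9.0, 4.0))` yields `Array(1.0, 4.0)`; the preallocated output array avoids the per-element boxing that an `ArrayBuffer[(Int, Double)]` would add.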
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/9478#issuecomment-160057949

@andrewor14 ISTM that this PR has little effect on performance, even when many partitions are involved in the shuffle.

Test settings:
- 10 test runs, excluding the first run
- 4 workers on a single c4.8xlarge (each worker has 12g of memory)
- spark.driver.memory=1g
- spark.executor.memory=12g
- spark.shuffle.bypassNetworkAccess=true
- spark.sql.shuffle.partitions=1000
- total shuffle size = around 1.5g
- query executed:

```
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(
  Seq(
    StructField(s"col0", IntegerType, true),
    StructField(s"col1", StringType, true)
  )
)
val rdd = sc.parallelize((1 to 36), 36).flatMap { j =>
  (1 to 500).map(i => Row(i, s"${i}"))
}
sqlContext.createDataFrame(rdd, schema).registerTempTable("shuffle")
sqlContext.sql("cache table shuffle")

def timer[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  // System.nanoTime() is in nanoseconds; divide by 1e9 to print seconds
  println("Elapsed time: " + ((t1 - t0) / 1e9) + "s")
  result
}

(0 until 12).map { i =>
  timer {
    sql(s"""select col0, col1 from shuffle cluster by col0""")
      .queryExecution.executedPlan(1).execute().foreach(_ => {})
  }
}
```

- w/o bypassing:
  Elapsed time: 9.344261745s
  Elapsed time: 10.039675768s
  Elapsed time: 11.644405718s
  Elapsed time: 9.600332115s
  Elapsed time: 9.745275262s
  Elapsed time: 11.469732737s
  Elapsed time: 11.495929827s
  Elapsed time: 9.036307879s
  Elapsed time: 9.159725732s
  Elapsed time: 9.142030108s
- w/ bypassing:
  Elapsed time: 8.529972448s
  Elapsed time: 9.302905215s
  Elapsed time: 9.933504313s
  Elapsed time: 7.681615877s
  Elapsed time: 9.16150229s
  Elapsed time: 9.655643012s
  Elapsed time: 8.948393461s
  Elapsed time: 8.048391038s
  Elapsed time: 8.780075475s
  Elapsed time: 9.826335759s

I think this patch might avoid issues caused by Netty (e.g., GC pressure and network trouble) in corner cases, but there is no big performance gain in this quick benchmark.
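To put the two sets of elapsed times reported above side by side, the plain-Scala sketch below computes the mean and sample standard deviation of each run (values copied from the comment). It is only a descriptive summary of these ten samples, not a significance test; `BenchSummary` is a hypothetical name.

```scala
object BenchSummary {
  // Elapsed times in seconds, copied from the benchmark results in the comment.
  val withoutBypass: Seq[Double] = Seq(9.344261745, 10.039675768, 11.644405718, 9.600332115,
    9.745275262, 11.469732737, 11.495929827, 9.036307879, 9.159725732, 9.142030108)
  val withBypass: Seq[Double] = Seq(8.529972448, 9.302905215, 9.933504313, 7.681615877,
    9.16150229, 9.655643012, 8.948393461, 8.048391038, 8.780075475, 9.826335759)

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sample standard deviation (n - 1 in the denominator).
  def stddev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1))
  }
}
```

With these samples the means come out at roughly 10.07 s without bypassing and 8.99 s with it, while the sample standard deviations are roughly 1.06 s and 0.75 s, so the run-to-run spread is of the same order as the difference between the means.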
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160057079 **[Test build #46807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46807/consoleFull)** for PR 9991 at commit [`b694e27`](https://github.com/apache/spark/commit/b694e27979a2acad3e9653e082c08e8b3f7e41b7).
[GitHub] spark pull request: [MINOR][BUILD] Changed the comment to reflect ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10012#issuecomment-160056890 **[Test build #46808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46808/consoleFull)** for PR 10012 at commit [`cd0615b`](https://github.com/apache/spark/commit/cd0615bd23b71ffc7f85ba70119b7b023a7b8b92).
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9478#issuecomment-160056489 **[Test build #46806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46806/consoleFull)** for PR 9478 at commit [`303abcd`](https://github.com/apache/spark/commit/303abcd0f37d137c6a8ce4a0147466bb8feb9d9e).
[GitHub] spark pull request: [MINOR][BUILD] Changed the comment to reflect ...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/10012 [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark minor-build-comment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10012.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10012 commit cd0615bd23b71ffc7f85ba70119b7b023a7b8b92 Author: Prashant Sharma Date: 2015-11-27T07:04:52Z [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/9478#discussion_r46021085 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -27,16 +27,24 @@ import org.mockito.Matchers.{any, eq => meq} import org.mockito.Mockito._ import org.mockito.invocation.InvocationOnMock import org.mockito.stubbing.Answer -import org.scalatest.PrivateMethodTester +import org.scalatest.{BeforeAndAfterAll, PrivateMethodTester} -import org.apache.spark.{SparkFunSuite, TaskContext} +import org.apache.spark.{SparkConf, SparkFunSuite, TaskContext} import org.apache.spark.network._ import org.apache.spark.network.buffer.ManagedBuffer import org.apache.spark.network.shuffle.BlockFetchingListener import org.apache.spark.shuffle.FetchFailedException -class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with PrivateMethodTester { +class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite +with BeforeAndAfterAll with PrivateMethodTester { --- End diff -- Does this fix address your comments? I referred to `ContextCleanerSuite`: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala#L45
[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...
Github user mwws commented on a diff in the pull request: https://github.com/apache/spark/pull/8760#discussion_r46021072 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -636,7 +622,8 @@ private[spark] class TaskSetManager( logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id + " because task " + index + " has already completed successfully") } -failedExecutors.remove(index) + +blacklistTracker.foreach(_.updateFailureExecutors(stageId, info, Success)) --- End diff -- Thanks for pointing it out; I will change it.
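The diff above replaces `failedExecutors.remove(index)` with a call into a blacklist tracker when a task finishes successfully. A minimal plain-Scala sketch of that bookkeeping idea, assuming a tracker that records, per (stage, task index), the executors a task has failed on and forgets that history once the task succeeds. `SimpleBlacklistTracker`, `TaskOutcome`, and `update` are hypothetical names; the PR's actual `updateFailureExecutors(stageId, info, Success)` signature is not reproduced here.

```scala
import scala.collection.mutable

// Hypothetical task outcome type for this sketch.
sealed trait TaskOutcome
case object TaskSucceeded extends TaskOutcome
case object TaskFailed extends TaskOutcome

/** Hypothetical, simplified tracker: remembers which executors a task
  * (stageId, taskIndex) has failed on and drops that history once the task succeeds.
  */
class SimpleBlacklistTracker {
  private val failures = mutable.Map.empty[(Int, Int), mutable.Set[String]]

  def update(stageId: Int, taskIndex: Int, executorId: String, outcome: TaskOutcome): Unit =
    outcome match {
      case TaskSucceeded =>
        // Analogous to the removed `failedExecutors.remove(index)`.
        failures -= ((stageId, taskIndex))
      case TaskFailed =>
        failures.getOrElseUpdate((stageId, taskIndex), mutable.Set.empty[String]) += executorId
    }

  /** True if this task has previously failed on the given executor. */
  def isBlacklisted(stageId: Int, taskIndex: Int, executorId: String): Boolean =
    failures.get((stageId, taskIndex)).exists(_.contains(executorId))
}
```

Keying by (stage, task index) rather than by task index alone keeps failures from different stages from interfering, which is one plausible reason to pass `stageId` into the tracker.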
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/9478#discussion_r46020888 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -47,12 +47,35 @@ private[spark] class IndexShuffleBlockResolver(conf: SparkConf) extends ShuffleB private val transportConf = SparkTransportConf.fromSparkConf(conf) - def getDataFile(shuffleId: Int, mapId: Int): File = { -blockManager.diskBlockManager.getFile(ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) + private def getDataFile( + shuffleId: Int, + mapId: Int, + blockManagerId: BlockManagerId = blockManager.blockManagerId) +: File = { +if (blockManager.blockManagerId != blockManagerId) { + blockManager.diskBlockManager.getShuffleFileBypassNetworkAccess( +ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID), blockManagerId) +} else { + blockManager.diskBlockManager.getFile( +ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) +} } - private def getIndexFile(shuffleId: Int, mapId: Int): File = { -blockManager.diskBlockManager.getFile(ShuffleIndexBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) + def getDataFile(shuffleId: Int, mapId: Int): File = +getDataFile(shuffleId, mapId, blockManager.blockManagerId) + + private def getIndexFile( + shuffleId: Int, + mapId: Int, + blockManagerId: BlockManagerId = blockManager.blockManagerId) +: File = { +if (blockManager.blockManagerId != blockManagerId) { + blockManager.diskBlockManager.getShuffleFileBypassNetworkAccess( +ShuffleIndexBlockId(shuffleId, mapId, NOOP_REDUCE_ID), blockManagerId) +} else { + blockManager.diskBlockManager.getFile( +ShuffleIndexBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) --- End diff -- Fixed
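The `getDataFile`/`getIndexFile` changes in the diff above branch on whether the requested block manager is this one or another block manager on the same host. The sketch below illustrates only that branching, in plain Scala, under simplifying assumptions: each manager has one flat directory, other managers' directories come from a hypothetical `otherDirs` map, and the file name is assumed to follow the `shuffle_<shuffleId>_<mapId>_<reduceId>.data` pattern with a no-op reduce id of 0. `ManagerId` and `ShuffleFileResolver` are hypothetical names; Spark's `DiskBlockManager` additionally hashes files into subdirectories.

```scala
import java.io.File

// Hypothetical stand-in for BlockManagerId: executor id plus host.
final case class ManagerId(executorId: String, host: String)

/** Simplified resolver: files owned by this manager come from `ownDir`; files owned by
  * another manager on the same host are read directly from that manager's directory.
  */
class ShuffleFileResolver(localId: ManagerId, ownDir: File, otherDirs: Map[ManagerId, File]) {
  def getDataFile(shuffleId: Int, mapId: Int, owner: ManagerId = localId): File = {
    val name = s"shuffle_${shuffleId}_${mapId}_0.data"
    if (owner != localId) {
      // Bypass path: locate the file in another local manager's directory.
      val dir = otherDirs.getOrElse(owner,
        throw new IllegalArgumentException(s"Unknown block manager: $owner"))
      new File(dir, name)
    } else {
      new File(ownDir, name)
    }
  }
}
```

Using a default argument for the owner keeps existing two-argument call sites unchanged, which mirrors why the diff keeps a public `getDataFile(shuffleId, mapId)` alongside the three-argument version.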
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/9478#discussion_r46020880 --- Diff: core/src/main/scala/org/apache/spark/network/BlockDataManager.scala --- @@ -30,6 +30,13 @@ trait BlockDataManager { def getBlockData(blockId: BlockId): ManagedBuffer /** + * Interface to get the shuffle block data that block manager with given blockManagerId + * holds in a local host. Throws an exception if the block cannot be found or + * cannot be read successfully. + */ + def getShuffleBlockData(blockId: ShuffleBlockId, blockManagerId: BlockManagerId): ManagedBuffer --- End diff -- Renamed `getShuffleBlockData` to `getBlockData`. Does this fix address your comments?
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160055041 The original purpose of this PR is to fix the `SQLListenerMemoryLeakSuite` test failure. This can be resolved by clearing `SQLContext.sqlListener` before the test. To prevent a memory leak similar to SPARK-11700, I added a `SparkContext` stop hook to clear the `sqlListener` reference. I didn't make `SQLContext.clearSqlListener` a public API because it would be somewhat confusing for users to call it. Also, since the `sqlListener` is added to the SparkContext, it will not be GCed immediately even if a user calls `SQLContext.clearSqlListener`. Now we clear the reference when the `SparkContext` is stopped, which allows the `sqlListener` to be GCed later.
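The approach described in this comment, clearing a global reference from a context stop hook so the listener can be garbage-collected later, can be sketched in plain Scala as follows. `MiniContext`, `addStopHook`, and `ListenerHolder` are hypothetical names used for illustration only; they are not Spark's API.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ArrayBuffer

// Hypothetical context that runs registered hooks when stopped.
class MiniContext {
  private val stopHooks = ArrayBuffer.empty[() => Unit]
  def addStopHook(hook: () => Unit): Unit = synchronized { stopHooks += hook }
  def stop(): Unit = synchronized { stopHooks.foreach(hook => hook()) }
}

// Hypothetical global holder, playing the role of a static listener reference.
object ListenerHolder {
  val listener = new AtomicReference[AnyRef](null)

  /** Installs a listener once and registers a hook that drops the reference on stop(). */
  def getOrCreate(ctx: MiniContext, create: () => AnyRef): AnyRef = synchronized {
    if (listener.get() == null) {
      listener.set(create())
      ctx.addStopHook(() => listener.set(null))
    }
    listener.get()
  }
}
```

Clearing the reference from the stop hook, rather than exposing a public "clear" method, matches the reasoning in the comment: the reference only becomes collectable once the owning context is gone anyway.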
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/9478#discussion_r46020431 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -58,6 +58,17 @@ final class ShuffleBlockFetcherIterator( import ShuffleBlockFetcherIterator._ + private[this] val enableExternalShuffleService = +blockManager.conf.getBoolean("spark.shuffle.service.enabled", false) + + /** + * If this option enabled, bypass unnecessary network interaction + * if multiple block managers work in a single host. + */ + private[this] val enableBypassNetworkAccess = +blockManager.conf.getBoolean("spark.shuffle.bypassNetworkAccess", false) && --- End diff -- I agree, though if this bypassing were always enabled, the code paths that fetch remote blocks would not be fully tested in LocalSparkContext. Thoughts?
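The trade-off raised here is that, with bypassing always on, blocks held by other executors on the same host would never go through the network fetch path in single-host tests. A plain-Scala sketch of that routing decision, assuming each block is tagged with the executor and host that hold it; `ExecutorLoc`, `FetchPlanner`, and the `Route` values are hypothetical names, and the real `ShuffleBlockFetcherIterator` logic is not reproduced.

```scala
// Hypothetical location of a block manager: executor id and host name.
final case class ExecutorLoc(executorId: String, host: String)

object FetchPlanner {
  sealed trait Route
  case object Local extends Route    // held by this block manager: read directly
  case object SameHost extends Route // another manager on this host: bypass the network
  case object Remote extends Route   // different host: fetch over the network

  def route(self: ExecutorLoc, owner: ExecutorLoc, bypassEnabled: Boolean): Route =
    if (owner == self) Local
    else if (bypassEnabled && owner.host == self.host) SameHost
    else Remote

  /** Groups block ids by how they would be fetched, preserving input order within a group. */
  def plan(self: ExecutorLoc, blocks: Seq[(ExecutorLoc, String)], bypassEnabled: Boolean): Map[Route, Seq[String]] =
    blocks
      .groupBy { case (owner, _) => route(self, owner, bypassEnabled) }
      .map { case (r, bs) => r -> bs.map(_._2) }
}
```

With the flag off, same-host blocks fall back to `Remote`, which is what keeps the network fetch code exercised when every executor runs on one machine.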
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160053650 **[Test build #46805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46805/consoleFull)** for PR 9991 at commit [`4549f62`](https://github.com/apache/spark/commit/4549f62866f8917451dcc4d775943b5232186c46).
[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...
Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/9858#discussion_r46020227 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala --- @@ -488,6 +488,12 @@ private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends Rule[ } def apply(plan: SparkPlan): SparkPlan = plan.transformUp { +case operator @ Exchange(partitioning, child, _) => + child.children match { +case Exchange(childPartitioning, baseChild, _)::Nil => --- End diff -- Yes, I thought the same, but then it would again be less generic than this, since SparkStrategies are applied first and at that point the exchanges have not been added yet. So it would be similar to my previous change in the optimizer, in that it would check whether the child plan is an aggregate instead of testing for an Exchange. Would that be acceptable?
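The rule in the diff above matches an `Exchange` whose child sits directly on another `Exchange`. A heavily simplified, self-contained sketch of that idea in plain Scala, assuming a toy plan tree where the intermediate operator (here an aggregate) preserves the partitioning of its input, and treating two exchanges as equivalent when their partitioning keys are equal. `Plan`, `Scan`, `Aggregate`, `Exchange`, and `RemoveRedundantExchange` are hypothetical toy types, not Spark's `SparkPlan` classes; Spark's actual rule reasons about `Partitioning` objects and required distributions.

```scala
// Hypothetical toy physical-plan tree for this sketch.
sealed trait Plan { def children: Seq[Plan] }
final case class Scan(name: String) extends Plan { val children: Seq[Plan] = Nil }
final case class Aggregate(keys: Seq[String], child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
final case class Exchange(partitionKeys: Seq[String], child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }

object RemoveRedundantExchange {
  // Rebuilds a single-child node with a new child; leaves are returned unchanged.
  private def withChild(p: Plan, c: Plan): Plan = p match {
    case a: Aggregate => a.copy(child = c)
    case e: Exchange  => e.copy(child = c)
    case leaf         => leaf
  }

  /** Bottom-up rewrite: drops an Exchange when its child's only child is an Exchange
    * on the same keys, assuming the operator in between preserves that partitioning.
    */
  def apply(plan: Plan): Plan = {
    val rewritten = plan.children match {
      case Seq(c) => withChild(plan, apply(c))
      case _      => plan
    }
    rewritten match {
      case Exchange(keys, child) =>
        child.children match {
          case Seq(Exchange(childKeys, _)) if childKeys == keys => child
          case _                                                 => rewritten
        }
      case other => other
    }
  }
}
```

The partitioning-preservation assumption is exactly what makes the rule sensitive to the operator in between, which is the concern behind checking for an aggregate versus testing for an `Exchange`.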
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/9478#discussion_r46020157 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -47,12 +47,35 @@ private[spark] class IndexShuffleBlockResolver(conf: SparkConf) extends ShuffleB private val transportConf = SparkTransportConf.fromSparkConf(conf) - def getDataFile(shuffleId: Int, mapId: Int): File = { -blockManager.diskBlockManager.getFile(ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) + private def getDataFile( + shuffleId: Int, + mapId: Int, + blockManagerId: BlockManagerId = blockManager.blockManagerId) +: File = { +if (blockManager.blockManagerId != blockManagerId) { + blockManager.diskBlockManager.getShuffleFileBypassNetworkAccess( +ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID), blockManagerId) +} else { + blockManager.diskBlockManager.getFile( +ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) +} } - private def getIndexFile(shuffleId: Int, mapId: Int): File = { -blockManager.diskBlockManager.getFile(ShuffleIndexBlockId(shuffleId, mapId, NOOP_REDUCE_ID)) + def getDataFile(shuffleId: Int, mapId: Int): File = +getDataFile(shuffleId, mapId, blockManager.blockManagerId) --- End diff -- Other classes such as `BypassMergeSortShuffleWriter` and `SortShuffleWriter` seem to use this public method.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160052783 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160052784 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46803/ Test PASSed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160052724 **[Test build #46803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46803/consoleFull)** for PR 10009 at commit [`b162ba5`](https://github.com/apache/spark/commit/b162ba516b46c1fa579efbe7ec0ac9c60477cb1f). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class SubExprEliminationState(isNull: String, value: String)`
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160050957 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46804/ Test PASSed.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160050955 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160050876 **[Test build #46804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46804/consoleFull)** for PR 10010 at commit [`38194ea`](https://github.com/apache/spark/commit/38194ea3036a80e1352f8edebba183412361c403). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160050723 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160050725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46801/ Test PASSed.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160050677 **[Test build #46801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46801/consoleFull)** for PR 9840 at commit [`2f7370c`](https://github.com/apache/spark/commit/2f7370c33ddda84e306a73d478c6cf470e04837f). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type` * `case class Cast(child: Expression, dataType: DataType) extends UnaryExpression ` * `case class UpCast(child: Expression, dataType: DataType, walkedTypePath: Seq[String])`
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160050074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46799/ Test PASSed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160050072 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-16005 **[Test build #46799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46799/consoleFull)** for PR 10009 at commit [`8e937d3`](https://github.com/apache/spark/commit/8e937d3e5b912fcb41a1da0b6a1b137fcafad8fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class SubExprEliminationState(isNull: String, value: String)`
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160045593 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11735] [SQL] Add a check in the constru...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9702#issuecomment-160045786 **[Test build #2121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2121/consoleFull)** for PR 9702 at commit [`238e288`](https://github.com/apache/spark/commit/238e2882f22237b45b41f55ce80f110766698d2e). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160045594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46795/ Test PASSed.
[GitHub] spark pull request: [SPARK-7857][MLLIB] Prevent IDFModel from retu...
Github user karlhigley commented on a diff in the pull request: https://github.com/apache/spark/pull/9843#discussion_r46017996 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -211,14 +213,17 @@ private object IDFModel { val n = v.size v match { case SparseVector(size, indices, values) => +val newElements = new ArrayBuffer[(Int, Double)] val nnz = indices.size -val newValues = new Array[Double](nnz) var k = 0 while (k < nnz) { - newValues(k) = values(k) * idf(indices(k)) + val newValue = values(k) * idf(indices(k)) --- End diff -- As the diff shows, the existing code is already calling `idf(indices(k))`. That call may indeed be expensive and represent a potential optimization, but it's distinct from the problem this PR is intended to address. Seems like there might be room for a second JIRA/PR to handle the issue you identified.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160045465 **[Test build #46795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46795/consoleFull)** for PR 10009 at commit [`b3cf6a8`](https://github.com/apache/spark/commit/b3cf6a8ad94e2ba37c60ccda99f830160fa464d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubExprEliminationState(isNull: String, value: String)`
[GitHub] spark pull request: [SPARK-11997][SQL] NPE when save a DataFrame a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10001
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160044203 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160044204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46800/ Test PASSed.
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160044154 **[Test build #46800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46800/consoleFull)** for PR 10011 at commit [`82d45c9`](https://github.com/apache/spark/commit/82d45c917b43f631f862ce5b76dd4cf164c4253c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160042498 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160042499 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46787/ Test FAILed.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160042480 **[Test build #46787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46787/consoleFull)** for PR 10010 at commit [`38194ea`](https://github.com/apache/spark/commit/38194ea3036a80e1352f8edebba183412361c403). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7857][MLLIB] Prevent IDFModel from retu...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9843#discussion_r46017051 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -211,14 +213,17 @@ private object IDFModel { val n = v.size v match { case SparseVector(size, indices, values) => +val newElements = new ArrayBuffer[(Int, Double)] val nnz = indices.size -val newValues = new Array[Double](nnz) var k = 0 while (k < nnz) { - newValues(k) = values(k) * idf(indices(k)) + val newValue = values(k) * idf(indices(k)) --- End diff -- `idf(indices(k))` will do a binary search when `idf` is sparse, and this will be too expensive. Have them handled using a specialized method.
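To make the cost argument concrete, here is a minimal standalone sketch. It contrasts a constant-time lookup into a dense IDF array with a binary-search lookup into a sparse one. It also shows one possible reading of "specialized method": a two-pointer merge over two sorted index arrays. Plain Scala arrays stand in for Spark's `SparseVector`/`DenseVector`, and the object and method names (`IdfScalingSketch`, `scaleByDenseIdf`, `sparseLookup`, `scaleBySparseIdf`) are hypothetical, not Spark APIs. This is an illustration under those assumptions, not the code in the PR.

```scala
import java.util.Arrays

// Hypothetical helpers: plain arrays stand in for Spark's vector classes.
object IdfScalingSketch {

  // Dense IDF: idf(i) is an O(1) array access, so each entry costs constant time.
  def scaleByDenseIdf(indices: Array[Int], values: Array[Double],
                      idf: Array[Double]): Array[Double] = {
    val out = new Array[Double](values.length)
    var k = 0
    while (k < values.length) {
      out(k) = values(k) * idf(indices(k))
      k += 1
    }
    out
  }

  // Sparse IDF lookup: a binary search over the sorted IDF indices,
  // O(log nnz(idf)) per call; an absent index is an implicit 0.0.
  def sparseLookup(idfIndices: Array[Int], idfValues: Array[Double], i: Int): Double = {
    val pos = Arrays.binarySearch(idfIndices, i)
    if (pos >= 0) idfValues(pos) else 0.0
  }

  // Sparse-by-sparse variant: both index arrays are sorted, so a single
  // two-pointer merge visits each entry once and never searches.
  def scaleBySparseIdf(indices: Array[Int], values: Array[Double],
                       idfIndices: Array[Int], idfValues: Array[Double]): Array[Double] = {
    val out = new Array[Double](values.length)
    var k = 0
    var j = 0
    while (k < indices.length) {
      // Advance the IDF pointer until it reaches (or passes) the current index.
      while (j < idfIndices.length && idfIndices(j) < indices(k)) j += 1
      out(k) =
        if (j < idfIndices.length && idfIndices(j) == indices(k)) values(k) * idfValues(j)
        else 0.0
      k += 1
    }
    out
  }
}
```

The merge is O(nnz(v) + nnz(idf)) per vector, compared with O(nnz(v) · log nnz(idf)) for repeated `sparseLookup` calls. Whether the IDF vector can actually be sparse in `IDFModel` is a separate question from this sketch.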
[GitHub] spark pull request: [SPARK-11987] Python API update for ChiSqSelec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10007#issuecomment-160040708 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160040698 **[Test build #46803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46803/consoleFull)** for PR 10009 at commit [`b162ba5`](https://github.com/apache/spark/commit/b162ba516b46c1fa579efbe7ec0ac9c60477cb1f).
[GitHub] spark pull request: [SPARK-11987] Python API update for ChiSqSelec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10007#issuecomment-160040710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46802/ Test FAILed.
[GitHub] spark pull request: [SPARK-11987] Python API update for ChiSqSelec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10007#issuecomment-160040701 **[Test build #46802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46802/consoleFull)** for PR 10007 at commit [`3789867`](https://github.com/apache/spark/commit/3789867215621f968bf4fdcb233ec70825212925). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160040633 **[Test build #46804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46804/consoleFull)** for PR 10010 at commit [`38194ea`](https://github.com/apache/spark/commit/38194ea3036a80e1352f8edebba183412361c403).
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-160040221 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-160040225 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46793/ Test PASSed.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-160039738 **[Test build #46793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46793/consoleFull)** for PR 8880 at commit [`8b0aa5e`](https://github.com/apache/spark/commit/8b0aa5e647f5b8a47cbe45cd4b582130b82886d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12020] [TESTS] [test-hadoop2.0] PR buil...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10010#issuecomment-160039041 test this please
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160038407 Thanks for the quick fix, shixiong. Happy Thanksgiving.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160037107 **[Test build #46801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46801/consoleFull)** for PR 9840 at commit [`2f7370c`](https://github.com/apache/spark/commit/2f7370c33ddda84e306a73d478c6cf470e04837f).
[GitHub] spark pull request: [SPARK-11987] Python API update for ChiSqSelec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10007#issuecomment-160037065 **[Test build #46802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46802/consoleFull)** for PR 10007 at commit [`3789867`](https://github.com/apache/spark/commit/3789867215621f968bf4fdcb233ec70825212925).
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160036796 **[Test build #46800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46800/consoleFull)** for PR 10011 at commit [`82d45c9`](https://github.com/apache/spark/commit/82d45c917b43f631f862ce5b76dd4cf164c4253c).
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160036411 **[Test build #46799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46799/consoleFull)** for PR 10009 at commit [`8e937d3`](https://github.com/apache/spark/commit/8e937d3e5b912fcb41a1da0b6a1b137fcafad8fb).
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160036356 retest this please
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160036107 retest this please.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160036090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46798/ Test FAILed.
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160036082 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46797/ Test FAILed.
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160036081 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-160036089 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015947 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -448,18 +438,12 @@ class CodeGenContext { // 2. Less code. // Currently, we will do this for all non-leaf only expression trees (i.e. expr trees with // at least two nodes) as the cost of doing it is expected to be low. - - // Maintain the loaded value and isNull as member variables. This is necessary if the codegen - // function is split across multiple functions. - // TODO: maintaining this as a local variable probably allows the compiler to do better - // optimizations. - addMutableState("boolean", isLoaded, s"$isLoaded = false;") addMutableState("boolean", isNull, s"$isNull = false;") addMutableState(javaType(expr.dataType), value, s"$value = ${defaultValue(expr.dataType)};") - subExprIsLoadedVariables += isLoaded - val state = SubExprEliminationState(isLoaded, code, fnName) + subExprResetVariables += s"${fnName}($INPUT_ROW);" --- End diff -- nit: `$fnName($INPUT_ROW)`
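The nit is purely stylistic. In a Scala `s` interpolator, `$name` ends at the first character that cannot continue an identifier, so `$fnName(` already stops before the parenthesis and the braces in `${fnName}` add nothing. Braces are only needed when the next character could extend the identifier, such as `_` or a digit. Here is a small check, with `fnName` and `INPUT_ROW` replaced by hypothetical stand-in values:

```scala
object InterpolationNit {
  // Hypothetical stand-ins for the generated function name and the row variable.
  val fnName = "subExpr_0"
  val INPUT_ROW = "i"

  // Equivalent: the identifier after `$` stops at `(`.
  val withBraces = s"${fnName}($INPUT_ROW);"
  val withoutBraces = s"$fnName($INPUT_ROW);"

  // Braces are required here: `$fnName_1` would look up an identifier `fnName_1`.
  val suffixed = s"${fnName}_1"
}
```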
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160036072 LGTM overall, cc @nongli to take another look.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015904 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -94,13 +94,9 @@ abstract class Expression extends TreeNode[Expression] { def gen(ctx: CodeGenContext): GeneratedExpressionCode = { ctx.subExprEliminationExprs.get(this).map { subExprState => // This expression is repeated meaning the code to evaluated has already been added - // as a function, `subExprState.fnName`. Just call that. - val code = -s""" - |/* $this */ - |${subExprState.fnName}(${ctx.INPUT_ROW}); - """.stripMargin.trim - GeneratedExpressionCode(code, subExprState.code.isNull, subExprState.code.value) + // as a function, `subExprState.fnName`, and called. Just use it. --- End diff -- there is no `fnName` in `SubExprEliminationState` anymore, right?
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015873 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. - case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of variables that need to be reset on each row. --- End diff -- how about `The collection of sub-expression result reset methods that need to be called on each row.`?
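Taken together, the diffs in this thread describe the refactored bookkeeping:

- Each common subexpression gets a generated function that stores its result in `isNull`/`value` member variables.
- A call to that function is collected in `subExprResetVariables` and runs once per row.
- `gen()` then only needs the two variable names, which is why `SubExprEliminationState` shrinks to `(isNull: String, value: String)` and loses `fnName`.

The sketch below models that flow with plain strings. `SubExprSketch`, `SubExprRegistry`, `register`, and `perRowPrologue` are hypothetical names, not Spark APIs, and the generated names (`evalSubExpr_N` and so on) are illustrative only.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the per-row subexpression bookkeeping.
object SubExprSketch {
  // Mirrors the two fields in the diff: names of the generated result variables.
  case class SubExprEliminationState(isNull: String, value: String)

  class SubExprRegistry(inputRow: String) {
    private var counter = 0
    // Expression key -> variables that hold its already-computed result.
    val states = mutable.LinkedHashMap.empty[String, SubExprEliminationState]
    // Function calls to emit once at the start of each row.
    val resetCalls = mutable.ArrayBuffer.empty[String]

    // Registers a common subexpression once; later lookups reuse the same state.
    def register(exprKey: String): SubExprEliminationState =
      states.getOrElseUpdate(exprKey, {
        val fnName = s"evalSubExpr_$counter"
        val state = SubExprEliminationState(s"subExprIsNull_$counter", s"subExprValue_$counter")
        counter += 1
        resetCalls += s"$fnName($inputRow);"
        state
      })

    // Code emitted before the per-row expressions: one call per subexpression.
    def perRowPrologue: String = resetCalls.mkString("\n")
  }
}
```

Registering the same key twice returns the existing state and adds no second call. That is the point of the elimination: the shared work runs once per row, and every use site just reads `isNull`/`value`.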
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8610#issuecomment-160035457 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8610#issuecomment-160035436 **[Test build #46785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46785/consoleFull)** for PR 8610 at commit [`8232a80`](https://github.com/apache/spark/commit/8232a808e398f3644304822d1824aa0b923090dc). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8610#issuecomment-160035458 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46785/ Test FAILed.
[GitHub] spark pull request: [SPARK-11954][SQL] Encoder for JavaBeans
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9937#issuecomment-160035430 @marmbrus do you mean we should write a Java version of `extractorsFor` and `constructorFor` in JavaTypeInference?
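For context on what a Java-bean encoder has to discover, here is a hypothetical sketch that uses the standard `java.beans.Introspector` to list the properties having both a getter (what an extractor would read) and a setter (what a constructor would write). The class names are invented for illustration; this is not the code in `JavaTypeInference`.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Spark code.
final class BeanPropertySketch {
    // Example bean (hypothetical): public class, no-arg constructor, getter/setter pairs.
    public static class Person {
        private String name;
        private int age;
        public Person() {}
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Names of properties with both a read method and a write method, sorted so the
    // resulting field order is stable.
    static List<String> readWriteProperties(Class<?> beanClass) {
        BeanInfo info;
        try {
            info = Introspector.getBeanInfo(beanClass);
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("Not a valid bean: " + beanClass, e);
        }
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            // The "class" property (from getClass()) has no setter, so it is skipped here.
            if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```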
[GitHub] spark pull request: [SPARK-6166] [Spark Core] Limit number of conc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5852#issuecomment-160035323 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6166] [Spark Core] Limit number of conc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5852#issuecomment-160035325 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46796/ Test FAILed.
[GitHub] spark pull request: [SPARK-6166] [Spark Core] Limit number of conc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5852#issuecomment-160035322 [Test build #46796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46796/consoleFull) for PR 5852 at commit [`0d8088a`](https://github.com/apache/spark/commit/0d8088a2714a4324b41326c3c4b45fe6a2acf46a). * This patch **fails to build**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015674 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW}) { - | if (!$isLoaded) { - |${code.code.trim} - |$isLoaded = true; - |$isNull = ${code.isNull}; - |$value = ${code.value}; - | } + |private ${javaType(expr.dataType)} $fnName(InternalRow ${INPUT_ROW}) { + | ${code.code.trim} + | $isNull = ${code.isNull}; + | return ${code.value}; --- End diff -- OK, makes sense.
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10011#issuecomment-160035000 cc @tdas @tedyu @andrewor14 BTW, there is a similar test in SparkListenerSuite, but since SparkContext doesn't hold a lock in its `stop` method, it won't be a problem there.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9840#discussion_r46015611 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -85,12 +85,24 @@ case class DecimalType(precision: Int, scale: Int) extends FractionalType { private[sql] def isWiderThan(other: DataType): Boolean = other match { case dt: DecimalType => (precision - scale) >= (dt.precision - dt.scale) && scale >= dt.scale -case dt: IntegralType => --- End diff -- I'm not sure if this is intentional. Why do we ignore fractional types?
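The decimal-to-decimal branch quoted in the diff can be checked in isolation. The sketch below is a hypothetical helper, not Spark's `DecimalType`, and encodes only that rule: the wider type needs at least as many integral digits (`precision - scale`) and at least as many fractional digits (`scale`). For example, DECIMAL(10, 2) has 8 integral and 2 fractional digits, so it is wider than DECIMAL(8, 2) but not wider than DECIMAL(10, 3). The integral-type branch and the fractional-type question raised above are not modeled.

```java
// Hypothetical helper mirroring only the DecimalType-vs-DecimalType case.
final class DecimalWidthSketch {
    // True when DECIMAL(precision, scale) can hold every value of
    // DECIMAL(otherPrecision, otherScale) without losing integral or fractional digits.
    static boolean isWiderThan(int precision, int scale, int otherPrecision, int otherScale) {
        boolean enoughIntegralDigits = (precision - scale) >= (otherPrecision - otherScale);
        boolean enoughFractionalDigits = scale >= otherScale;
        return enoughIntegralDigits && enoughFractionalDigits;
    }
}
```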
[GitHub] spark pull request: [SPARK-11788][SQL]:surround timestamp/date val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9872#issuecomment-160034888 **[Test build #2122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2122/consoleFull)** for PR 9872 at commit [`ece3838`](https://github.com/apache/spark/commit/ece383837b9ed7d176d35f10460f7208184056bd). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9478#issuecomment-160034859 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46794/ Test FAILed.
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9478#issuecomment-160034858 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6521][Core] Bypass unnecessary network ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9478#issuecomment-160034840 **[Test build #46794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46794/consoleFull)** for PR 9478 at commit [`ba94687`](https://github.com/apache/spark/commit/ba94687b48d08fc6a4c863fbafeb5d39181cc53c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`
[GitHub] spark pull request: [SPARK-11788][SQL]:surround timestamp/date val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9872#issuecomment-160034762 **[Test build #2122 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2122/consoleFull)** for PR 9872 at commit [`ece3838`](https://github.com/apache/spark/commit/ece383837b9ed7d176d35f10460f7208184056bd).
[GitHub] spark pull request: [SPARK-12021][Streaming][Tests]Fix the potenti...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/10011 [SPARK-12021][Streaming][Tests]Fix the potential dead-lock in StreamingListenerSuite In the StreamingListenerSuite test "don't call ssc.stop in listener", after the main thread calls `ssc.stop()`, `StreamingContextStoppingCollector` may also call `ssc.stop()` in the listener bus thread, causing a deadlock. This PR updates `StreamingContextStoppingCollector` to call `ssc.stop()` only in the first batch, which avoids the deadlock. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark fix-test-deadlock Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10011.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10011 commit 82d45c917b43f631f862ce5b76dd4cf164c4253c Author: Shixiong Zhu Date: 2015-11-27T03:28:25Z Fix the potential dead-lock in StreamingListenerSuite
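The fix described in the PR amounts to making the listener's `stop()` call fire at most once. A minimal sketch of such a guard, assuming a hypothetical listener class and a `Runnable` standing in for `ssc.stop()`; this is not the code in `StreamingListenerSuite`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a listener that stops its context on batch completion.
final class StopOnceListenerSketch {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final Runnable stopAction; // stands in for ssc.stop()

    StopOnceListenerSketch(Runnable stopAction) { this.stopAction = stopAction; }

    // Called from the listener-bus thread for every completed batch.
    void onBatchCompleted() {
        // compareAndSet succeeds only for the first caller, so stopAction runs for the
        // first batch only; later events never call back into stop().
        if (stopRequested.compareAndSet(false, true)) {
            stopAction.run();
        }
    }
}
```

Using `AtomicBoolean.compareAndSet` keeps the check-and-set atomic even if events were delivered from more than one thread; a plain boolean would also work when a single listener-bus thread delivers all events.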
[GitHub] spark pull request: [SPARK-11788][SQL]:surround timestamp/date val...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9872#issuecomment-160034681 Can you update the title to "[SPARK-11788][SQL] surround timestamp/date value with quotes in JDBC data source"
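The title describes quoting timestamp and date values when a pushed-down filter is compiled into the JDBC `WHERE` clause. A hypothetical sketch of such a value-to-literal helper; the name `compileValue` and the string-escaping rule (doubling single quotes) are assumptions for illustration, not necessarily what the PR implements.

```java
import java.sql.Date;
import java.sql.Timestamp;

// Hypothetical helper: turn a filter value into a SQL literal for a WHERE clause.
final class JdbcLiteralSketch {
    static String compileValue(Object value) {
        if (value instanceof String) {
            // Escape embedded single quotes by doubling them, then quote the string.
            return "'" + ((String) value).replace("'", "''") + "'";
        } else if (value instanceof Timestamp || value instanceof Date) {
            // Unquoted, 2015-11-27 would be read as integer subtraction, and a
            // timestamp such as 2015-11-27 03:28:25.0 would typically be a syntax error.
            return "'" + value + "'";
        } else {
            // Numbers and other values are emitted as-is.
            return String.valueOf(value);
        }
    }
}
```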
[GitHub] spark pull request: [SPARK-6166] [Spark Core] Limit number of conc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5852#issuecomment-160034648 [Test build #46796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46796/consoleFull) for PR 5852 at commit [`0d8088a`](https://github.com/apache/spark/commit/0d8088a2714a4324b41326c3c4b45fe6a2acf46a).
[GitHub] spark pull request: [SPARK-11788][SQL]:surround timestamp/date val...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9872#issuecomment-160034631 LGTM - Jenkins, test this please.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015355 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -104,16 +104,13 @@ class CodeGenContext { val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions // State used for subexpression elimination. - case class SubExprEliminationState( - isLoaded: String, - code: GeneratedExpressionCode, - fnName: String) + case class SubExprEliminationState(isNull: String, value: String) // Foreach expression that is participating in subexpression elimination, the state to use. val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState] - // The collection of isLoaded variables that need to be reset on each row. - val subExprIsLoadedVariables = mutable.ArrayBuffer.empty[String] + // The collection of variables that need to be initialized on each row. --- End diff -- I think `reset` is a more appropriate word here.
[GitHub] spark pull request: fixes SPARK-11991
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9975
[GitHub] spark pull request: [SPARK-11997][SQL] NPE when save a DataFrame a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10001#issuecomment-160034260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46791/ Test PASSed.
[GitHub] spark pull request: [SPARK-11997][SQL] NPE when save a DataFrame a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10001#issuecomment-160034258 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11997][SQL] NPE when save a DataFrame a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10001#issuecomment-160034212 **[Test build #46791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46791/consoleFull)** for PR 10001 at commit [`4de7697`](https://github.com/apache/spark/commit/4de7697753f0da6810190bea804b9f490a68bb98). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: fixes SPARK-11991
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9975#issuecomment-160034014 Thanks - going to merge this.
[GitHub] spark pull request: [SPARK-11753][SQL] Make allowNonNumericNumbers...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9759#discussion_r46015226 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala --- @@ -93,22 +93,31 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext { assert(df.first().getLong(0) == 18) } - // The following two tests are not really working - need to look into Jackson's - // JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS. - ignore("allowNonNumericNumbers off") { -val str = """{"age": NaN}""" -val rdd = sqlContext.sparkContext.parallelize(Seq(str)) -val df = sqlContext.read.json(rdd) + test("allowNonNumericNumbers off") { +val testCases: Seq[String] = Seq("""{"age": NaN}""", """{"age": Infinity}""", --- End diff -- we should add a test for quoted NaN, inf, etc.
[GitHub] spark pull request: [SPARK-11753][SQL] Make allowNonNumericNumbers...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9759#discussion_r46015193 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala --- @@ -100,34 +101,27 @@ object JacksonParser { parser.getFloatValue case (VALUE_STRING, FloatType) => -// Special case handling for NaN and Infinity. --- End diff -- why are we removing the special handling for float types here?
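The two comments above concern JSON values such as `"NaN"` or `"Infinity"` that arrive as quoted strings for a float column, as opposed to the unquoted tokens governed by Jackson's `JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS`. A hypothetical sketch of the kind of special-case handling being discussed, with an assumed set of accepted tokens; it is not the code in `JacksonParser`. A test for quoted values would exercise exactly these branches.

```java
import java.util.Locale;

// Hypothetical helper: convert a JSON *string* token to a float for a FloatType
// column, accepting only an assumed whitelist of special values.
final class QuotedFloatSketch {
    static float parseQuotedFloat(String text) {
        switch (text.toLowerCase(Locale.ROOT)) {
            case "nan":
                return Float.NaN;
            case "infinity":
            case "inf":
                return Float.POSITIVE_INFINITY;
            case "-infinity":
            case "-inf":
                return Float.NEGATIVE_INFINITY;
            default:
                // Ordinary numbers are expected as JSON number tokens, not strings.
                throw new IllegalArgumentException("Cannot parse " + text + " as FloatType.");
        }
    }
}
```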
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW}) { - | if (!$isLoaded) { - |${code.code.trim} - |$isLoaded = true; - |$isNull = ${code.isNull}; - |$value = ${code.value}; - | } + |private ${javaType(expr.dataType)} $fnName(InternalRow ${INPUT_ROW}) { --- End diff -- nit: when the interpolated value is a single identifier, we can omit the `{}`, i.e. `$INPUT_ROW` instead of `${INPUT_ROW}`.
[GitHub] spark pull request: [SPARK-11778][SQL]:add regression test
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9890
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015118 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW}) { - | if (!$isLoaded) { - |${code.code.trim} - |$isLoaded = true; - |$isNull = ${code.isNull}; - |$value = ${code.value}; - | } + |private ${javaType(expr.dataType)} $fnName(InternalRow ${INPUT_ROW}) { + | ${code.code.trim} + | $isNull = ${code.isNull}; + | return ${code.value}; --- End diff -- and then we can remove the `subExprInitVariables` because the reset is just calling this function.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10009#issuecomment-160033692 **[Test build #46795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46795/consoleFull)** for PR 10009 at commit [`b3cf6a8`](https://github.com/apache/spark/commit/b3cf6a8ad94e2ba37c60ccda99f830160fa464d6).
[GitHub] spark pull request: [SPARK-11917][PYSPARK] Add SQLContext#dropTemp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9903
[GitHub] spark pull request: [SPARK-11778][SQL]:add regression test
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9890#issuecomment-160033616 Thanks - I'm merging this.
[GitHub] spark pull request: [SPARK-12018][SQL] Refactor common subexpressi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10009#discussion_r46015031 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -417,18 +413,12 @@ class CodeGenContext { val code = expr.gen(this) val fn = s""" - |private void $fnName(InternalRow ${INPUT_ROW}) { - | if (!$isLoaded) { - |${code.code.trim} - |$isLoaded = true; - |$isNull = ${code.isNull}; - |$value = ${code.value}; - | } + |private ${javaType(expr.dataType)} $fnName(InternalRow ${INPUT_ROW}) { + | ${code.code.trim} + | $isNull = ${code.isNull}; + | return ${code.value}; --- End diff -- Can we make this method return void? We can just assign values to `isNull` and `value` as they are both member variables.
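To make the suggestion concrete, here is a hand-written Java analogue (hypothetical, not generated by Spark) of a class where the common subexpression `a + b` is computed by a `void` method that assigns the `isNull` and `value` member variables. Calling that method once at the start of each row acts as the reset, so no `isLoaded` flag is needed. A row is modeled as an `Integer[]` instead of an `InternalRow`.

```java
// Hypothetical analogue of generated code computing (a + b) * (a + b) per row.
final class SubExprVoidFnSketch {
    // Member variables standing in for the generated isNull/value fields.
    private boolean subExprIsNull;
    private int subExprValue;
    // Counts evaluations to show the subexpression runs once per row.
    int evaluations = 0;

    // Analogue of the generated function: returns void and assigns both members.
    private void subExpr0(Integer[] row) {
        evaluations++;
        Integer a = row[0];
        Integer b = row[1];
        subExprIsNull = (a == null || b == null);
        // -1 is a placeholder that is ignored whenever subExprIsNull is true.
        subExprValue = subExprIsNull ? -1 : a + b;
    }

    // Per-row entry point: calling subExpr0 first is the per-row reset.
    Integer apply(Integer[] row) {
        subExpr0(row);
        if (subExprIsNull) {
            return null;
        }
        // Both uses of (a + b) read the member variable instead of recomputing it.
        return subExprValue * subExprValue;
    }
}
```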
[GitHub] spark pull request: [SPARK-11881][SQL] Fix for postgresql fetchsiz...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9861