[GitHub] spark pull request: [SPARK-14014] [SQL] Replace existing catalog w...
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11836#discussion_r56755343

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -31,17 +32,34 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
   * proxy to the underlying metastore (e.g. Hive Metastore) and it also manages temporary
   * tables and functions of the Spark Session that it belongs to.
   */
-class SessionCatalog(externalCatalog: ExternalCatalog) {
+class SessionCatalog(externalCatalog: ExternalCatalog, conf: CatalystConf) {
   import ExternalCatalog._

-  private[this] val tempTables = new ConcurrentHashMap[String, LogicalPlan]
-  private[this] val tempFunctions = new ConcurrentHashMap[String, CatalogFunction]
+  def this(externalCatalog: ExternalCatalog) {
+    this(externalCatalog, new SimpleCatalystConf(true))
+  }
+
+  protected[this] val tempTables = new ConcurrentHashMap[String, LogicalPlan]
+  protected[this] val tempFunctions = new ConcurrentHashMap[String, CatalogFunction]

   // Note: we track current database here because certain operations do not explicitly
   // specify the database (e.g. DROP TABLE my_table). In these cases we must first
   // check whether the temporary table or function exists, then, if not, operate on
   // the corresponding item in the current database.
-  private[this] var currentDb = "default"
+  protected[this] var currentDb = {
+    val defaultName = "default"
+    val defaultDbDefinition = CatalogDatabase(defaultName, "default database", "", Map())
+    // Initialize default database if it doesn't already exist
+    createDatabase(defaultDbDefinition, ignoreIfExists = true)
+    defaultName
+  }
+
+  /**
+   * Format table name, taking into account case sensitivity.
+   */
+  protected[this] def formatTableName(name: String): String = {
+    if (conf.caseSensitiveAnalysis) name else name.toLowerCase
--- End diff --

Later, it would be good to use this to handle other database names as well, for consistency (it will not actually have any effect right now, though).

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
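Routing database names through the same case-sensitivity rule as table names can be sketched outside Spark as follows. This is a minimal, hypothetical Java illustration: `NameFormatter`, `formatDatabaseName`, and `qualify` are invented names, not Spark's `SessionCatalog` API, and `toLowerCase(Locale.ROOT)` is an assumption (the diff calls Scala's `toLowerCase` without a locale).

```java
import java.util.Locale;

/** Hypothetical helper: applies one case-sensitivity rule to every identifier kind. */
final class NameFormatter {
  private final boolean caseSensitive;

  NameFormatter(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  // Single rule shared by all identifier kinds, mirroring the diff's formatTableName.
  private String format(String name) {
    return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
  }

  String formatTableName(String name) {
    return format(name);
  }

  // Database names follow the same rule, so "Sales.Orders" and "sales.orders"
  // resolve identically whenever table names would.
  String formatDatabaseName(String name) {
    return format(name);
  }

  // Builds a qualified "db.table" key from both normalized parts.
  String qualify(String db, String table) {
    return formatDatabaseName(db) + "." + formatTableName(table);
  }
}
```

Keeping a single private `format` method means the two identifier kinds cannot drift apart if the rule changes later.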
[GitHub] spark pull request: [SPARK-14014] [SQL] Replace existing catalog w...
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11836#discussion_r56755267

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -31,17 +32,34 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
   * proxy to the underlying metastore (e.g. Hive Metastore) and it also manages temporary
   * tables and functions of the Spark Session that it belongs to.
   */
-class SessionCatalog(externalCatalog: ExternalCatalog) {
+class SessionCatalog(externalCatalog: ExternalCatalog, conf: CatalystConf) {
   import ExternalCatalog._

-  private[this] val tempTables = new ConcurrentHashMap[String, LogicalPlan]
-  private[this] val tempFunctions = new ConcurrentHashMap[String, CatalogFunction]
+  def this(externalCatalog: ExternalCatalog) {
+    this(externalCatalog, new SimpleCatalystConf(true))
+  }
+
+  protected[this] val tempTables = new ConcurrentHashMap[String, LogicalPlan]
+  protected[this] val tempFunctions = new ConcurrentHashMap[String, CatalogFunction]

   // Note: we track current database here because certain operations do not explicitly
   // specify the database (e.g. DROP TABLE my_table). In these cases we must first
   // check whether the temporary table or function exists, then, if not, operate on
   // the corresponding item in the current database.
-  private[this] var currentDb = "default"
+  protected[this] var currentDb = {
+    val defaultName = "default"
+    val defaultDbDefinition = CatalogDatabase(defaultName, "default database", "", Map())
+    // Initialize default database if it doesn't already exist
+    createDatabase(defaultDbDefinition, ignoreIfExists = true)
+    defaultName
+  }
+
+  /**
+   * Format table name, taking into account case sensitivity.
+   */
+  protected[this] def formatTableName(name: String): String = {
+    if (conf.caseSensitiveAnalysis) name else name.toLowerCase
--- End diff --

Just a note here: the Hive metastore is always case-insensitive, so the case-sensitivity setting mainly matters for temp tables and temp functions.
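Since the metastore itself ignores case, the flag mostly decides whether two temporary tables whose names differ only in case collide. A minimal sketch, assuming a `ConcurrentHashMap` keyed by the formatted name as in the diff; `TempTableRegistry`, `register`, and `lookup` are hypothetical, and the stored value is a plain `String` placeholder rather than a `LogicalPlan`.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical temp-table registry; values stand in for logical plans. */
final class TempTableRegistry {
  private final boolean caseSensitive;
  private final ConcurrentHashMap<String, String> tempTables = new ConcurrentHashMap<>();

  TempTableRegistry(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  private String formatTableName(String name) {
    return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
  }

  // Registers (or replaces) a temp table under its formatted name.
  void register(String name, String plan) {
    tempTables.put(formatTableName(name), plan);
  }

  // Looks a temp table up with the same formatting that was used at registration.
  Optional<String> lookup(String name) {
    return Optional.ofNullable(tempTables.get(formatTableName(name)));
  }

  int size() {
    return tempTables.size();
  }
}
```

With `caseSensitive = false`, registering `People` and then `PEOPLE` leaves a single entry holding the second plan; with `true`, both names coexist and a lookup for `people` finds nothing.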
[GitHub] spark pull request: [SPARK-13629] [ML] Add binary toggle Param to ...
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/11536#issuecomment-198511013

Got it. Thanks.
[GitHub] spark pull request: [SPARK-13974][SQL] sub-query names do not need...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11783#issuecomment-197987787

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53436/
Test FAILed.
[GitHub] spark pull request: [SPARK-13038] [PySpark] Add load/save to pipel...
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/11683#issuecomment-197552521

https://issues.apache.org/jira/browse/SPARK-13951
[GitHub] spark pull request: [SPARK-13898][SQL] Merge DatasetHolder and Dat...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11737#issuecomment-197734353

**[Test build #53394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53394/consoleFull)** for PR 11737 at commit [`59cae95`](https://github.com/apache/spark/commit/59cae95a34fb8bd8cfee0da5b34fc5d27b0f85d2).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13972] [SQL] [FOLLOW-UP] When creating ...
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/11825#issuecomment-198464786

LGTM
[GitHub] spark pull request: [SPARK-13873] [SQL] Avoid copy of UnsafeRow wh...
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11740#discussion_r56376958

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -396,7 +401,7 @@ case class WholeStageCodegen(child: SparkPlan) extends UnaryNode with CodegenSup
     s"""
       |$evaluateInputs
       |${code.code.trim}
-      |append(${code.value}.copy());
+      |append(${code.value}$doCopy);
--- End diff --

If only one row will be buffered here, we do not need to copy it; the parent of WholeStageCodegen will do the copy if needed.
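Copying matters when several rows are buffered because generated code typically overwrites one mutable row object on every iteration. A minimal, self-contained Java sketch, assuming a hypothetical `MutableRow` that is overwritten in place (it is not Spark's `UnsafeRow`): buffering the same reference several times without copying leaves every slot showing the last value, whereas a single buffered row is still intact when it is handed on, and a consumer that needs to keep it can copy it then.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a row object that the producer reuses across iterations. */
final class MutableRow {
  int value;

  MutableRow copy() {
    MutableRow r = new MutableRow();
    r.value = value;
    return r;
  }
}

final class BufferDemo {
  // Produces the values 1..n into one reused row, buffering either the reference or a copy,
  // and returns what the buffer holds once production is finished.
  static List<Integer> bufferRows(int n, boolean copyOnAppend) {
    MutableRow reused = new MutableRow();
    List<MutableRow> buffer = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      reused.value = i; // overwrite in place, as generated code would
      buffer.add(copyOnAppend ? reused.copy() : reused);
    }
    List<Integer> out = new ArrayList<>();
    for (MutableRow r : buffer) {
      out.add(r.value);
    }
    return out;
  }
}
```

`bufferRows(3, false)` yields `[3, 3, 3]`, `bufferRows(3, true)` yields `[1, 2, 3]`, and `bufferRows(1, false)` yields `[1]`, so the copy only protects correctness once more than one row is held at a time.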
[GitHub] spark pull request: [SPARK-13845][CORE]Using onBlockUpdated to rep...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11679#issuecomment-197753298

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8884][MLlib] 1-sample Anderson-Darling ...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11780#issuecomment-197789313

**[Test build #53416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53416/consoleFull)** for PR 11780 at commit [`a9a59cb`](https://github.com/apache/spark/commit/a9a59cba0d0ba29c8a483a6bf18426db14d59860).
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11788#issuecomment-198026929

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11788#issuecomment-198032509

**[Test build #53451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53451/consoleFull)** for PR 11788 at commit [`6385777`](https://github.com/apache/spark/commit/638577786d4849665fad561b0042efebe5babb8f).
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11756#issuecomment-197636844

@cloud-fan Sorry, one more question. Would it be good to make `spark.sql.columnNameOfCorruptRecord` an option, just like the compression option for other data sources?
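The proposal amounts to letting a per-read option override the session-wide `spark.sql.columnNameOfCorruptRecord` setting. A minimal sketch of that precedence, assuming plain `Map`s for both the reader options and the session conf; the `OptionResolver` class and the option key `columnNameOfCorruptRecord` are hypothetical here, and the `_corrupt_record` fallback is assumed to match the conf's default.

```java
import java.util.Map;

/** Hypothetical resolver: a data-source option wins over the session-level SQL conf. */
final class OptionResolver {
  static final String CONF_KEY = "spark.sql.columnNameOfCorruptRecord";
  static final String OPTION_KEY = "columnNameOfCorruptRecord"; // hypothetical per-read option name
  static final String DEFAULT_NAME = "_corrupt_record"; // assumed default of the conf

  static String corruptRecordColumn(Map<String, String> readerOptions,
                                    Map<String, String> sessionConf) {
    String fromOption = readerOptions.get(OPTION_KEY);
    if (fromOption != null) {
      return fromOption; // an explicit per-read option takes precedence
    }
    // Otherwise fall back to the session conf, then to the default.
    return sessionConf.getOrDefault(CONF_KEY, DEFAULT_NAME);
  }
}
```

This mirrors how a per-source option such as compression typically layers over a global default: the narrower scope wins, and the global setting remains the fallback.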
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11655#issuecomment-197277636

Merged build finished. Test PASSed.
[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11645#discussion_r56376064

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala ---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+import java.nio.file.Files
+
+import scala.util.Random
+
+import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.LocalSparkContext._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
+import org.apache.spark.sql.catalyst.util.quietly
+import org.apache.spark.util.Utils
+
+class StateStoreRDDSuite extends SparkFunSuite with BeforeAndAfter with BeforeAndAfterAll {
+
+  private val conf = new SparkConf().setMaster("local").setAppName(this.getClass.getCanonicalName)
+  private var tempDir = Files.createTempDirectory("StateStoreRDDSuite").toString
+
+  import StateStoreCoordinatorSuite._
+  import StateStoreSuite._
+
+  after {
+    StateStore.stop()
+  }
+
+  override def afterAll(): Unit = {
+    super.afterAll()
+    Utils.deleteRecursively(new File(tempDir))
+  }
+
+  test("versioning and immutability") {
+    quietly {
+      withSpark(new SparkContext(conf)) { sc =>
+        val path = Utils.createDirectory(tempDir, Random.nextString(10)).toString
--- End diff --

createTempDir
[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r56421658

--- Diff: core/src/main/scala/org/apache/spark/Accumulable.scala ---
@@ -146,6 +212,32 @@ class Accumulable[R, T] private (
   def merge(term: R) { value_ = param.addInPlace(value_, term)}

   /**
+   * Merge in pending updates for ac consistent accumulators or merge accumulated values for
+   * regular accumulators. This is only called on the driver when merging task results together.
+   */
+  private[spark] def internalMerge(term: Any) {
+    if (!consistent) {
+      merge(term.asInstanceOf[R])
+    } else {
+      mergePending(term.asInstanceOf[mutable.HashMap[(Int, Int, Int), R]])
+    }
+  }
+
+  /**
+   * Merge another Accumulable's pending updates, checks to make sure that each pending update has
+   * not already been processed before updating.
+   */
+  private[spark] def mergePending(term: mutable.HashMap[(Int, Int, Int), R]) = {
+    term.foreach{case ((rddId, shuffleId, splitId), v) =>
+      val splits = processed.getOrElseUpdate((rddId, shuffleId), new mutable.BitSet())
+      if (!splits.contains(splitId)) {
+        splits += splitId
+        value_ = param.addInPlace(value_, v)
+      }
--- End diff --

I don't think you need both `completed` and `processed`; they seem to be doing more or less the same thing. You could change this to:

```scala
term.foreach{case (partitionKey, v) =>
  if (!completed.contains(partitionKey)) {
    completed += partitionKey
    value_ = param.addInPlace(value_, v)
  }
}
```

If I understand correctly, there is a bit of a logical distinction between the two: `processed` was being used on the driver, that is, across multiple tasks, to track what had and hadn't been accumulated. `completed`, on the other hand, was being used on the executors to track the updates coming from *one task*. Typically it would contain only a single entry, for the one `(rdd, shuffle, partition)` tuple being computed, since tasks *usually* compute only one partition, but that isn't true when a coalesce is involved. If that explanation sounds correct, it's probably best to put it into a comment somewhere, and I'd still say that you can eliminate `processed` and just explain how `completed` is used in two different ways, on the executors and on the driver.
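The proposed driver-side logic (a single `completed` set of `(rdd, shuffle, partition)` keys, consulted before adding a pending update) can be sketched as below. This is a hypothetical, simplified Java model: `PartitionKey`, `ConsistentAccumulator`, and the `long` sum are assumptions standing in for the PR's `Accumulable[R, T]` and `param.addInPlace`; it only shows why an update for a partition that was already merged is skipped.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical key identifying one (rddId, shuffleId, partitionId) computation. */
final class PartitionKey {
  final int rddId;
  final int shuffleId;
  final int partitionId;

  PartitionKey(int rddId, int shuffleId, int partitionId) {
    this.rddId = rddId;
    this.shuffleId = shuffleId;
    this.partitionId = partitionId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PartitionKey)) {
      return false;
    }
    PartitionKey k = (PartitionKey) o;
    return rddId == k.rddId && shuffleId == k.shuffleId && partitionId == k.partitionId;
  }

  @Override
  public int hashCode() {
    return Objects.hash(rddId, shuffleId, partitionId);
  }
}

/** Hypothetical driver-side accumulator that merges each partition's update at most once. */
final class ConsistentAccumulator {
  private final Set<PartitionKey> completed = new HashSet<>();
  private long value = 0L;

  void mergePending(Map<PartitionKey, Long> pending) {
    for (Map.Entry<PartitionKey, Long> e : pending.entrySet()) {
      // Set.add returns false when this partition was already merged (e.g. it was recomputed).
      if (completed.add(e.getKey())) {
        value += e.getValue();
      }
    }
  }

  long value() {
    return value;
  }
}
```

Merging `{(1,0,0)→5, (1,0,1)→7}` and then `{(1,0,1)→7, (1,0,2)→3}` yields 15: the repeated update for partition 1 is counted once.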
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10355#issuecomment-197537016

**[Test build #53343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11645#discussion_r56376208

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala ---
@@ -0,0 +1,471 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import scala.collection.mutable
+import scala.util.Random
+
+import org.apache.hadoop.fs.Path
+import org.scalatest.{BeforeAndAfter, PrivateMethodTester}
+import org.scalatest.concurrent.Eventually._
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.LocalSparkContext._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.quietly
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.Utils
+
+class StateStoreSuite extends SparkFunSuite with BeforeAndAfter with PrivateMethodTester {
+  type MapType = mutable.HashMap[InternalRow, InternalRow]
+
+  import StateStoreCoordinatorSuite._
+  import StateStoreSuite._
+
+  private val tempDir = Utils.createTempDir().toString
+
+  after {
+    StateStore.stop()
+  }
+
+  test("update, remove, commit, and all data iterator") {
+    val provider = newStoreProvider()
+
+    // Verify state before starting a new set of updates
+    assert(provider.latestIterator().isEmpty)
+
+    val store = provider.getStore(0)
+    assert(!store.hasCommitted)
+    intercept[IllegalStateException] {
+      store.iterator()
+    }
+    intercept[IllegalStateException] {
+      store.updates()
+    }
+
+    // Verify state after updating
+    update(store, "a", 1)
+    intercept[IllegalStateException] {
+      store.iterator()
+    }
+    intercept[IllegalStateException] {
+      store.updates()
+    }
+    assert(provider.latestIterator().isEmpty)
+
+    // Make updates, commit and then verify state
+    update(store, "b", 2)
+    update(store, "aa", 3)
+    remove(store, _.startsWith("a"))
+    assert(store.commit() === 1)
+
+    assert(store.hasCommitted)
+    assert(unwrapToSet(store.iterator()) === Set("b" -> 2))
+    assert(unwrapToSet(provider.latestIterator()) === Set("b" -> 2))
+    assert(fileExists(provider, version = 1, isSnapshot = false))
+    assert(getDataFromFiles(provider) === Set("b" -> 2))
+
+    // Trying to get newer versions should fail
+    intercept[Exception] {
+      provider.getStore(2)
+    }
+    intercept[Exception] {
+      getDataFromFiles(provider, 2)
+    }
+
+    // New updates to the reloaded store with new version, and does not change old version
+    val reloadedStore = new HDFSBackedStateStoreProvider(store.id, provider.directory).getStore(1)
+    update(reloadedStore, "c", 4)
+    assert(reloadedStore.commit() === 2)
+    assert(unwrapToSet(reloadedStore.iterator()) === Set("b" -> 2, "c" -> 4))
+    assert(getDataFromFiles(provider) === Set("b" -> 2, "c" -> 4))
+    assert(getDataFromFiles(provider, version = 1) === Set("b" -> 2))
+    assert(getDataFromFiles(provider, version = 2) === Set("b" -> 2, "c" -> 4))
+  }
+
+  test("updates iterator with all combos of updates and removes") {
+    val provider = newStoreProvider()
+    var currentVersion: Int = 0
+    def withStore(body: StateStore => Unit): Unit = {
+      val store = provider.getStore(currentVersion)
+      body(store)
+      currentVersion += 1
+    }
+
+    // New data should be seen in updates as value added, even if they had multiple updates
+    withStore { store =>
+      update(store, "a", 1)
+      update(store, "aa", 1)
+      update(store, "aa", 2)
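The first test pins down versioned, immutable snapshots: updates stay invisible until `commit()`, each commit produces the next version number, a missing version cannot be opened, and committing on top of a reloaded version leaves the earlier version unchanged. A minimal in-memory sketch of those semantics, assuming `String -> Integer` state; `VersionedStore` and `Updater` are hypothetical, the sketch only permits committing on top of the latest version, and it does not model how `HDFSBackedStateStoreProvider` writes delta or snapshot files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory provider keeping one immutable snapshot per committed version. */
final class VersionedStore {
  // Index i holds the committed state of version i; version 0 is the empty initial state.
  private final List<Map<String, Integer>> versions = new ArrayList<>();

  VersionedStore() {
    versions.add(Collections.emptyMap());
  }

  int latestVersion() {
    return versions.size() - 1;
  }

  Map<String, Integer> snapshot(int version) {
    if (version < 0 || version > latestVersion()) {
      throw new IllegalArgumentException("No such version: " + version);
    }
    return versions.get(version);
  }

  /** Starts an uncommitted set of updates on a private copy of `version`. */
  Updater getStore(int version) {
    return new Updater(version, new HashMap<>(snapshot(version)));
  }

  final class Updater {
    private final int baseVersion;
    private final Map<String, Integer> working;

    private Updater(int baseVersion, Map<String, Integer> working) {
      this.baseVersion = baseVersion;
      this.working = working;
    }

    void update(String key, int value) {
      working.put(key, value);
    }

    void remove(Predicate<String> condition) {
      working.keySet().removeIf(condition);
    }

    /** Publishes baseVersion + 1 as an immutable snapshot and returns its number. */
    int commit() {
      if (baseVersion != latestVersion()) {
        throw new IllegalStateException("sketch only commits on top of the latest version");
      }
      versions.add(Collections.unmodifiableMap(new HashMap<>(working)));
      return baseVersion + 1;
    }
  }
}
```

Replaying the test's steps against this model (update `a`, `b`, `aa`, remove keys starting with `a`, commit, then reload version 1 and add `c`) yields version 1 = `{b=2}` and version 2 = `{b=2, c=4}`, with version 1 untouched.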
[GitHub] spark pull request: [Spark-13772] fix data type mismatch for decim...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11605#issuecomment-197827396

**[Test build #53421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53421/consoleFull)** for PR 11605 at commit [`42addd6`](https://github.com/apache/spark/commit/42addd64bfb864ff59fecc5c4c11852d7cd49f60).
[GitHub] spark pull request: [Minor][DOC] Add JavaStreamingTestExample
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11776#discussion_r56478518

--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaStreamingTestExample.java ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib;
+
+
+import org.apache.spark.Accumulator;
+import org.apache.spark.api.java.function.VoidFunction;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.Function;
+// $example on$
+import org.apache.spark.mllib.stat.test.BinarySample;
+import org.apache.spark.mllib.stat.test.StreamingTest;
+import org.apache.spark.mllib.stat.test.StreamingTestResult;
+// $example off$
+import org.apache.spark.SparkConf;
+import org.apache.spark.streaming.Duration;
+import org.apache.spark.streaming.Seconds;
+import org.apache.spark.streaming.api.java.JavaDStream;
+import org.apache.spark.streaming.api.java.JavaStreamingContext;
+import org.apache.spark.util.Utils;
+
+
+/**
+ * Perform streaming testing using Welch's 2-sample t-test on a stream of data, where the data
+ * stream arrives as text files in a directory. Stops when the two groups are statistically
+ * significant (p-value < 0.05) or after a user-specified timeout in number of batches is exceeded.
+ *
+ * The rows of the text files must be in the form `Boolean, Double`. For example:
+ *   false, -3.92
+ *   true, 99.32
+ *
+ * Usage:
+ *   JavaStreamingTestExample
+ *
+ * To run on your local machine using the directory `dataDir` with 5 seconds between each batch and
+ * a timeout after 100 insignificant batches, call:
+ *    $ bin/run-example mllib.JavaStreamingTestExample dataDir 5 100
+ *
+ * As you add text files to `dataDir` the significance test wil continually update every
+ * `batchDuration` seconds until the test becomes significant (p-value < 0.05) or the number of
+ * batches processed exceeds `numBatchesTimeout`.
+ */
+public class JavaStreamingTestExample {
+  public static void main(String[] args) {
+    if (args.length != 3) {
+      System.err.println("Usage: JavaStreamingTestExample " +" ");
+      System.exit(1);
+    }
+
+    String dataDir = args[0];
+    Duration batchDuration = Seconds.apply(Long.valueOf(args[1]));
+    int numBatchesTimeout = Integer.valueOf(args[2]);
+
+    SparkConf conf = new SparkConf().setMaster("local").setAppName("StreamingTestExample");
+    JavaStreamingContext ssc = new JavaStreamingContext(conf, batchDuration);
+
+    ssc.checkpoint(Utils.createTempDir(System.getProperty("java.io.tmpdir"), "spark").toString());
+
+    // $example on$
+    JavaDStream data = ssc.textFileStream(dataDir).map(
+      new Function() {
+        @Override
+        public BinarySample call(String line) throws Exception {
+          String[] ts = line.split(",");
+          boolean label = Boolean.valueOf(ts[0]);
+          double value = Double.valueOf(ts[1]);
+          return new BinarySample(label, value);
+        }
+      });
+
+    StreamingTest streamingTest = new StreamingTest()
+      .setPeacePeriod(0)
+      .setWindowSize(0)
+      .setTestMethod("welch");
+
+    JavaDStream out = streamingTest.registerStream(data);
+    out.print();
+    // $example off$
+
+    // Stop processing if test becomes significant or we time out
+    final Accumulator timeoutCounter =
+      ssc.sparkContext().accumulator(numBatchesTimeout);
+
+    out.foreachRDD(new VoidFunction () {
+      @Override
+      public void call(JavaRDD rdd) throws Exception {
+        timeoutCounter.add(-1);
+
+        long cntSignificant = rdd.filter(new Function () {
+          @Override
+          public Boolean
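The example selects Welch's test via `setTestMethod("welch")`. For reference, Welch's statistic is t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), and its Welch–Satterthwaite degrees of freedom are (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]. A minimal batch (non-streaming) Java sketch of those two quantities over `double[]` samples; `WelchTTest` is hypothetical, it is not how MLlib's `StreamingTest` aggregates its running statistics, and it stops short of the p-value, which would additionally need a Student-t CDF.

```java
/** Hypothetical batch computation of Welch's t statistic and degrees of freedom. */
final class WelchTTest {
  static double mean(double[] xs) {
    double sum = 0.0;
    for (double x : xs) {
      sum += x;
    }
    return sum / xs.length;
  }

  // Unbiased sample variance (divides by n - 1); requires at least two observations.
  static double sampleVariance(double[] xs) {
    if (xs.length < 2) {
      throw new IllegalArgumentException("need at least two observations");
    }
    double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) {
      ss += (x - m) * (x - m);
    }
    return ss / (xs.length - 1);
  }

  // t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
  static double tStatistic(double[] a, double[] b) {
    double va = sampleVariance(a) / a.length;
    double vb = sampleVariance(b) / b.length;
    return (mean(a) - mean(b)) / Math.sqrt(va + vb);
  }

  // Welch-Satterthwaite: (va + vb)^2 / (va^2/(n_a - 1) + vb^2/(n_b - 1))
  static double degreesOfFreedom(double[] a, double[] b) {
    double va = sampleVariance(a) / a.length;
    double vb = sampleVariance(b) / b.length;
    return (va + vb) * (va + vb)
        / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
  }
}
```

For `a = {1, 2, 3, 4}` and `b = {2, 4, 6, 8}`, the statistic is exactly -√3 and the degrees of freedom are 1875/425 ≈ 4.41, lower than the pooled-variance value of 6 because the two variances differ.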
[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11749#discussion_r56416653

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
@@ -345,15 +358,25 @@ public void setColumn(int ordinal, ColumnVector column) {
    * in this batch will not include this row.
    */
   public final void markFiltered(int rowId) {
-    assert(filteredRows[rowId] == false);
+    assert(!filteredRows[rowId]);
     filteredRows[rowId] = true;
     ++numRowsFiltered;
   }

+  /**
+   * Marks a given column as non-nullable. Any row that has a NULL value for the corresponding
+   * attribute is filtered out.
+   */
+  public final void filterNullsInColumn(int ordinal) {
+    assert(!nullFilteredColumns.contains(ordinal));
--- End diff --

I don't think this assert is necessary. Calling this more than once for the same column seems perfectly valid, and allowing it makes this API easier to use.
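Dropping the assert makes `filterNullsInColumn` idempotent. A minimal sketch of that behavior, assuming a `Set<Integer>` of null-filtered column ordinals as in the diff; `NullFilterState`, `isFilteredOut`, and `filteredColumnCount` are hypothetical and only model the bookkeeping, not `ColumnarBatch`'s row iteration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of which columns cause rows containing NULL to be dropped. */
final class NullFilterState {
  private final Set<Integer> nullFilteredColumns = new HashSet<>();

  // Idempotent: Set.add simply returns false when the ordinal is already present,
  // so callers need not check before marking a column.
  void filterNullsInColumn(int ordinal) {
    nullFilteredColumns.add(ordinal);
  }

  // A row is filtered out if any null-filtered column is NULL in that row.
  boolean isFilteredOut(boolean[] isNullByColumn) {
    for (int ordinal : nullFilteredColumns) {
      if (isNullByColumn[ordinal]) {
        return true;
      }
    }
    return false;
  }

  int filteredColumnCount() {
    return nullFilteredColumns.size();
  }
}
```

Marking column 1 twice leaves a single entry, and only rows that are NULL in column 1 are dropped.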
[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11665#issuecomment-197456454 thanks, all comments addressed!
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56444849
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -108,14 +109,38 @@ private[csv] object CSVInferSchema {
   }

   private def tryParseDouble(field: String): DataType = {
-    if ((allCatch opt field.toDouble).isDefined) {
+    val doubleTry = allCatch opt field.toDouble
--- End diff --
Ah, numeric types with fractions can also be `Decimal`, which has a precision and a scale.
```scala
import java.math.BigDecimal

scala> BigDecimal.valueOf(1.)
res4: java.math.BigDecimal = 1.

scala> BigDecimal.valueOf(1.).precision
res6: Int = 5

scala> BigDecimal.valueOf(1.).scale
res7: Int = 4
```
```scala
import java.math.BigDecimal

scala> BigDecimal.valueOf(1)
res5: java.math.BigDecimal = 1

scala> BigDecimal.valueOf(1).precision
res8: Int = 1

scala> BigDecimal.valueOf(1).scale
res9: Int = 0
```
A `DoubleType` value with a fractional part can lose precision if it has too many digits.
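To make the precision/scale point concrete, the sketch below uses a hypothetical `DecimalCheck.fitsInDouble` helper (not Spark's CSV inference code) that reports whether a numeric CSV field keeps its exact decimal value after a round trip through `double`. A field that fails the check is one where inferring `DoubleType` would drop digits, which a `DecimalType` with explicit precision and scale would preserve.

```java
import java.math.BigDecimal;

/** Hypothetical helper illustrating why DoubleType can lose decimal digits. */
public final class DecimalCheck {

    /**
     * Returns true if parsing the field as a double keeps its exact decimal value.
     * Throws NumberFormatException if the field is not a decimal number.
     */
    public static boolean fitsInDouble(String field) {
        BigDecimal exact = new BigDecimal(field);
        double d = Double.parseDouble(field);
        if (Double.isInfinite(d)) {
            return false; // magnitude too large for a double at all
        }
        // Double.toString yields a decimal string that reads back as exactly d. Before JDK 19
        // it was not always the shortest such string, so this check can be conservative.
        BigDecimal viaDouble = new BigDecimal(Double.toString(d));
        // compareTo ignores scale, so 1.50 and 1.5 compare as equal.
        return exact.compareTo(viaDouble) == 0;
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("1.2345");
        System.out.println(x.precision() + " " + x.scale()); // 5 4
        BigDecimal one = BigDecimal.valueOf(1);
        System.out.println(one.precision() + " " + one.scale()); // 1 0
        System.out.println(fitsInDouble("1.5")); // true
        System.out.println(fitsInDouble("3.14159265358979323846")); // false
    }
}
```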
[GitHub] spark pull request: [SPARK-13826][SQL] Addendum: update documentat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11814#issuecomment-198223965 **[Test build #53512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53512/consoleFull)** for PR 11814 at commit [`3b3fcca`](https://github.com/apache/spark/commit/3b3fcca9e48a6c07d8364966e153d57ee0e13290).
[GitHub] spark pull request: [SPARK-13815][MLlib] Provide better Exception ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11772#issuecomment-197565822 Can one of the admins verify this patch?
[GitHub] spark pull request: Added transitive closure transformation to Cat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11777#issuecomment-197769134 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13974][SQL] sub-query names do not need...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11783#issuecomment-197913025 **[Test build #53430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53430/consoleFull)** for PR 11783 at commit [`e4edc0e`](https://github.com/apache/spark/commit/e4edc0e22212d9bb2a09bb84eb2e75de128d3736). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198109663 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13986][CORE][MLLIB] Remove `DeveloperAp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11797#issuecomment-198231936 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53503/ Test PASSed.
[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11791#issuecomment-198086746 **[Test build #53456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53456/consoleFull)** for PR 11791 at commit [`7dc3623`](https://github.com/apache/spark/commit/7dc362331f3f549670ecd9488db456b4136a3ad7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13579][build][wip] Stop building the ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11796#issuecomment-198098184 **[Test build #53472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53472/consoleFull)** for PR 11796 at commit [`54336b6`](https://github.com/apache/spark/commit/54336b6305a0cc5fb2b80247f5ab76e1b6f08407).
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197827320 Well, already minimized changes, but still too large to display :(
[GitHub] spark pull request: [SPARK-13118][SQL] Expression encoding for opt...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11708#issuecomment-197701587 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-13963][ML] Adding binary toggle param t...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/11832 [SPARK-13963][ML] Adding binary toggle param to HashingTF

## What changes were proposed in this pull request?

Adding a binary toggle parameter to ml.feature.HashingTF, as well as to mllib.feature.HashingTF, since the former wraps the latter's functionality. If true, this parameter sets every non-zero term count to 1, turning term-count features into binary values that are well suited for discrete probability models.

## How was this patch tested?

Added unit tests for ML and MLlib.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark binary-param-HashingTF-SPARK-13963
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11832.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #11832

commit a5ff3309c0d07e57177374133130803eb98ebffb
Author: Bryan Cutler
Date: 2016-03-18T21:19:19Z
[SPARK-13963] Adding binary toggle to HashingTF in ml/mllib

commit 31097231769860b86d1d3234ebf7d4e95f96e5cb
Author: Bryan Cutler
Date: 2016-03-18T21:19:48Z
Added unit test for HashingTF binary toggle

commit ca1436166a1292f92d72408c10cf606623b31bbd
Author: Bryan Cutler
Date: 2016-03-18T21:26:34Z
fixed param description text
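The binary toggle is easiest to picture on a single document. The sketch below is a hypothetical, plain-Java `BinaryHashingTF.transform` helper, not the Spark API: it hashes each term into one of `numFeatures` buckets with a non-negative modulus of `String.hashCode()` (an assumption made for illustration; Spark's own hashing may differ) and then, when `binary` is true, clamps every non-zero count to 1.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a hashing term-frequency transform with a binary toggle. */
public final class BinaryHashingTF {

    /** Maps a term to a bucket in [0, numFeatures). */
    static int indexOf(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    /**
     * Returns a sparse vector (bucket index -> value). With binary == false the value
     * is the raw term count; with binary == true every non-zero count becomes 1.0.
     */
    public static Map<Integer, Double> transform(List<String> terms, int numFeatures, boolean binary) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (String term : terms) {
            counts.merge(indexOf(term, numFeatures), 1.0, Double::sum);
        }
        if (binary) {
            // Only non-zero buckets are stored, so clamping every entry to 1.0 is enough.
            counts.replaceAll((index, count) -> 1.0);
        }
        return counts;
    }
}
```

For the document `["a", "b", "a"]` with 1000 buckets, the raw transform stores 2.0 for "a" and 1.0 for "b"; with `binary = true` both become 1.0, which is the presence/absence encoding that Bernoulli-style models expect.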
[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11748#issuecomment-197603358 **[Test build #53351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53351/consoleFull)** for PR 11748 at commit [`4f5074e`](https://github.com/apache/spark/commit/4f5074ece49030a6e7134f7ece706ed441c02ee4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11748#issuecomment-198081291 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198519918 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53555/ Test PASSed.
[GitHub] spark pull request: [SPARK-13942][CORE][DOCS] Remove Shark-related...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/11770 [SPARK-13942][CORE][DOCS] Remove Shark-related docs and visibility for 2.x

## What changes were proposed in this pull request?

`Shark` has been merged into `Spark SQL` since [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The following seem to be the only remaining legacy.

**Migration Guide**
```
- ## Migration Guide for Shark Users
- ...
- ### Scheduling
- ...
- ### Reducer number
- ...
- ### Caching
```

**SparkEnv visibility and comments**
```
- *
- * NOTE: This is not intended for external use. This is exposed for Shark and may be made private
- * in a future release.
  */
 @DeveloperApi
-class SparkEnv (
+private[spark] class SparkEnv (
```

For Spark 2.x, we should clean up those docs and comments either way. However, changing the visibility of the `SparkEnv` class might be controversial. As a first attempt, this issue proposes to change both, following the note (*This is exposed for Shark*). The visibility change may be dropped during review.

## How was this patch tested?

Pass the Jenkins test.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-13942
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11770.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #11770

commit f91c5480e9a3c4644a9f95ebbc48833abbed2ea0
Author: Dongjoon Hyun
Date: 2016-03-16T19:53:34Z
[SPARK-13942][CORE][DOCS] Remove Shark-related docs and visibility for 2.x
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-197676388 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197808784 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53406/ Test PASSed.
[GitHub] spark pull request: [SPARK-13950] [SQL] generate code for sort mer...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11771#issuecomment-197592218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53371/ Test FAILed.
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198538107 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11317#issuecomment-197605244 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53374/ Test FAILed.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197645762 Sorry I missed it as this message is so far away from the final one... @JoshRosen thanks again!
[GitHub] spark pull request: [MINOR][SQL][BUILD] Remove duplicated lines
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/11773 [MINOR][SQL][BUILD] Remove duplicated lines

## What changes were proposed in this pull request?

This PR removes three minor duplicated lines. The first one causes the following unreachable-code warning.
```
JoinSuite.scala:52: unreachable code
[warn] case j: BroadcastHashJoin => j
```
The other two are just consecutive repetitions in the `Seq` of MiMa filters.

## How was this patch tested?

Pass the existing Jenkins test.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark remove_duplicated_line
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11773.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #11773
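The warning comes from Scala's pattern matcher: once an earlier `case j: BroadcastHashJoin` has matched, an identical later case can never be reached. The same dead-branch shape can be shown in Java with a chain of `instanceof` checks. The classes below are hypothetical stand-ins, and unlike scalac, `javac` does not warn about the repeated test, so the duplicate would go unnoticed there.

```java
/** Hypothetical stand-ins showing why a repeated type test is dead code. */
public final class DuplicateCaseDemo {
    public interface Plan {}
    public static final class BroadcastHashJoin implements Plan {}
    public static final class SortMergeJoin implements Plan {}

    public static String describe(Plan plan) {
        if (plan instanceof BroadcastHashJoin) {
            return "broadcast";
        } else if (plan instanceof SortMergeJoin) {
            return "sort-merge";
        } else if (plan instanceof BroadcastHashJoin) {
            // Never taken: every BroadcastHashJoin already matched the first branch,
            // mirroring a duplicated `case j: BroadcastHashJoin => j` in Scala.
            return "never returned";
        }
        return "other";
    }
}
```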
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-198236937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53507/ Test PASSed.
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198141927 **[Test build #53477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53477/consoleFull)** for PR 11799 at commit [`ef90585`](https://github.com/apache/spark/commit/ef90585abd1a33806ee51b7acbd589a3cb33af72). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13885][YARN] Fix attempt id regression ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11721
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197898940 **[Test build #53427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53427/consoleFull)** for PR 11769 at commit [`50502a5`](https://github.com/apache/spark/commit/50502a5152b9b0c0458ebd6b7ad48524b1422c58). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-13772] fix data type mismatch for decim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11605#issuecomment-197875847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53422/ Test PASSed.
[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11645#issuecomment-197671956 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53388/ Test FAILed.
[GitHub] spark pull request: [SPARK-13761] [ML] Deprecate validateParams
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11620#discussion_r56418523
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
@@ -322,7 +337,8 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
   /**
    * A description of the error distribution to be used in the model.
-   * @param name the name of the family.
+*
--- End diff --
indentation
[GitHub] spark pull request: [SPARK-14014] [SQL] Replace existing catalog w...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11836#issuecomment-198572871 @yhuai @rxin
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56567647
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -199,7 +210,8 @@ class CodegenContext {
       case StringType => s"$input.getUTF8String($ordinal)"
       case BinaryType => s"$input.getBinary($ordinal)"
       case CalendarIntervalType => s"$input.getInterval($ordinal)"
-      case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+      case t: StructType => if (!isColumnarType(input)) { s"$input.getStruct($ordinal, ${t.size})" }
--- End diff --
Why not make them have the same APIs?
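davies's suggestion is that if row-based and columnar inputs exposed the same accessor, the generator would not need an `isColumnarType` branch at all. A minimal sketch of that idea, with hypothetical names throughout (`StructGetter`, `RowInput`, `ColumnInput`, `genGetStruct`), none of which are Spark classes: both input kinds implement one interface, so the code generator can emit a single `getStruct(ordinal, numFields)` call string for either.

```java
/** Hypothetical sketch: one accessor API shared by row and columnar inputs. */
public final class UnifiedAccessorDemo {

    /** The shared API both input kinds implement. */
    public interface StructGetter {
        String getStruct(int ordinal, int numFields);
    }

    /** Stand-in for a row-based input. */
    public static final class RowInput implements StructGetter {
        @Override
        public String getStruct(int ordinal, int numFields) {
            return "row struct #" + ordinal + " with " + numFields + " fields";
        }
    }

    /** Stand-in for a columnar input: same method, different storage underneath. */
    public static final class ColumnInput implements StructGetter {
        @Override
        public String getStruct(int ordinal, int numFields) {
            return "column struct #" + ordinal + " with " + numFields + " fields";
        }
    }

    /** Generated-code string: identical whether `input` names a row or a column. */
    public static String genGetStruct(String input, int ordinal, int numFields) {
        return input + ".getStruct(" + ordinal + ", " + numFields + ")";
    }
}
```

The design trade-off is that the columnar side must implement an accessor shaped for rows, but in exchange every caller of the generator stays branch-free.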
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198057482 **[Test build #53458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13924][SQL] officially support multi-in...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11754#issuecomment-197457013 Merging in master. Thanks.
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-197773071 Thanks @mengxr @thunterdb @yanboliang for the review. Sent an update: 1. resolve the conflict with GLMSummary. 2. revert the summary statistics related part. 3. extract family and link name in R
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198266969 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Spark-13772] fix data type mismatch for decim...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11605#issuecomment-197825834 ok to test
[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11791#issuecomment-198218448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53496/ Test PASSed.
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11301#issuecomment-198239021 **[Test build #53517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53517/consoleFull)** for PR 11301 at commit [`62fa1de`](https://github.com/apache/spark/commit/62fa1de1832fa23a9adc425073a96d494a806798). * This patch **fails R style tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11665#issuecomment-197457623 **[Test build #53330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53330/consoleFull)** for PR 11665 at commit [`336a18e`](https://github.com/apache/spark/commit/336a18e3e9f55514545a28f0bb32658dee2ff70b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13898][SQL] Merge DatasetHolder and Dat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11737#issuecomment-197734597 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13427][SQL] Support USING clause in JOI...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11297#issuecomment-197750038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53397/ Test PASSed.
[GitHub] spark pull request: [SPARK-13853][SQL] QueryPlan sub-classes shoul...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11673#discussion_r56599894 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1450,7 +1450,9 @@ object CleanupAliases extends Rule[LogicalPlan] { case c: CreateStructUnsafe if !stop => stop = true c.copy(children = c.children.map(trimNonTopLevelAliases)) -case Alias(child, _) if !stop => child +// Only eliminate aliases for named expressions, otherwise we may turn an `Alias` to a +// normal expression and break the type requirement for it. +case Alias(child: NamedExpression, _) if !stop => child --- End diff -- I tried to do it before, but not all logical plans can be accessed in catalyst, for example, `EvaluatePython`. Should we have a more general mechanism for declaring operators producing new attributes?
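The comment added in this diff hinges on a typing detail. Replacing `Alias(child, _)` with `child` is only safe where a named expression is required and the child already is one. Below is a minimal Java sketch of that rule. It assumes a toy expression hierarchy: `Expr`, `NamedExpr`, `Literal`, `AttributeRef`, `Alias` and `AliasTrimmer` are hypothetical stand-ins for Catalyst's classes. It also ignores the `stop` flag used by the real rule.

```java
// Toy expression hierarchy, loosely modeled on Catalyst (hypothetical, not Spark's classes).
interface Expr {}

interface NamedExpr extends Expr {
    String name();
}

final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
}

final class AttributeRef implements NamedExpr {
    private final String name;
    AttributeRef(String name) { this.name = name; }
    @Override public String name() { return name; }
}

final class Alias implements NamedExpr {
    final Expr child;
    private final String name;
    Alias(Expr child, String name) { this.child = child; this.name = name; }
    @Override public String name() { return name; }
}

final class AliasTrimmer {
    // The NamedExpr return type plays the role of the "type requirement": an Alias is
    // unwrapped only when its child is itself named. An Alias over a plain expression
    // (e.g. a Literal) is kept, because returning that child would not type-check.
    static NamedExpr trimAlias(NamedExpr e) {
        if (e instanceof Alias) {
            Expr child = ((Alias) e).child;
            if (child instanceof NamedExpr) {
                return trimAlias((NamedExpr) child);
            }
        }
        return e;
    }

    public static void main(String[] args) {
        NamedExpr kept = trimAlias(new Alias(new Literal(1), "one"));
        NamedExpr trimmed = trimAlias(new Alias(new Alias(new AttributeRef("a"), "b"), "c"));
        System.out.println(kept.getClass().getSimpleName() + " " + kept.name());       // Alias one
        System.out.println(trimmed.getClass().getSimpleName() + " " + trimmed.name()); // AttributeRef a
    }
}
```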
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198597887 **[Test build #53584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)** for PR 11636 at commit [`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197586274 **[Test build #53368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e).
[GitHub] spark pull request: [SPARK-13982][SparkR] KMean's predict: Feature...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11793#issuecomment-198081241 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13926] Automatically use Kryo serialize...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11755
[GitHub] spark pull request: [SPARK-13742][Core] Add non-iterator interface...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11578#discussion_r56619813 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -41,6 +41,12 @@ trait RandomSampler[T, U] extends Pseudorandom with Cloneable with Serializable /** take a random sample */ def sample(items: Iterator[T]): Iterator[U] --- End diff -- I think if we want to keep it (which could make sense) - we should maybe add a default implementation for it so we don't have duplicate sampling logic (as we do right now in the classes)
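holdenk's suggestion amounts to a template-method refactoring: write the iterator-based `sample` once, on top of the non-iterator interface this PR adds, so individual samplers stop duplicating the loop. Below is a sketch in Java. It assumes the non-iterator primitive returns how many times to emit the next item. `CountingSampler`, `BernoulliCountingSampler` and `SamplerDemo` are hypothetical names, and the Java default method stands in for a concrete method on a Scala trait.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical Java analogue of the suggestion (not Spark's RandomSampler trait):
// implementations supply only the non-iterator primitive, and the iterator-based
// method is a shared default built on top of it.
interface CountingSampler<T> {
    // How many times the next item should be emitted (0 means skip it).
    int sample();

    // Default implementation, so no sampler carries its own copy of this loop.
    // For brevity it collects eagerly; a production version would return a lazy iterator.
    default Iterator<T> sample(Iterator<T> items) {
        List<T> out = new ArrayList<>();
        while (items.hasNext()) {
            T item = items.next();
            int times = sample();
            for (int i = 0; i < times; i++) {
                out.add(item);
            }
        }
        return out.iterator();
    }
}

// Bernoulli sampling: keep each item independently with probability `fraction`.
final class BernoulliCountingSampler<T> implements CountingSampler<T> {
    private final double fraction;
    private final Random rng;

    BernoulliCountingSampler(double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
        this.rng = new Random(seed);
    }

    @Override
    public int sample() {
        return rng.nextDouble() < fraction ? 1 : 0;
    }
}

final class SamplerDemo {
    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Iterator<Integer> kept = new BernoulliCountingSampler<Integer>(0.5, 42L).sample(data.iterator());
        while (kept.hasNext()) {
            System.out.print(kept.next() + " ");
        }
        System.out.println();
    }
}
```

A sampler with replacement would only need a different `sample()` (returning counts greater than 1), while reusing the same default iterator method.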
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198142033 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11813#issuecomment-198215415 **[Test build #53505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53505/consoleFull)** for PR 11813 at commit [`17d8bc1`](https://github.com/apache/spark/commit/17d8bc1f13c3b29e22ecbec6a9f08491e5970368).
[GitHub] spark pull request: [SPARK-13926] Automatically use Kryo serialize...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11755#issuecomment-197552094 LGTM
[GitHub] spark pull request: [SPARK-13853][SQL] QueryPlan sub-classes shoul...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11673#issuecomment-197792870 Hi @marmbrus, yes, your idea makes sense: we should use `Alias` to produce new attributes where we can. Apart from leaf nodes, there are some special cases that still need `producedAttributes`: `Generate`, `Expand`, `ScriptTransform` and aggregate-related operators. In this PR, I made `EvaluatePython` use an alias for its new attribute, and cleaned up those special cases to use `producedAttributes`.
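The distinction cloud-fan draws can be made concrete with a set-difference check. As a simplification, assume a node is consistent when every attribute it references is either supplied by a child or declared in `producedAttributes` (a missing-input check). Below is a Java sketch using a hypothetical `ToyPlanNode`, with attributes as plain strings. None of this is Spark's actual API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy plan node (hypothetical; loosely modeled on Catalyst's QueryPlan).
final class ToyPlanNode {
    final Set<String> references;         // attributes mentioned by the node's expressions
    final Set<String> inputSet;           // attributes output by the node's children
    final Set<String> producedAttributes; // attributes the node creates itself

    ToyPlanNode(Set<String> references, Set<String> inputSet, Set<String> producedAttributes) {
        this.references = references;
        this.inputSet = inputSet;
        this.producedAttributes = producedAttributes;
    }

    // Referenced attributes that no child supplies and this node does not produce.
    Set<String> missingInput() {
        Set<String> missing = new HashSet<>(references);
        missing.removeAll(inputSet);
        missing.removeAll(producedAttributes);
        return missing;
    }

    public static void main(String[] args) {
        // A Generate-like node reads `arr` from its child and emits a new `elem`.
        ToyPlanNode declared = new ToyPlanNode(Set.of("arr", "elem"), Set.of("arr"), Set.of("elem"));
        ToyPlanNode undeclared = new ToyPlanNode(Set.of("arr", "elem"), Set.of("arr"), Set.of());
        System.out.println(declared.missingInput());   // []
        System.out.println(undeclared.missingInput()); // [elem]
    }
}
```

In this toy model, an operator that names its new output through an alias would list only the alias's input attributes in `references`. It would therefore pass the check without declaring anything, which matches the comment's preference for `Alias` where possible.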
[GitHub] spark pull request: [SPARK-13919] [SQL] fix column pruning through...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11828#issuecomment-198585992 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Minor][DOC] Add JavaStreamingTestExample
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11776#discussion_r56469940 --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaStreamingTestExample.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.mllib; + + +import org.apache.spark.Accumulator; +// $example on$ +import org.apache.spark.api.java.function.VoidFunction; --- End diff -- Don't think this import is actually required, as that code is after the final `$example off$`?
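MLnick's remark depends on how the examples are pulled into the documentation. As the markers suggest, only the lines between `$example on$` and `$example off$` are shown, so an import placed inside a marked region is published even when the code that uses it is not. Below is a minimal sketch of that extraction rule, assuming simple line-by-line matching. `ExampleRegionExtractor` is a hypothetical helper, not the tool Spark's documentation build actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Spark's docs tooling): keep only the lines between
// "$example on$" and "$example off$" markers, dropping the marker lines themselves.
final class ExampleRegionExtractor {
    static final String ON = "$example on$";
    static final String OFF = "$example off$";

    static List<String> extract(List<String> sourceLines) {
        List<String> kept = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            if (line.contains(ON)) {
                inside = true;
            } else if (line.contains(OFF)) {
                inside = false;
            } else if (inside) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "import org.apache.spark.Accumulator;",
            "// $example on$",
            "import org.apache.spark.api.java.function.VoidFunction;",
            "// $example off$",
            "public class JavaStreamingTestExample { }");
        // Prints only the VoidFunction import, the one line a reader would see
        // in the rendered snippet for this source.
        extract(source).forEach(System.out::println);
    }
}
```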
[GitHub] spark pull request: [SPARK-13908][SQL] Add a LocalLimit for Collec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11817#issuecomment-198296418 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13761] [ML] Deprecate validateParams
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11620#issuecomment-197621807 No problem. Thanks for the PR! LGTM Merging with master
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197623813 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198136602 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53476/ Test FAILed.
[GitHub] spark pull request: [SPARK-14011][CORE][SQL] Enable `LineLength` J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11831#issuecomment-198543525 **[Test build #53566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53566/consoleFull)** for PR 11831 at commit [`2923ef0`](https://github.com/apache/spark/commit/2923ef095369376be03a868c2bf2375294dab6d1).
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11794#issuecomment-198534785 **[Test build #53550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53550/consoleFull)** for PR 11794 at commit [`4db3880`](https://github.com/apache/spark/commit/4db388084de12d805ae905d9062db75e130a7b73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-198300194 **[Test build #53525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53525/consoleFull)** for PR 11723 at commit [`ae808d7`](https://github.com/apache/spark/commit/ae808d73e022077dba6ad999627589eed4730270). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197681101 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13995][SQL] Constraints should take car...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11809#issuecomment-198209937 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13973] [PySpark]: `ipython notebook` is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11829#issuecomment-198552276 **[Test build #53567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53567/consoleFull)** for PR 11829 at commit [`e1a0c40`](https://github.com/apache/spark/commit/e1a0c40d4ec4101074f8e5310f357cedbdbec60a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13942][CORE][DOCS] Remove Shark-related...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11770#issuecomment-197593028 **[Test build #53370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53370/consoleFull)** for PR 11770 at commit [`74e51b1`](https://github.com/apache/spark/commit/74e51b13faffa322a7bceff74254301f06113b49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197700857 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53392/ Test FAILed.
[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11665#issuecomment-197458671 **[Test build #53331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53331/consoleFull)** for PR 11665 at commit [`92d935f`](https://github.com/apache/spark/commit/92d935fc8c1204b5d8272655dbe0e606270e5854).
[GitHub] spark pull request: [SPARK-13972][SQ][WIP] hive tests should fail ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11782#issuecomment-198274820 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53524/ Test FAILed.
[GitHub] spark pull request: [Trivial][Docs] Two typos in comment
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/11761 [Trivial][Docs] Two typos in comment ## What changes were proposed in this pull request? two typos ## How was this patch tested? no tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11761.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11761 commit a3fd6ff37a59c411188dddba6e618576bb3ea8f6 Author: Zheng RuiFeng Date: 2016-03-16T11:27:13Z simple typo
[GitHub] spark pull request: [SPARK-12719][HOTFIX] Fix compilation against ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11798#issuecomment-198168724 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12299][CORE][WIP] Remove history servin...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10991#issuecomment-197606014 I don't think that the Master should have any event-log consumption logic whatsoever.
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-198217639 **[Test build #53507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53507/consoleFull)** for PR 11756 at commit [`3ff900e`](https://github.com/apache/spark/commit/3ff900ec904991e79bf6267c16ee38dfc15660be).
[GitHub] spark pull request: [SPARK-13976][SQL] do not remove sub-queries a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11786#issuecomment-197992472 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198585189 **[Test build #53584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)** for PR 11636 at commit [`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65).
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11663#issuecomment-198009920 **[Test build #53446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53446/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).
[GitHub] spark pull request: [WIP][SPARK-13883][SQL] Parquet Implementation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11709#issuecomment-197585281 **[Test build #53367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53367/consoleFull)** for PR 11709 at commit [`8ccdd77`](https://github.com/apache/spark/commit/8ccdd77fb1bdb63a589a214e6151d81b54eeb524). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13926] Automatically use Kryo serialize...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11755#issuecomment-197718175 Merging in master!
[GitHub] spark pull request: [SPARK-13761] [ML] Remove remaining uses of va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11790#issuecomment-198035808 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53447/