[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r149530436 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.linalg + +import org.json4s.DefaultFormats +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render} + +private[ml] object JsonMatrixConverter { + + /** Unique class name for identifying JSON object encoded by this class. */ + val className = "org.apache.spark.ml.linalg.Matrix" --- End diff -- I'd suggest a shorter string (or an integer) to identify that this is a matrix; it would be a huge burden to store such a long metadata string for a matrix with only a few elements. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r149532602 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.linalg + +import org.json4s.DefaultFormats +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render} + +private[ml] object JsonMatrixConverter { + + /** Unique class name for identifying JSON object encoded by this class. */ + val className = "org.apache.spark.ml.linalg.Matrix" --- End diff -- Or could we just use a ```type``` field to distinguish vectors and matrices? For example, ```type``` values less than 10 could be reserved for vectors and values of 10 or more for matrices. What do you think?
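The short-tag idea above can be sketched like this (a hypothetical encoding, not Spark's actual JSON format; the tag values and helper names are assumptions): a compact integer `type` field replaces the long class-name string, with values below 10 reserved for vectors and 10 or above for matrices.

```scala
// Hypothetical sketch: encode a dense matrix with a compact integer type tag
// instead of the full class name "org.apache.spark.ml.linalg.Matrix".
// Tag values are assumptions: 1 = dense vector, 10 = dense matrix.
object CompactTags {
  val DenseVectorTag = 1
  val DenseMatrixTag = 10

  // Build the JSON by hand to keep the sketch dependency-free.
  def denseMatrixJson(numRows: Int, numCols: Int, values: Array[Double]): String =
    s"""{"type":$DenseMatrixTag,"numRows":$numRows,"numCols":$numCols,""" +
      s""""values":[${values.mkString(",")}]}"""
}
```

For a matrix with only a handful of elements, the metadata then costs a few bytes instead of a ~40-character class name per object.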
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r149534129 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite LogisticRegressionSuite.allParamSettings, checkModelData) } + test("read/write with BoundsOnCoefficients") { +def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = { + assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients) + assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients) --- End diff -- Or we could merge this test case with the existing read/write test.
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r149522834 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -827,6 +831,11 @@ class SparseMatrix @Since("2.0.0") ( @Since("2.0.0") object SparseMatrix { + @Since("2.3.0") + private[ml] def unapply( --- End diff -- Ditto
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r149537039 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala --- @@ -0,0 +1,353 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.ui + +import java.util.Date +import java.util.concurrent.ConcurrentHashMap + +import scala.collection.JavaConverters._ +import scala.collection.mutable.HashMap + +import org.apache.spark.{JobExecutionStatus, SparkConf} +import org.apache.spark.internal.Logging +import org.apache.spark.scheduler._ +import org.apache.spark.sql.execution.SQLExecution +import org.apache.spark.sql.execution.metric._ +import org.apache.spark.sql.internal.StaticSQLConf._ +import org.apache.spark.status.LiveEntity +import org.apache.spark.status.config._ +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.kvstore.KVStore + +private[sql] class SQLAppStatusListener( +conf: SparkConf, +kvstore: KVStore, +live: Boolean, +ui: Option[SparkUI] = None) + extends SparkListener with Logging { + + // How often to flush intermediate stage of a live execution to the store. 
When replaying logs, + // never flush (only do the very last write). + private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L + + private val liveExecutions = new HashMap[Long, LiveExecutionData]() + private val stageMetrics = new HashMap[Int, LiveStageMetrics]() + + private var uiInitialized = false + + override def onJobStart(event: SparkListenerJobStart): Unit = { +val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY) +if (executionIdString == null) { + // This is not a job created by SQL + return +} + +val executionId = executionIdString.toLong +val jobId = event.jobId +val exec = getOrCreateExecution(executionId) + +// Record the accumulator IDs for the stages of this job, so that the code that keeps +// track of the metrics knows which accumulators to look at. +val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList +event.stageIds.foreach { id => + stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap())) +} + +exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING) +exec.stages = event.stageIds +update(exec) + } + + override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = { +if (!isSQLStage(event.stageInfo.stageId)) { + return +} + +// Reset the metrics tracking object for the new attempt. 
+stageMetrics.get(event.stageInfo.stageId).foreach { metrics => + metrics.taskMetrics.clear() + metrics.attemptId = event.stageInfo.attemptId +} + } + + override def onJobEnd(event: SparkListenerJobEnd): Unit = { +liveExecutions.values.foreach { exec => + if (exec.jobs.contains(event.jobId)) { +val result = event.jobResult match { + case JobSucceeded => JobExecutionStatus.SUCCEEDED + case _ => JobExecutionStatus.FAILED +} +exec.jobs = exec.jobs + (event.jobId -> result) +exec.endEvents += 1 +update(exec) + } +} + } + + override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { +event.accumUpdates.foreach { case (taskId, stageId, attemptId, accumUpdates) => + updateStageMetrics(stageId, attemptId, taskId, accumUpdates, false) +} + } + + override def onTaskEnd(event: SparkListenerTaskEnd): Unit = { +if (!isSQLStage(event.stageId)) { + return +} + +val info = event.taskInfo +// SPARK-20342. If processing events from a live ap
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19657 Will take a look today.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19681 **[Test build #83570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83570/testReport)** for PR 19681 at commit [`197dd8f`](https://github.com/apache/spark/commit/197dd8fe645d3672c6e0c0ac0f52144a84b91dc5).
[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 ping @WeichenXu123 , @srowen , @hhbyyh Further comments?
[GitHub] spark issue #19682: [SPARK-22464] [SQL] No pushdown for Hive metastore parti...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19682 Thanks for the fix!
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149544676 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", --- End diff -- `SPARK_CONF_DIR` is set by Spark's launch scripts, so you should just be able to do: ``` sys.env.get("SPARK_CONF_DIR").foreach { ... } ```
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149544880 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", + System.getProperty("SPARK_HOME") + File.separator + "conf") +val dir = new File(localConfDir) +if (dir.isDirectory) { + val files = dir.listFiles(new FileFilter { +override def accept(pathname: File): Boolean = { + pathname.isFile && pathname.getName.endsWith("xml") +} + }) + files.foreach { f => hadoopConfFiles(f.getName) = f } +} + +// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files --- End diff -- This comment doesn't make a lot of sense, at least not in this position. What are you trying to say?
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149544716 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", + System.getProperty("SPARK_HOME") + File.separator + "conf") +val dir = new File(localConfDir) +if (dir.isDirectory) { + val files = dir.listFiles(new FileFilter { +override def accept(pathname: File): Boolean = { + pathname.isFile && pathname.getName.endsWith("xml") --- End diff -- `".xml"`
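Putting the reviewer's three suggestions together, a standalone sketch might look like the following (the helper name and the `Map`-based environment parameter are illustrative assumptions, not the actual `Client.scala` code): read `SPARK_CONF_DIR` from the environment rather than system properties, fall back to `$SPARK_HOME/conf`, and match on the full `".xml"` suffix.

```scala
import java.io.File

// Hypothetical standalone version of the snippet under review: resolve the
// local conf directory from the environment and collect its *.xml files.
object ConfDirs {
  def confXmlFiles(env: Map[String, String]): Seq[File] = {
    val confDir = env.get("SPARK_CONF_DIR")
      .orElse(env.get("SPARK_HOME").map(_ + File.separator + "conf"))
    confDir.map(new File(_)).filter(_.isDirectory).toSeq
      .flatMap(_.listFiles().filter(f => f.isFile && f.getName.endsWith(".xml")))
  }
}
```

In the real code one would pass `sys.env`; taking the environment as a parameter just keeps the sketch testable.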
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Merged build finished. Test FAILed.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83570/ Test FAILed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83569/ Test FAILed.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83568/ Test FAILed.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Merged build finished. Test FAILed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Merged build finished. Test FAILed.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19678 retest this please
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83567/ Test FAILed.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Merged build finished. Test FAILed.
[GitHub] spark pull request #19687: [SPARK-19644][SQL]Clean up Scala reflection garba...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/19687 [SPARK-19644][SQL]Clean up Scala reflection garbage after creating Encoder ## What changes were proposed in this pull request? Because of the memory leak issue in `scala.reflect.api.Types.TypeApi.<:<` (https://github.com/scala/bug/issues/8302), creating an encoder may leak memory. This PR adds `cleanUpReflectionObjects` to clean up these leaking objects for methods calling `scala.reflect.api.Types.TypeApi.<:<`. ## How was this patch tested? The updated unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-19644 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19687.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19687 commit c03811ff006058987fa8d5fb9f7d097b9acc9ac5 Author: Shixiong Zhu Date: 2017-11-08T00:33:55Z Clean up Scala reflection garbage after creating Encoder
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19687 cc @cloud-fan
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19433 CC @dbtsai in case you're interested b/c of Sequoia forests
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r149549953 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -213,6 +216,14 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix) ) +// check that the credentials are defined, even though it's likely that auth would have failed +// already if you've made it this far, then start the token renewer +if (hadoopDelegationTokens.isDefined) { --- End diff -- I agree that I shouldn't need to use the conditional `hadoopDelegationTokens.isDefined`, however there will need to be some check (`UserGroupInformation.isSecurityEnabled` or similar) to pass the `driverEndpoint` to the renewer/manager here. When the initial tokens are generated `driverEndpoint` is still `None` because `start()` hasn't been called yet.
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19433 **[Test build #3983 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3983/testReport)** for PR 19433 at commit [`b7e6e40`](https://github.com/apache/spark/commit/b7e6e40976612546b81d9775c194b274c146dc85).
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19687 **[Test build #83571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83571/testReport)** for PR 19687 at commit [`c03811f`](https://github.com/apache/spark/commit/c03811ff006058987fa8d5fb9f7d097b9acc9ac5).
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19678 **[Test build #83572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83572/testReport)** for PR 19678 at commit [`c7123d9`](https://github.com/apache/spark/commit/c7123d9c8d3934c482cd89ea820b2958f4dbbe0a).
[GitHub] spark issue #19634: [SPARK-22412][SQL] Fix incorrect comment in DataSourceSc...
Github user vgankidi commented on the issue: https://github.com/apache/spark/pull/19634 @gatorsmile I also wanted to discuss whether we should consider other bin packing algorithms. According to http://www.math.unl.edu/~s-sjessie1/203Handouts/Bin%20Packing.pdf, next fit decreasing is the least efficient of all, but it is the easiest to implement and the packing pass has O(N) run time.
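For reference, next fit decreasing can be sketched in a few lines (a generic illustration of the algorithm mentioned, not Spark code): sort the items by decreasing size, then place each item into the current bin if it fits, otherwise open a new bin. The packing pass itself is O(N); the initial sort adds O(N log N).

```scala
import scala.collection.mutable.ArrayBuffer

// Next fit decreasing: sort items by decreasing size, then put each item in
// the current (last) bin if it fits, otherwise open a new bin. Only the last
// bin is ever considered, which is what keeps the packing pass linear.
object NextFitDecreasing {
  def pack(items: Seq[Double], capacity: Double): Seq[Seq[Double]] = {
    val bins = ArrayBuffer.empty[ArrayBuffer[Double]]
    var remaining = 0.0
    for (item <- items.sortBy(-_)) {
      if (bins.isEmpty || item > remaining) {
        bins += ArrayBuffer(item)   // open a new bin
        remaining = capacity - item
      } else {
        bins.last += item           // fits in the current bin
        remaining -= item
      }
    }
    bins.map(_.toSeq).toSeq
  }
}
```

Because closed bins are never revisited, next fit typically uses more bins than first fit or best fit, which is the efficiency trade-off discussed above.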
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17436 Jenkins, retest this please
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19433 **[Test build #3983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3983/testReport)** for PR 19433 at commit [`b7e6e40`](https://github.com/apache/spark/commit/b7e6e40976612546b81d9775c194b274c146dc85). * This patch **fails to generate documentation**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)** for PR 17436 at commit [`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81).
[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/19661#discussion_r149553694 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -178,10 +178,40 @@ class KryoSerializer(conf: SparkConf) kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$")) kryo.register(classOf[ArrayBuffer[Any]]) +// We can't load those class directly in order to avoid unnecessary jar dependencies. +// We load them safely, ignore it if the class not found. +Seq("org.apache.spark.mllib.linalg.Vector", + "org.apache.spark.mllib.linalg.DenseVector", + "org.apache.spark.mllib.linalg.SparseVector", + "org.apache.spark.mllib.linalg.Matrix", + "org.apache.spark.mllib.linalg.DenseMatrix", + "org.apache.spark.mllib.linalg.SparseMatrix", + "org.apache.spark.ml.linalg.Vector", + "org.apache.spark.ml.linalg.DenseVector", + "org.apache.spark.ml.linalg.SparseVector", + "org.apache.spark.ml.linalg.Matrix", + "org.apache.spark.ml.linalg.DenseMatrix", + "org.apache.spark.ml.linalg.SparseMatrix", + "org.apache.spark.ml.feature.Instance", + "org.apache.spark.ml.feature.OffsetInstance" +).flatMap(safeClassLoader(_)).foreach(kryo.register(_)) --- End diff -- Hi @cloud-fan , I tried the following code: ```scala flatMap(cn => Try{Utils.classForName(cn)}.toOption).foreach(kryo.register(_)) ``` and ```scala flatMap{ cn => try { val clazz = Utils.classForName(cn) Some(clazz) } catch { case _: ClassNotFoundException => None } }.foreach(kryo.register(_)) ``` Both reported the same error: ``` Error:(198, 18) type mismatch; found : String => Iterable[Class[_$2]]( forSome { type _$2 }) required: String => scala.collection.GenTraversableOnce[B] ).flatMap{cn => Option(Utils.classForName(cn))}.foreach(kryo.register(_)) ```
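The error comes from type inference over the existential `Class[_]`, not from `Option` itself; an `Option` works fine as a `flatMap` result normally. Ascribing the element type explicitly is one plausible workaround (an assumption on my part, shown here with plain `Class.forName` instead of Spark's `Utils.classForName`, and with a hypothetical missing class name):

```scala
import scala.util.Try

// Safely load optional classes, skipping any that are absent. The
// Option[Class[_]] ascription pins down flatMap's element type so the
// inferred existential Class[_$2] from the error message never appears.
val loaded: Seq[Class[_]] = Seq(
  "java.lang.String",
  "com.example.DoesNotExist" // hypothetical name, expected to be absent
).flatMap { cn =>
  Try(Class.forName(cn)).toOption: Option[Class[_]]
}
```

With the element type fixed, the trailing `.foreach(kryo.register(_))` from the snippet under review would type-check as well.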
[GitHub] spark pull request #19685: [SPARK-19759][ML] not using blas in ALSModel.pred...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19685#discussion_r149554146 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -289,9 +289,11 @@ class ALSModel private[ml] ( private val predict = udf { (featuresA: Seq[Float], featuresB: Seq[Float]) => if (featuresA != null && featuresB != null) { - // TODO(SPARK-19759): try dot-producting on Seqs or another non-converted type for - // potential optimization. - blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1) + var dotProduct = 0.0f + for(i <- 0 until rank) { --- End diff -- You should use `while` instead of `for`.
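The suggestion matters because a Scala `for (i <- 0 until rank)` comprehension compiles to a closure passed to `Range.foreach`, while a `while` loop compiles to a plain bytecode loop; in a per-row prediction UDF that overhead adds up. A minimal sketch of the `while` version (a standalone illustration, not the actual `ALSModel` code):

```scala
// While-loop dot product over Float arrays, as the reviewer suggests:
// no Range allocation and no closure invocation per element.
def dot(a: Array[Float], b: Array[Float], rank: Int): Float = {
  var sum = 0.0f
  var i = 0
  while (i < rank) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}
```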
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19685 Have you run any tests to check the performance difference here?
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19156 **[Test build #83574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)** for PR 19156 at commit [`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740).
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #83575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)** for PR 19285 at commit [`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928).
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149558607 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite (Intercept) 6.3022157 0.00186003388 <2e-16 *** V2 4.6982442 0.00118053980 <2e-16 *** V3 7.1994344 0.00090447961 <2e-16 *** + + # R code for r2adj --- End diff -- @srowen it's fine in terms of functioning.
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149559666 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite (Intercept) 6.3022157 0.00186003388 <2e-16 *** V2 4.6982442 0.00118053980 <2e-16 *** V3 7.1994344 0.00090447961 <2e-16 *** + + # R code for r2adj --- End diff -- There may be some confusion. If you type that code, "as-is", into an R shell, it will not work. It references a variable called `X1`, which is never defined. When we provide R code in comments like this, we intend for it to be copied and pasted into a shell and just work. So, it does not function.
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Thanks for your review! I added more explanation of the design purpose of `traverseUnorderedSplits`. But if you have a better solution, don't hesitate to tell me!
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149560345 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite (Intercept) 6.3022157 0.00186003388 <2e-16 *** V2 4.6982442 0.00118053980 <2e-16 *** V3 7.1994344 0.00090447961 <2e-16 *** + + # R code for r2adj --- End diff -- Thanks for the clarification. Do you think changing `x1` to `V1` would help?
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 Also cc @smurching Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149561550 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -741,17 +678,43 @@ private[spark] object RandomForest extends Logging { (splits(featureIndex)(bestFeatureSplitIndex), bestFeatureGainStats) } else if (binAggregates.metadata.isUnordered(featureIndex)) { // Unordered categorical feature - val leftChildOffset = binAggregates.getFeatureOffset(featureIndexIdx) - val (bestFeatureSplitIndex, bestFeatureGainStats) = -Range(0, numSplits).map { splitIndex => - val leftChildStats = binAggregates.getImpurityCalculator(leftChildOffset, splitIndex) - val rightChildStats = binAggregates.getParentImpurityCalculator() -.subtract(leftChildStats) + val numBins = binAggregates.metadata.numBins(featureIndex) + val featureOffset = binAggregates.getFeatureOffset(featureIndexIdx) + + val binStatsArray = Array.tabulate(numBins) { binIndex => +binAggregates.getImpurityCalculator(featureOffset, binIndex) + } + val parentStats = binAggregates.getParentImpurityCalculator() + + var bestGain = Double.NegativeInfinity + var bestSet: BitSet = null + var bestLeftChildStats: ImpurityCalculator = null + var bestRightChildStats: ImpurityCalculator = null + + traverseUnorderedSplits[ImpurityCalculator](numBins, null, +(stats, binIndex) => { + val binStats = binStatsArray(binIndex) + if (stats == null) { +binStats + } else { +stats.copy.add(binStats) + } +}, +(set, leftChildStats) => { + val rightChildStats = parentStats.copy.subtract(leftChildStats) gainAndImpurityStats = calculateImpurityStats(gainAndImpurityStats, leftChildStats, rightChildStats, binAggregates.metadata) - (splitIndex, gainAndImpurityStats) -}.maxBy(_._2.gain) - (splits(featureIndex)(bestFeatureSplitIndex), bestFeatureGainStats) + if (gainAndImpurityStats.gain > bestGain) { +bestGain = gainAndImpurityStats.gain +bestSet = set | new BitSet(numBins) // copy set --- End 
diff -- The class does not support `copy` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
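For readers following the `traverseUnorderedSplits` discussion above: an unordered categorical feature with k categories admits 2^(k-1) - 1 candidate splits, because a left-child subset and its complement describe the same split. A small Python sketch of that enumeration (illustrative only — Spark's implementation walks bins incrementally with a `BitSet` rather than materializing subsets):

```python
from itertools import combinations

def unordered_splits(categories):
    """Enumerate candidate left-child category subsets for an
    unordered categorical feature, counting each split once."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    # Fixing the first category on the left side avoids counting
    # a subset and its complement as two distinct splits.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            if len(left) < len(cats):  # exclude the full set (no split)
                splits.append(left)
    return splits

print(len(unordered_splits([0, 1, 2])))  # 3, i.e. 2^(3-1) - 1
```

This is why the aggregate size matters: the number of candidate splits grows exponentially in the category count.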
[GitHub] spark pull request #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR ...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/19688 [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while conf is default ## What changes were proposed in this pull request? ### Before ``` Kent@KentsMacBookPro ~/Documents/spark-packages/spark-2.3.0-SNAPSHOT-bin-master bin/spark-shell --master local Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 17/11/08 10:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/11/08 10:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. Spark context Web UI available at http://169.254.168.63:4041 Spark context available as 'sc' (master = local, app id = local-1510108125770). Spark session available as 'spark'. Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.0-SNAPSHOT /_/ Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65) Type in expressions to have them evaluated. Type :help for more information. scala> sys.env.get("SPARK_CONF_DIR") res0: Option[String] = None ``` ### After ``` scala> sys.env.get("SPARK_CONF_DIR") res0: Option[String] = Some(/Users/Kent/Documents/spark/conf) ``` ## How was this patch tested? 
@vanzin You can merge this pull request into a Git repository by running: $ git pull https://github.com/yaooqinn/spark SPARK-22466 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19688.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19688 commit 19ac61cd6d8b4cca295a1f0d2f2988ee3ac20d8c Author: Kent Yao Date: 2017-11-08T02:30:01Z export SPARK_CONF_DIR while conf is default --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
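The fallback this PR describes — honor `SPARK_CONF_DIR` when it is set, otherwise default to `$SPARK_HOME/conf` — can be sketched as follows (a hypothetical helper for illustration, not the actual `spark-submit` script logic):

```python
import os

def resolve_conf_dir(env):
    # SPARK_CONF_DIR wins when present; otherwise fall back to
    # $SPARK_HOME/conf, matching the behavior described in the PR.
    return env.get("SPARK_CONF_DIR") or os.path.join(env["SPARK_HOME"], "conf")

print(resolve_conf_dir({"SPARK_HOME": "/opt/spark"}))  # /opt/spark/conf
```

Exporting the resolved value means child processes (like the launched shell) can read it from `sys.env`, which is exactly what the Before/After transcripts above demonstrate.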
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561888 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", + System.getProperty("SPARK_HOME") + File.separator + "conf") +val dir = new File(localConfDir) +if (dir.isDirectory) { + val files = dir.listFiles(new FileFilter { +override def accept(pathname: File): Boolean = { + pathname.isFile && pathname.getName.endsWith("xml") --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
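The `FileFilter` in the diff above selects regular files whose names end in "xml" from the local conf directory. The same selection can be sketched in Python (hypothetical helper name; not the Spark code):

```python
import os
import tempfile

def conf_xml_files(conf_dir):
    # Collect regular files ending in "xml", keyed by file name --
    # mirroring the HashMap[String, File] built in the diff above.
    result = {}
    for name in os.listdir(conf_dir):
        path = os.path.join(conf_dir, name)
        if os.path.isfile(path) and name.endswith("xml"):
            result[name] = path
    return result

d = tempfile.mkdtemp()
open(os.path.join(d, "core-site.xml"), "w").close()
open(os.path.join(d, "log4j.properties"), "w").close()
print(sorted(conf_xml_files(d)))  # ['core-site.xml']
```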
[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 OK, I agree with this change. @jkbradley Can you take a look? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561877 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", --- End diff -- not exactly till now, please check https://github.com/apache/spark/pull/19688 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561925 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -687,6 +687,20 @@ private[spark] class Client( private def createConfArchive(): File = { val hadoopConfFiles = new HashMap[String, File]() +// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR +val localConfDir = System.getProperty("SPARK_CONF_DIR", + System.getProperty("SPARK_HOME") + File.separator + "conf") +val dir = new File(localConfDir) +if (dir.isDirectory) { + val files = dir.listFiles(new FileFilter { +override def accept(pathname: File): Boolean = { + pathname.isFile && pathname.getName.endsWith("xml") +} + }) + files.foreach { f => hadoopConfFiles(f.getName) = f } +} + +// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files --- End diff -- ok, i'd remove it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19688 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)** for PR 19663 at commit [`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19156 **[Test build #83574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)** for PR 19156 at commit [`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83574/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #83578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83578/testReport)** for PR 19607 at commit [`4adb073`](https://github.com/apache/spark/commit/4adb073f8d1454fbea0742a16b6d7662e063b37a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83577/testReport)** for PR 19662 at commit [`dd672ac`](https://github.com/apache/spark/commit/dd672ac815038f8dfd89fecb1f5b3d4668158752). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19662 @WeichenXu123 I did a scan. Currently I only found that `VectorAssembler`'s UDF may have a similar issue. Fixed it and added a test for it too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r149564294 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -213,6 +216,14 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix) ) +// check that the credentials are defined, even though it's likely that auth would have failed +// already if you've made it this far, then start the token renewer +if (hadoopDelegationTokens.isDefined) { --- End diff -- I may have spoken too soon; there might be a way... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149564330 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -214,11 +215,13 @@ case class Invoke( override def eval(input: InternalRow): Any = throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + private lazy val encodedFunctionName = TermName(functionName).encodedName.toString --- End diff -- Maybe, although I don't have a concrete case causing the issue for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149564523 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala --- @@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite { assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]])) } + test("SPARK-22442: Generate correct field names for special characters") { +val serializer = serializerFor[SpecialCharAsFieldData](BoundReference( + 0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false)) +val deserializer = deserializerFor[SpecialCharAsFieldData] +assert(serializer.dataType(0).name == "field.1") +assert(serializer.dataType(1).name == "field 2") + +val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect { + case UpCast(u: UnresolvedAttribute, _, _) => u.name +}} +assert(argumentsFields(0) == "`field.1`") --- End diff -- We need to deliberately wrap backticks around a field name such as `field.1` because of the dot character. Otherwise `UnresolvedAttribute` will parse it as two name parts `Seq("field", "1")` and then fail resolving later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
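The quoting rule described in this comment — wrap a field name in backticks when it contains a dot, so the attribute parser treats it as a single name part instead of splitting it into `Seq("field", "1")` — can be sketched as follows (illustrative only, not the Catalyst implementation):

```python
def quote_if_needed(name):
    # Backtick-quote names containing a dot so a downstream parser
    # reads them as one name part rather than nested parts.
    return f"`{name}`" if "." in name else name

print(quote_if_needed("field.1"))  # `field.1`
print(quote_if_needed("field 2"))  # field 2
```

This matches the test's expectation: only the dotted name needs backticks, while a name with a space passes through unchanged.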
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)** for PR 19663 at commit [`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83576/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149565144 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -214,11 +215,13 @@ case class Invoke( override def eval(input: InternalRow): Any = throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + private lazy val encodedFunctionName = TermName(functionName).encodedName.toString --- End diff -- Since we use `Invoke` to access field(s) in an object, this can be an issue. I didn't find `StaticInvoke` used similarly, so it should be fine. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19459 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83579/testReport)** for PR 19459 at commit [`99ce1e4`](https://github.com/apache/spark/commit/99ce1e44f57c411af95b1c9d9c95f35f2c1652e1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149567340 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -631,6 +614,42 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext { val expected = Map(0 -> 1.0 / 3.0, 2 -> 2.0 / 3.0) assert(mapToVec(map.toMap) ~== mapToVec(expected) relTol 0.01) } + + test("traverseUnorderedSplits") { + --- End diff -- So how do we test all possible splits to make sure the generated splits are all correct? If the tree is generated, only the best split remains. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19662#discussion_r149567769 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -126,4 +126,25 @@ class VectorAssemblerSuite .setOutputCol("myOutputCol") testDefaultReadWrite(t) } + + test("VectorAssembler's UDF should not apply on filtered data") { --- End diff -- Mark the test name with [SPARK-22446]. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19662#discussion_r149568133 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -126,4 +126,25 @@ class VectorAssemblerSuite .setOutputCol("myOutputCol") testDefaultReadWrite(t) } + + test("VectorAssembler's UDF should not apply on filtered data") { --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83580/testReport)** for PR 19662 at commit [`d2ac83e`](https://github.com/apache/spark/commit/d2ac83e5b1c74abd422e436752f1cf91127e388a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19687 **[Test build #83571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83571/testReport)** for PR 19687 at commit [`c03811f`](https://github.com/apache/spark/commit/c03811ff006058987fa8d5fb9f7d097b9acc9ac5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19687 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83571/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19689 [SPARK-22462][SQL] Make rdd-based actions in Dataset trackable in SQL UI ## What changes were proposed in this pull request? For the few Dataset actions such as `foreach`, currently no SQL metrics are visible in the SQL tab of SparkUI. It is because it binds wrongly to Dataset's `QueryExecution`. As the actions directly evaluate on the RDD which has individual `QueryExecution`, to show correct SQL metrics on UI, we should bind to RDD's `QueryExecution`. ## How was this patch tested? Manually test. Screenshot is attached in the PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-22462 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19689.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19689 commit ac539cd0e761193d9a665d8ccb19a8fba5dd504b Author: Liang-Chi Hsieh Date: 2017-11-07T10:54:14Z Make rdd-based actions trackable in UI. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19689 The screenshot for running `sql("select * from range(10)").foreach(a => Unit)` on spark-shell: https://user-images.githubusercontent.com/68855/32531135-1e60d544-c47d-11e7-88d6-627ef77d0b80.png --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19689 **[Test build #83581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvaluatorSui...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19648 Merged into master, thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvalu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19648 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19678 **[Test build #83572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83572/testReport)** for PR 19678 at commit [`c7123d9`](https://github.com/apache/spark/commit/c7123d9c8d3934c482cd89ea820b2958f4dbbe0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83572/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)** for PR 17436 at commit [`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83573/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19689 **[Test build #83581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19689 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19689 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83581/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #83575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)** for PR 19285 at commit [`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19657 Yup, I just checked it too and was writing a comment .. The current change should pass :). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83582/testReport)** for PR 19657 at commit [`18e238a`](https://github.com/apache/spark/commit/18e238a62d53de5a73283a741c1a9bb8230f4484). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check f...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19620 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19620 merged
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19619 merged
[GitHub] spark pull request #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check f...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19619
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Merged build finished. Test PASSed.
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83575/
[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19557 merged to master/2.2
[GitHub] spark pull request #19557: [SPARK-22281][SPARKR] Handle R method breaking si...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19557
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19678 merged to master
[GitHub] spark pull request #19678: [SPARK-20646][core] Port executors page to new UI...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19678
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13206 **[Test build #83583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83583/consoleFull)** for PR 13206 at commit [`a64be8a`](https://github.com/apache/spark/commit/a64be8a91ddadcd7acbbd08956f214b3c40f0dca).