[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7853#discussion_r39876553 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -96,7 +96,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli val startTime = System.currentTimeMillis() - private val stopped: AtomicBoolean = new AtomicBoolean(false) + private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false) --- End diff -- See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1 Can this be marked with @DeveloperApi ?
[GitHub] spark pull request: [SPARK-10458] [Spark Core] Added isStopped() m...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/8749#discussion_r39883770 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -264,6 +264,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli val tachyonFolderName = externalBlockStoreFolderName def isLocal: Boolean = (master == "local" || master.startsWith("local[")) + def isStopped: Boolean = stopped.get() --- End diff -- Should this be annotated @DeveloperApi ?
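For readers following these two threads, the change under discussion amounts to a read-only accessor over the existing AtomicBoolean, optionally annotated @DeveloperApi. A minimal sketch under those assumptions (the class name below is illustrative, not the actual SparkContext source):

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.annotation.DeveloperApi

class SparkContextSketch {
  // Flipped exactly once when stop() runs; kept private so callers can only read it.
  private val stopped: AtomicBoolean = new AtomicBoolean(false)

  /**
   * :: DeveloperApi ::
   * Whether this context has been stopped. Lets callers avoid submitting work
   * against a stopped context without exposing the mutable flag itself.
   */
  @DeveloperApi
  def isStopped: Boolean = stopped.get()
}
```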
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7853#discussion_r39880833 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -96,7 +96,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli val startTime = System.currentTimeMillis() - private val stopped: AtomicBoolean = new AtomicBoolean(false) + private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false) --- End diff -- That's right. I think there will be more user requests for exposing this flag.
[GitHub] spark pull request: Expose SparkContext#stopped flag with @Develop...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/8822 Expose SparkContext#stopped flag with @DeveloperApi See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1 We should expose this flag to developers. @andrewor14 You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8822.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8822 commit 007c6d60c2e05fa96b3d0e7dc48c2bd1303022de Author: tedyu <yuzhih...@gmail.com> Date: 2015-09-18T17:44:00Z Expose stopped flag with @DeveloperApi
[GitHub] spark pull request: Check partitionId's range in ExternalSorter#sp...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/8703 Check partitionId's range in ExternalSorter#spill() See this thread for background: http://search-hadoop.com/m/q3RTt0rWvIkHAE81 We should check the range of the partition Id and provide a meaningful message through an exception. Alternatively, we could use abs() and modulo to force the partition Id into a legitimate range; however, the expectation is that the user corrects the logic error in their code. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8703.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8703 commit 70eb93eb6fa269c26f82e1125aeea69a589d0428 Author: tedyu <yuzhih...@gmail.com> Date: 2015-09-10T18:27:01Z Check partitionId's range in ExternalSorter#spill() commit 63dfe1191ed9f04560cfd39fe01108fcef462673 Author: tedyu <yuzhih...@gmail.com> Date: 2015-09-10T18:29:11Z Correct indentation
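The proposed check is essentially a bounds assertion with a message naming the offending value. A minimal sketch, using hypothetical parameter names rather than the actual ExternalSorter internals:

```scala
def checkPartitionId(partitionId: Int, numPartitions: Int): Unit = {
  // Fail fast with a message that points at the caller's partitioner logic,
  // rather than letting an out-of-range id silently corrupt the spill layout.
  require(partitionId >= 0 && partitionId < numPartitions,
    s"partition Id $partitionId should be in the range [0, $numPartitions)")
}
```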
[GitHub] spark pull request: SPARK-10074 Include Float in @specialized anno...
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/8259
[GitHub] spark pull request: [SPARK-10004] [shuffle] Perform auth checks wh...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/8218#discussion_r38595554 --- Diff: network/common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -109,15 +111,34 @@ public void connectionTerminated(Channel channel) { } } + @Override + public void checkAuthorization(TransportClient client, long streamId) { +if (client.getClientId() != null) { + StreamState state = streams.get(streamId); + Preconditions.checkArgument(state != null, "Unknown stream ID."); + if (!client.getClientId().equals(state.appId)) { +throw new SecurityException(String.format( + "Client %s not authorized to read stream %d (app %s).", --- End diff -- Should we not disclose the actual appId in the exception message ?
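The review question is about not echoing the stream's appId back to an unauthorized caller. A hedged Scala sketch of that idea (hypothetical method, not the actual OneForOneStreamManager fix):

```scala
def checkAuthorization(clientId: String, streamAppId: String, streamId: Long): Unit = {
  if (clientId != streamAppId) {
    // Name the caller and the stream, but omit the app the stream belongs to,
    // so an unauthorized client learns nothing about other applications.
    throw new SecurityException(
      s"Client $clientId not authorized to read stream $streamId.")
  }
}
```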
[GitHub] spark pull request: Include Float in @specialized annotation
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/8259#issuecomment-132006028 @rxin I don't use GraphX. But I think including Float in these classes would boost performance where Float is used.
[GitHub] spark pull request: Include Float in @specialized annotation
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/8259 Include Float in @specialized annotation You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8259.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8259 commit 1bd166fd8c1f943077278fa8b26c547df5f10b89 Author: tedyu <yuzhih...@gmail.com> Date: 2015-08-17T23:46:43Z Include Float in @specialized annotation
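For context, `@specialized` asks the Scala compiler to generate additional unboxed variants of a generic class for the listed primitive types; the PR simply adds `Float` to that list. A small sketch with a made-up class (not the actual GraphX/Spark code):

```scala
// With Float in the specialization list, Vec[Float] gets a primitive float
// variant instead of falling back to the boxed generic implementation.
class Vec[@specialized(Int, Long, Double, Float) T](val values: Array[T]) {
  def apply(i: Int): T = values(i)
  def length: Int = values.length
}
```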
[GitHub] spark pull request: SPARK-10074 Include Float in @specialized anno...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/8259#issuecomment-132063810 Test failures don't seem to be related to the PR. Not sure whether the following is a test or not: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41080/testReport/junit/org.apache.spark.deploy.yarn/YarnClusterSuite/_It_is_not_a_test_/
[GitHub] spark pull request: [SPARK-9486][SQL] Add data source aliasing for...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7802#discussion_r36590599 --- Diff: sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister --- @@ -0,0 +1,3 @@ +org.apache.spark.sql.jdbc.DefaultSource +org.apache.spark.sql.json.DefaultSource +org.apache.spark.sql.parquet.DefaultSource --- End diff -- Should orc be added as well ? I see a change to OrcRelation.scala below.
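If the ORC source were registered the same way, the registration would be one more fully-qualified class name in a META-INF/services file; since the ORC relation lives in the hive module, the entry would presumably go in that module's own service file, and the class name below is an assumption:

```
org.apache.spark.sql.hive.orc.DefaultSource
```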
[GitHub] spark pull request: [BUILD] Remove dependency reduced POM hack
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/7919 [BUILD] Remove dependency reduced POM hack You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7919.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7919 commit 1bfbd7b9c66d1bde1606fec995886e42af627ffc Author: tedyu <yuzhih...@gmail.com> Date: 2015-08-04T01:53:23Z [BUILD] Remove dependency reduced POM hack
[GitHub] spark pull request: [BUILD] Remove dependency reduced POM hack
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7919#issuecomment-127478076 The PySpark test failure should not be related to the change.
[GitHub] spark pull request: [SPARK-9549][SQL] fix bugs in expressions
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7882#discussion_r36114103 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -614,8 +614,9 @@ object DateTimeUtils { */ def dateAddMonths(days: Int, months: Int): Int = { val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 1 + months -val currentMonthInYear = absoluteMonth % 12 -val currentYear = absoluteMonth / 12 +val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0 +val currentMonthInYear = nonNegativeMonth % 12 +val currentYear = nonNegativeMonth / 12 --- End diff -- /% is not defined for Int. I read about the notation in a Scala book which I have since returned; I will read more once I have that book back.
[GitHub] spark pull request: [SPARK-9549][SQL] fix bugs in expressions
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7882#discussion_r36103906 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -614,8 +614,9 @@ object DateTimeUtils { */ def dateAddMonths(days: Int, months: Int): Int = { val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 1 + months -val currentMonthInYear = absoluteMonth % 12 -val currentYear = absoluteMonth / 12 +val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0 +val currentMonthInYear = nonNegativeMonth % 12 +val currentYear = nonNegativeMonth / 12 --- End diff -- The above two statements can be replaced with: val (currentYear, currentMonthInYear) = nonNegativeMonth /% 12
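For reference, `/%` (returning quotient and remainder as a pair) is defined on `BigInt` but not on `Int`, as noted earlier in this thread, so with plain `Int` arithmetic the two statements stay separate. A small sketch of both forms, assuming a non-negative month count:

```scala
val nonNegativeMonth = 25

// Plain Int has no /% operator, so quotient and remainder are computed separately.
val currentYear = nonNegativeMonth / 12
val currentMonthInYear = nonNegativeMonth % 12

// BigInt does define /%, returning both values at once as a pair.
val (year, monthInYear) = BigInt(nonNegativeMonth) /% BigInt(12)
```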
[GitHub] spark pull request: [SPARK-4751] Dynamic allocation in standalone ...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7532#discussion_r36038428 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala --- @@ -152,6 +152,34 @@ private[spark] class SparkDeploySchedulerBackend( super.applicationId } + /** + * Request executors from the Master by specifying the total number desired, + * including existing pending and running executors. + * + * @return whether the request is acknowledged. + */ + protected override def doRequestTotalExecutors(requestedTotal: Int): Boolean = { +Option(client) match { + case Some(c) => c.requestTotalExecutors(requestedTotal) + case None => +logWarning("Attempted to request executors before driver fully initialized.") +false +} + } + + /** + * Kill the given list of executors through the Master. + * @return whether the kill request is acknowledged. + */ + protected override def doKillExecutors(executorIds: Seq[String]): Boolean = { +Option(client) match { + case Some(c) => c.killExecutors(executorIds) + case None => +logWarning("Attempted to kill executors before driver fully initialized.") --- End diff -- Maybe include executorIds in the log.
[GitHub] spark pull request: [SPARK-4751] Dynamic allocation in standalone ...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7532#discussion_r36038415 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -804,6 +827,87 @@ private[master] class Master( } /** + * Handle a request to set the target number of executors for this application. + * + * If the executor limit is adjusted upwards, new executors will be launched provided + * that there are workers with sufficient resources. If it is adjusted downwards, however, + * we do not kill existing executors until we explicitly receive a kill request. + * + * @return whether the application has previously registered with this Master. + */ + private def handleRequestExecutors(appId: String, requestedTotal: Int): Boolean = { +idToApp.get(appId) match { + case Some(appInfo) => +logInfo(s"Application $appId requested to set total executors to $requestedTotal.") +appInfo.executorLimit = requestedTotal +schedule() +true + case None => +logWarning(s"Unknown application $appId requested $requestedTotal total executors.") +false +} + } + + /** + * Handle a kill request from the given application. + * + * This method assumes the executor limit has already been adjusted downwards through + * a separate [[RequestExecutors]] message, such that we do not launch new executors + * immediately after the old ones are removed. + * + * @return whether the application has previously registered with this Master. + */ + private def handleKillExecutors(appId: String, executorIds: Seq[Int]): Boolean = { +idToApp.get(appId) match { + case Some(appInfo) => +logInfo(s"Application $appId requests to kill executors: " + executorIds.mkString(", ")) +val (known, unknown) = executorIds.partition(appInfo.executors.contains) +known.foreach { executorId => + val desc = appInfo.executors(executorId) + appInfo.removeExecutor(desc) + killExecutor(desc) +} +if (unknown.nonEmpty) { + logWarning(s"Application $appId attempted to kill non-existent executors: --- End diff -- Should this be logged at DEBUG level ?
[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7756#issuecomment-126369191 Anything I can do to move this forward ?
[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7756#issuecomment-126534789 Anything I need to do for this issue ?
[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7756#issuecomment-126117214 In CoarseGrainedSchedulerBackend.scala, around line 281: ``` override def stop() { stopExecutors() ```
[GitHub] spark pull request: Clear Active SparkContext in stop() method usi...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7756#issuecomment-126087585 Thanks for the quick reviews. See if rev2 is better.
[GitHub] spark pull request: Clear Active SparkContext in stop() method usi...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/7756 Clear Active SparkContext in stop() method using finally See the 'stopped SparkContext remaining active' thread on the mailing list for relevant details. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7756.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7756 commit f5fb519e6f20eed448806a4794277503c341e1a8 Author: tedyu <yuzhih...@gmail.com> Date: 2015-07-29T20:03:17Z Clear Active SparkContext in stop() method using finally
[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7756#issuecomment-126122944 Doesn't seem to matter, considering we have the following at the beginning of stop(): ``` if (!stopped.compareAndSet(false, true)) { logInfo("SparkContext already stopped.") return ```
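Taken together, the two snippets quoted in this thread describe the pattern the PR is after: guard re-entry with compareAndSet, then clear the active-context bookkeeping in a finally block so it runs even if an intermediate shutdown step throws. A condensed sketch with illustrative names (not the exact SparkContext code):

```scala
import java.util.concurrent.atomic.AtomicBoolean

class ContextLifecycleSketch {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // First caller wins; any later call logs and returns immediately.
    if (!stopped.compareAndSet(false, true)) {
      println("SparkContext already stopped.")
      return
    }
    try {
      shutdownServices() // may throw partway through
    } finally {
      clearActiveContext() // always runs, so the "active context" slot is not left stale
    }
  }

  private def shutdownServices(): Unit = {}
  private def clearActiveContext(): Unit = {}
}
```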
[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/7466
[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/7466#issuecomment-122286140 Covered by #7421
[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/7466 Make MetastoreRelation#hiveQlPartitions lazy val You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7466.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7466 commit d0bb9d64233e77d7930495058d41c26a290eed49 Author: tedyu <yuzhih...@gmail.com> Date: 2015-07-17T10:58:41Z Make MetastoreRelation#hiveQlPartitions lazy val
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34847593 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java --- @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import java.io.File; +import java.io.IOException; + +import scala.Tuple2; + +import org.apache.spark.executor.ShuffleWriteMetrics; +import org.apache.spark.serializer.DummySerializerInstance; +import org.apache.spark.storage.BlockId; +import org.apache.spark.storage.BlockManager; +import org.apache.spark.storage.BlockObjectWriter; +import org.apache.spark.storage.TempLocalBlockId; +import org.apache.spark.unsafe.PlatformDependent; + +/** + * Spills a list of sorted records to disk. Spill files have the following format: + * + * [# of records (int)] [[len (int)][prefix (long)][data (bytes)]...] + */ +final class UnsafeSorterSpillWriter { --- End diff -- Should this class declare 'implement Closeable' ? This way, its caller would be able to use try-with-resources construct --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34849488 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillMerger.java --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import java.io.IOException; +import java.util.Comparator; +import java.util.PriorityQueue; + +final class UnsafeSorterSpillMerger { + + private final PriorityQueueUnsafeSorterIterator priorityQueue; + + public UnsafeSorterSpillMerger( + final RecordComparator recordComparator, + final PrefixComparator prefixComparator, + final int numSpills) { +final ComparatorUnsafeSorterIterator comparator = new ComparatorUnsafeSorterIterator() { + + @Override + public int compare(UnsafeSorterIterator left, UnsafeSorterIterator right) { +final int prefixComparisonResult = + prefixComparator.compare(left.getKeyPrefix(), right.getKeyPrefix()); +if (prefixComparisonResult == 0) { + return recordComparator.compare( +left.getBaseObject(), left.getBaseOffset(), +right.getBaseObject(), right.getBaseOffset()); +} else { + return prefixComparisonResult; +} + } +}; +priorityQueue = new PriorityQueueUnsafeSorterIterator(numSpills, comparator); + } + + public void addSpill(UnsafeSorterIterator spillReader) throws IOException { +if (spillReader.hasNext()) { + spillReader.loadNext(); +} +priorityQueue.add(spillReader); --- End diff -- If !spillReader.hasNext(), can the addition be skipped ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34850864 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java --- @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import java.io.File; +import java.io.IOException; + +import scala.Tuple2; + +import org.apache.spark.executor.ShuffleWriteMetrics; +import org.apache.spark.serializer.DummySerializerInstance; +import org.apache.spark.storage.BlockId; +import org.apache.spark.storage.BlockManager; +import org.apache.spark.storage.BlockObjectWriter; +import org.apache.spark.storage.TempLocalBlockId; +import org.apache.spark.unsafe.PlatformDependent; + +/** + * Spills a list of sorted records to disk. Spill files have the following format: + * + * [# of records (int)] [[len (int)][prefix (long)][data (bytes)]...] + */ +final class UnsafeSorterSpillWriter { + + static final int DISK_WRITE_BUFFER_SIZE = 1024 * 1024; + + // Small writes to DiskBlockObjectWriter will be fairly inefficient. Since there doesn't seem to + // be an API to directly transfer bytes from managed memory to the disk writer, we buffer + // data through a byte array. + private byte[] writeBuffer = new byte[DISK_WRITE_BUFFER_SIZE]; + + private final File file; + private final BlockId blockId; + private final int numRecordsToWrite; + private BlockObjectWriter writer; + private int numRecordsSpilled = 0; + + public UnsafeSorterSpillWriter( + BlockManager blockManager, + int fileBufferSize, + ShuffleWriteMetrics writeMetrics, + int numRecordsToWrite) throws IOException { +final Tuple2TempLocalBlockId, File spilledFileInfo = + blockManager.diskBlockManager().createTempLocalBlock(); +this.file = spilledFileInfo._2(); +this.blockId = spilledFileInfo._1(); +this.numRecordsToWrite = numRecordsToWrite; +// Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter. +// Our write path doesn't actually use this serializer (since we end up calling the `write()` +// OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work +// around this, we pass a dummy no-op serializer. +writer = blockManager.getDiskWriter( + blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics); +// Write the number of records +writeIntToBuffer(numRecordsToWrite, 0); +writer.write(writeBuffer, 0, 4); + } + + // Based on DataOutputStream.writeLong. 
+ private void writeLongToBuffer(long v, int offset) throws IOException { +writeBuffer[offset + 0] = (byte)(v 56); +writeBuffer[offset + 1] = (byte)(v 48); +writeBuffer[offset + 2] = (byte)(v 40); +writeBuffer[offset + 3] = (byte)(v 32); +writeBuffer[offset + 4] = (byte)(v 24); +writeBuffer[offset + 5] = (byte)(v 16); +writeBuffer[offset + 6] = (byte)(v 8); +writeBuffer[offset + 7] = (byte)(v 0); + } + + // Based on DataOutputStream.writeInt. + private void writeIntToBuffer(int v, int offset) throws IOException { +writeBuffer[offset + 0] = (byte)(v 24); +writeBuffer[offset + 1] = (byte)(v 16); +writeBuffer[offset + 2] = (byte)(v 8); +writeBuffer[offset + 3] = (byte)(v 0); + } + + /** + * Write a record to a spill file. + * + * @param baseObject the base object / memory page containing the record + * @param baseOffset the base offset which points directly to the record data. + * @param recordLength the length of the record. + * @param keyPrefix a sort key prefix + */ + public void write( + Object baseObject, + long baseOffset, + int recordLength
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34847659 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -0,0 +1,282 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import java.io.IOException; +import java.util.LinkedList; + +import com.google.common.annotations.VisibleForTesting; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.SparkConf; +import org.apache.spark.TaskContext; +import org.apache.spark.executor.ShuffleWriteMetrics; +import org.apache.spark.shuffle.ShuffleMemoryManager; +import org.apache.spark.storage.BlockManager; +import org.apache.spark.unsafe.PlatformDependent; +import org.apache.spark.unsafe.memory.MemoryBlock; +import org.apache.spark.unsafe.memory.TaskMemoryManager; +import org.apache.spark.util.Utils; + +/** + * External sorter based on {@link UnsafeInMemorySorter}. + */ +public final class UnsafeExternalSorter { + + private final Logger logger = LoggerFactory.getLogger(UnsafeExternalSorter.class); + + private static final int PAGE_SIZE = 1 27; // 128 megabytes + @VisibleForTesting + static final int MAX_RECORD_SIZE = PAGE_SIZE - 4; + + private final PrefixComparator prefixComparator; + private final RecordComparator recordComparator; + private final int initialSize; + private final TaskMemoryManager memoryManager; + private final ShuffleMemoryManager shuffleMemoryManager; + private final BlockManager blockManager; + private final TaskContext taskContext; + private ShuffleWriteMetrics writeMetrics; + + /** The buffer size to use when writing spills using DiskBlockObjectWriter */ + private final int fileBufferSizeBytes; + + /** + * Memory pages that hold the records being sorted. The pages in this list are freed when + * spilling, although in principle we could recycle these pages across spills (on the other hand, + * this might not be necessary if we maintained a pool of re-usable pages in the TaskMemoryManager + * itself). 
+ */ + private final LinkedListMemoryBlock allocatedPages = new LinkedListMemoryBlock(); + + // These variables are reset after spilling: + private UnsafeInMemorySorter sorter; + private MemoryBlock currentPage = null; + private long currentPagePosition = -1; + private long freeSpaceInCurrentPage = 0; + + private final LinkedListUnsafeSorterSpillWriter spillWriters = new LinkedList(); + + public UnsafeExternalSorter( + TaskMemoryManager memoryManager, + ShuffleMemoryManager shuffleMemoryManager, + BlockManager blockManager, + TaskContext taskContext, + RecordComparator recordComparator, + PrefixComparator prefixComparator, + int initialSize, + SparkConf conf) throws IOException { +this.memoryManager = memoryManager; +this.shuffleMemoryManager = shuffleMemoryManager; +this.blockManager = blockManager; +this.taskContext = taskContext; +this.recordComparator = recordComparator; +this.prefixComparator = prefixComparator; +this.initialSize = initialSize; +// Use getSizeAsKb (not bytes) to maintain backwards compatibility for units +this.fileBufferSizeBytes = (int) conf.getSizeAsKb(spark.shuffle.file.buffer, 32k) * 1024; +initializeForWriting(); + } + + // TODO: metrics tracking + integration with shuffle write metrics + // need to connect the write metrics to task metrics so we count the spill IO somewhere. + + /** + * Allocates new sort data structures. Called when creating the sorter and after each spill. + */ + private void initializeForWriting() throws IOException { +this.writeMetrics = new ShuffleWriteMetrics(); +// TODO
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34849233 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSortDataFormat.java --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import org.apache.spark.util.collection.SortDataFormat; + +/** + * Supports sorting an array of (record pointer, key prefix) pairs. + * Used in {@link UnsafeInMemorySorter}. + * p + * Within each long[] buffer, position {@code 2 * i} holds a pointer pointer to the record at + * index {@code i}, while position {@code 2 * i + 1} in the array holds an 8-byte key prefix. + */ +final class UnsafeSortDataFormat extends SortDataFormatRecordPointerAndKeyPrefix, long[] { + + public static final UnsafeSortDataFormat INSTANCE = new UnsafeSortDataFormat(); + + private UnsafeSortDataFormat() { } + + @Override + public RecordPointerAndKeyPrefix getKey(long[] data, int pos) { +// Since we re-use keys, this method shouldn't be called. +throw new UnsupportedOperationException(); + } + + @Override + public RecordPointerAndKeyPrefix newKey() { +return new RecordPointerAndKeyPrefix(); + } + + @Override + public RecordPointerAndKeyPrefix getKey(long[] data, int pos, RecordPointerAndKeyPrefix reuse) { +reuse.recordPointer = data[pos * 2]; +reuse.keyPrefix = data[pos * 2 + 1]; +return reuse; + } + + @Override + public void swap(long[] data, int pos0, int pos1) { +long tempPointer = data[pos0 * 2]; +long tempKeyPrefix = data[pos0 * 2 + 1]; +data[pos0 * 2] = data[pos1 * 2]; +data[pos0 * 2 + 1] = data[pos1 * 2 + 1]; +data[pos1 * 2] = tempPointer; +data[pos1 * 2 + 1] = tempKeyPrefix; + } + + @Override + public void copyElement(long[] src, int srcPos, long[] dst, int dstPos) { +dst[dstPos * 2] = src[srcPos * 2]; +dst[dstPos * 2 + 1] = src[srcPos * 2 + 1]; + } + + @Override + public void copyRange(long[] src, int srcPos, long[] dst, int dstPos, int length) { +System.arraycopy(src, srcPos * 2, dst, dstPos * 2, length * 2); + } + + @Override + public long[] allocate(int length) { +assert (length Integer.MAX_VALUE / 2) : Length + length + is too large; --- End diff -- Should length of (Integer.MAX_VALUE / 2) be allowed ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6444#discussion_r34848070 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -0,0 +1,282 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection.unsafe.sort; + +import java.io.IOException; +import java.util.LinkedList; + +import com.google.common.annotations.VisibleForTesting; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.SparkConf; +import org.apache.spark.TaskContext; +import org.apache.spark.executor.ShuffleWriteMetrics; +import org.apache.spark.shuffle.ShuffleMemoryManager; +import org.apache.spark.storage.BlockManager; +import org.apache.spark.unsafe.PlatformDependent; +import org.apache.spark.unsafe.memory.MemoryBlock; +import org.apache.spark.unsafe.memory.TaskMemoryManager; +import org.apache.spark.util.Utils; + +/** + * External sorter based on {@link UnsafeInMemorySorter}. + */ +public final class UnsafeExternalSorter { + + private final Logger logger = LoggerFactory.getLogger(UnsafeExternalSorter.class); + + private static final int PAGE_SIZE = 1 27; // 128 megabytes + @VisibleForTesting + static final int MAX_RECORD_SIZE = PAGE_SIZE - 4; + + private final PrefixComparator prefixComparator; + private final RecordComparator recordComparator; + private final int initialSize; + private final TaskMemoryManager memoryManager; + private final ShuffleMemoryManager shuffleMemoryManager; + private final BlockManager blockManager; + private final TaskContext taskContext; + private ShuffleWriteMetrics writeMetrics; + + /** The buffer size to use when writing spills using DiskBlockObjectWriter */ + private final int fileBufferSizeBytes; + + /** + * Memory pages that hold the records being sorted. The pages in this list are freed when + * spilling, although in principle we could recycle these pages across spills (on the other hand, + * this might not be necessary if we maintained a pool of re-usable pages in the TaskMemoryManager + * itself). 
+ */ + private final LinkedListMemoryBlock allocatedPages = new LinkedListMemoryBlock(); + + // These variables are reset after spilling: + private UnsafeInMemorySorter sorter; + private MemoryBlock currentPage = null; + private long currentPagePosition = -1; + private long freeSpaceInCurrentPage = 0; + + private final LinkedListUnsafeSorterSpillWriter spillWriters = new LinkedList(); + + public UnsafeExternalSorter( + TaskMemoryManager memoryManager, + ShuffleMemoryManager shuffleMemoryManager, + BlockManager blockManager, + TaskContext taskContext, + RecordComparator recordComparator, + PrefixComparator prefixComparator, + int initialSize, + SparkConf conf) throws IOException { +this.memoryManager = memoryManager; +this.shuffleMemoryManager = shuffleMemoryManager; +this.blockManager = blockManager; +this.taskContext = taskContext; +this.recordComparator = recordComparator; +this.prefixComparator = prefixComparator; +this.initialSize = initialSize; +// Use getSizeAsKb (not bytes) to maintain backwards compatibility for units +this.fileBufferSizeBytes = (int) conf.getSizeAsKb(spark.shuffle.file.buffer, 32k) * 1024; +initializeForWriting(); + } + + // TODO: metrics tracking + integration with shuffle write metrics + // need to connect the write metrics to task metrics so we count the spill IO somewhere. + + /** + * Allocates new sort data structures. Called when creating the sorter and after each spill. + */ + private void initializeForWriting() throws IOException { +this.writeMetrics = new ShuffleWriteMetrics(); +// TODO
[GitHub] spark pull request: [SPARK-9045] Fix Scala 2.11 build break in Uns...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/7405#discussion_r34623959 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala --- @@ -17,11 +17,14 @@ package org.apache.spark.sql -import org.apache.spark.sql.catalyst.InternalRow - /** * Shim to allow us to implement [[scala.Iterator]] in Java. Scala 2.11+ has an AbstractIterator * class for this, but that class is `private[scala]` in 2.10. We need to explicitly fix this to - * `Row` in order to work around a spurious IntelliJ compiler error. + * `Row` in order to work around a spurious IntelliJ compiler error. This cannot be an abstract + * class because that leads to compilation errors under Scala 2.11. */ -private[spark] abstract class AbstractScalaRowIterator extends Iterator[InternalRow] +private[spark] class AbstractScalaRowIterator[T] extends Iterator[T] { --- End diff -- The 'Abstract' should be dropped from the class name, right ?
[GitHub] spark pull request: [SPARK-6980] [CORE] Akka timeout exceptions in...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6205#discussion_r33873268 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- @@ -182,3 +184,109 @@ private[spark] object RpcAddress { RpcAddress(host, port) } } + + +/** + * An exception thrown if RpcTimeout modifies a [[TimeoutException]]. + */ +private[rpc] class RpcTimeoutException(message: String, cause: TimeoutException) + extends TimeoutException(message) { initCause(cause) } + + +/** + * Associates a timeout with a description so that a when a TimeoutException occurs, additional + * context about the timeout can be amended to the exception message. + * @param timeout timeout duration in seconds + * @param conf the configuration parameter that controls this timeout + */ +private[spark] class RpcTimeout(timeout: FiniteDuration, val conf: String) { + + /** Get the timeout duration */ + def duration: FiniteDuration = timeout + + /** Amends the standard message of TimeoutException to include the description */ + private def createRpcTimeoutException(te: TimeoutException): RpcTimeoutException = { +new RpcTimeoutException(te.getMessage() + . This timeout is controlled by + conf, te) + } + + /** + * PartialFunction to match a TimeoutException and add the timeout description to the message + * + * @note This can be used in the recover callback of a Future to add to a TimeoutException + * Example: + *val timeout = new RpcTimeout(5 millis, short timeout) + *Future(throw new TimeoutException).recover(timeout.addMessageIfTimeout) + */ + def addMessageIfTimeout[T]: PartialFunction[Throwable, T] = { +// The exception has already been converted to a RpcTimeoutException so just raise it +case rte: RpcTimeoutException = throw rte +// Any other TimeoutException get converted to a RpcTimeoutException with modified message +case te: TimeoutException = throw createRpcTimeoutException(te) + } + + /** + * Wait for the completed result and return it. If the result is not available within this + * timeout, throw a [[RpcTimeoutException]] to indicate which configuration controls the timeout. + * @param awaitable the `Awaitable` to be awaited + * @throws RpcTimeoutException if after waiting for the specified time `awaitable` + * is still not ready + */ + def awaitResult[T](awaitable: Awaitable[T]): T = { +try { + Await.result(awaitable, duration) +} catch addMessageIfTimeout + } +} + + +private[spark] object RpcTimeout { + + /** + * Lookup the timeout property in the configuration and create + * a RpcTimeout with the property key in the description. + * @param conf configuration properties containing the timeout + * @param timeoutProp property key for the timeout in seconds + * @throws NoSuchElementException if property is not set + */ + def apply(conf: SparkConf, timeoutProp: String): RpcTimeout = { +val timeout = { conf.getTimeAsSeconds(timeoutProp) seconds } +new RpcTimeout(timeout, timeoutProp) + } + + /** + * Lookup the timeout property in the configuration and create + * a RpcTimeout with the property key in the description. 
+ * Uses the given default value if property is not set + * @param conf configuration properties containing the timeout + * @param timeoutProp property key for the timeout in seconds + * @param defaultValue default timeout value in seconds if property not found + */ + def apply(conf: SparkConf, timeoutProp: String, defaultValue: String): RpcTimeout = { +val timeout = { conf.getTimeAsSeconds(timeoutProp, defaultValue) seconds } +new RpcTimeout(timeout, timeoutProp) + } + + /** + * Lookup prioritized list of timeout properties in the configuration + * and create a RpcTimeout with the first set property key in the + * description. + * Uses the given default value if property is not set + * @param conf configuration properties containing the timeout + * @param timeoutPropList prioritized list of property keys for the timeout in seconds + * @param defaultValue default timeout value in seconds if no properties found + */ + def apply(conf: SparkConf, timeoutPropList: Seq[String], defaultValue: String): RpcTimeout = { +require(timeoutPropList.nonEmpty) + +// Find the first set property or use the default value with the first property +val itr = timeoutPropList.iterator +var foundProp: Option[(String, String)] = None +while (itr.hasNext foundProp.isEmpty){ + val propKey
[GitHub] spark pull request: [SPARK-7988][STREAMING] Round-robin scheduling...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6607#discussion_r33632077 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -271,6 +273,41 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false } /** + * Get the list of executors excluding driver + */ +private def getExecutors(ssc: StreamingContext): List[String] = { + val executors = ssc.sparkContext.getExecutorMemoryStatus.map(_._1.split(":")(0)).toList + val driver = ssc.sparkContext.getConf.get("spark.driver.host") + executors.diff(List(driver)) +} + +/** Set host location(s) for each receiver so as to distribute them over + * executors in a round-robin fashion taking into account preferredLocation if set + */ +private[streaming] def scheduleReceivers(receivers: Seq[Receiver[_]], + executors: List[String]): Array[ArrayBuffer[String]] = { + val locations = new Array[ArrayBuffer[String]](receivers.length) + var i = 0 + for (i <- 0 until receivers.length) { +locations(i) = new ArrayBuffer[String]() +if (receivers(i).preferredLocation.isDefined) { + locations(i) += receivers(i).preferredLocation.get +} + } + var count = 0 + for (i <- 0 until max(receivers.length, executors.length)) { --- End diff -- Why is max used here ? Isn't receivers.length enough ?
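To make the question concrete, here is a stripped-down version of the loop (hypothetical names, ignoring preferredLocation). Iterating to max(receivers, executors) means that when executors outnumber receivers, the extra iterations wrap around and append additional candidate hosts to the same receivers, which iterating only to receivers.length would not do:

```scala
import scala.collection.mutable.ArrayBuffer

def roundRobin(numReceivers: Int, executors: List[String]): Array[ArrayBuffer[String]] = {
  val locations = Array.fill(numReceivers)(new ArrayBuffer[String]())
  // One pass over max(numReceivers, executors.length) slots, wrapping both indices.
  for (i <- 0 until math.max(numReceivers, executors.length)) {
    locations(i % numReceivers) += executors(i % executors.length)
  }
  locations
}

// With 2 receivers and 3 executors, receiver 0 gets host1 and host3, receiver 1 gets host2:
// roundRobin(2, List("host1", "host2", "host3"))
```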
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6793#discussion_r32379130 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala --- @@ -69,6 +69,7 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(-c1, -1.1, row) checkEvaluation(c1 + c2, 3.1, row) +checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row) --- End diff -- I am in Beijing now. Except for difficulty of accessing gmail, github is quite slow as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6793#discussion_r32379114 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala --- @@ -69,6 +69,7 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(-c1, -1.1, row) checkEvaluation(c1 + c2, 3.1, row) +checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row) --- End diff -- Looking at the tests under sql, I don't see how TaskContext is explicitly set. Creating a new test is fine. The new test would contain a method containing one line. Just want to make sure this is fine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6793#issuecomment-111724468 I am trying to figure out how checkEvaluation should be used for the new test. protected def checkEvaluation( expression: Expression, expected: Any, inputRow: Row = EmptyRow): Unit = { w.r.t. Rand(), the expected value is not deterministic. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6793#issuecomment-111751576 Looking at ArithmeticExpressionSuite.scala, it has some checks in the following form: checkDoubleEvaluation(c1 - c2, (-0.9 +- 0.001), row) This seems to be better fit for checking the return value from Rand() --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
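For reference, the `+-` form above is ScalaTest's tolerance matcher; a minimal hypothetical suite (assuming ScalaTest 3.x Matchers, with a hard-coded stand-in instead of an actual Rand evaluation) looks like this:
{code}
import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.matchers.should.Matchers

// Hypothetical suite: checks a value against an expected center +- a tolerance,
// which is the style suggested for an expression whose exact output is hard to pin down.
class ToleranceSketchSuite extends AnyFunSuite with Matchers {
  test("value is close enough to the expected center") {
    val value = 0.7363714192755834 // stand-in for the evaluated Rand(30)
    value should equal (0.736 +- 0.001)
  }
}
{code}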
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/6059 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Fix NullPointerException with functions.rand()
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6793 Fix NullPointerException with functions.rand() This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()' You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6793.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6793 commit a1d66c59abc59a1454a53255ce9e9df66dd98f5a Author: tedyu yuzhih...@gmail.com Date: 2015-06-12T22:57:42Z Fix NullPointerException with functions.rand() --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6793#issuecomment-111668224 I looked at UnsafeFixedWidthAggregationMapSuite.scala in expressions package. Is RandomSuite.scala going to test Rand and Randn only ? A bit more hint is appreciated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6793#issuecomment-111655373 Mind telling me which suite the new test should be added to ? Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6793#issuecomment-111657774 At first glance, none of the test suites under sql/catalyst/src/test//scala/org/apache/spark/sql seems proper for the new test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7955][Core] Ensure executors with cache...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6508#discussion_r31374404 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -101,6 +104,11 @@ private[spark] class ExecutorAllocationManager( private val executorIdleTimeoutS = conf.getTimeAsSeconds( "spark.dynamicAllocation.executorIdleTimeout", "60s") + private val cachedExecutorTimeoutS = conf.getTimeAsSeconds( +"spark.dynamicAllocation.executorIdleTimeout", s"${Long.MaxValue / 60}s") --- End diff -- Did you mean 'Long.MaxValue / 1000' ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
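One way to read the question above is as an overflow concern when the seconds default is later converted to milliseconds; the following worked check (my reading, not taken from the PR) shows which divisor keeps that conversion within Long range:
{code}
object TimeoutDefaultSketch extends App {
  // Converting a default of Long.MaxValue seconds to milliseconds overflows:
  println(Long.MaxValue * 1000L)              // wraps around to -1000
  // Dividing by 1000 first leaves room for a later *1000 conversion:
  println((Long.MaxValue / 1000L) * 1000L)    // 9223372036854775000, still positive
  // Dividing by 60 does not: the mathematical product still exceeds Long.MaxValue.
  println(BigInt(Long.MaxValue / 60L) * 1000) // ~1.5e20, i.e. larger than Long.MaxValue
}
{code}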
[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6390#discussion_r30953036 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -50,7 +50,13 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k") + private val bufferSize = conf.get("spark.kryoserializer.buffer", "64k") + private val bufferSizeKb = _ + if (bufferSize.endsWith("k") || bufferSize.endsWith("kb")) { +bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k") + } else { +bufferSizeKb = conf.getSizeAsMb("spark.kryoserializer.buffer", "64k") --- End diff -- Yes - I was trying to differentiate mb from kb. Let me regroup the checks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6390 Kryo buffer size configured in mb should be properly supported @JoshRosen This PR tried to fix the issue reported by Debasish Das under 'Kryo option changed' thread You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6390.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6390 commit 830d0d098f01564339a9568b2e736d233c7378bb Author: tedyu yuzhih...@gmail.com Date: 2015-05-24T14:30:34Z Kryo buffer size configured in mb should be properly supported --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6390#discussion_r30953067 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -50,7 +50,13 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k") + private val bufferSize = conf.get("spark.kryoserializer.buffer", "64k") + private val bufferSizeKb = _ + if (bufferSize.endsWith("k") || bufferSize.endsWith("kb")) { +bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k") + } else { +bufferSizeKb = conf.getSizeAsMb("spark.kryoserializer.buffer", "64k") --- End diff -- Let me wait for Debasish to confirm whether he can reproduce the error before updating this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Add test which shows Kryo buffer size configur...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6390#discussion_r30955789 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -62,6 +62,10 @@ class KryoSerializerSuite extends FunSuite with SharedSparkContext { val thrown3 = intercept[IllegalArgumentException](new KryoSerializer(conf4).newInstance()) assert(thrown3.getMessage.contains(kryoBufferProperty)) assert(!thrown3.getMessage.contains(kryoBufferMaxProperty)) +val conf5 = conf.clone() +conf5.set(kryoBufferProperty, "8m") --- End diff -- Currently there is no test with a legitimate kryoBufferProperty value expressed in 'm'. This addition fills the gap. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
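A sketch of the behaviour those added test lines exercise, written as a standalone snippet rather than the suite's actual code: with the fix, a buffer value given in 'm' parses cleanly (8m is 8192 KiB) and constructing a serializer instance does not throw.
{code}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

object KryoBufferSketch extends App {
  val conf = new SparkConf(false)
    .set("spark.kryoserializer.buffer", "8m")
  // The size helpers normalize the unit: "8m" is read back as 8192 KiB.
  println(conf.getSizeAsKb("spark.kryoserializer.buffer"))
  // With the buffer expressed in 'm', constructing an instance should not throw.
  new KryoSerializer(conf).newInstance()
}
{code}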
[GitHub] spark pull request: Fix compilation error due to buildScan() being...
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/6246 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Fix compilation error due to buildScan() being...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6246 Fix compilation error due to buildScan() being final This PR fixes the following compilation error: [error] /home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala:174: overriding method buildScan in class HadoopFsRelation of type (requiredColumns: Array[String], filters: Array[org.apache.spark.sql.sources.Filter], inputPaths: Array[String])org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.expressions.Row]; [error] method buildScan cannot override final member [error] override def buildScan(requiredColumns: Array[String], You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6246.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6246 commit e5e94e11a8eabf2913329d365b6424f2b458bd2c Author: tedyu yuzhih...@gmail.com Date: 2015-05-18T21:35:06Z Fix compilation error due to buildScan() being final --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2883] [SQL] ORC data source for Spark S...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/6194#discussion_r30459211 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/package.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.{DataFrame, SaveMode} + +package object orc { + /** + * ::Experimental:: + * + * Extra ORC file loading functionality on [[HiveContext]] through implicit conversion. + * + * @since 1.4.0 + */ + @Experimental + implicit class OrcContext(sqlContext: HiveContext) { +/** + * ::Experimental:: + * + * Loads specified Parquet files, returning the result as a [[DataFrame]]. --- End diff -- Parquet -> ORC --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
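For readers following the ORC data source work, a usage sketch of reading ORC through the generic data source API. This assumes a Spark build with Hive support in which the ORC source is registered under the short name "orc" (1.5+ era API); the path, master, and object names are hypothetical. The implicit class in the diff above adds similar convenience directly on HiveContext.
{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object OrcReadSketch extends App {
  val sc = new SparkContext("local[2]", "orc-read-sketch") // hypothetical local setup
  val hiveContext = new HiveContext(sc)

  // Generic data source API: "orc" is the short name the ORC source registers under.
  val df = hiveContext.read.format("orc").load("/tmp/people.orc") // hypothetical path
  df.printSchema()

  sc.stop()
}
{code}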
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6059#issuecomment-101334832 Ran suite two more times with hadoop-2.4 profile - DriverSuite passed. Running suite with hadoop-2.3 profile now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6059#issuecomment-101345262 DriverSuite passed with hadoop-2.3 profile as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6059 SPARK-7355 FlakyTest - o.a.s.DriverSuite The test passed locally: {code} DriverSuite: - driver should exit after finishing without cleanup (SPARK-530) Run completed in 32 seconds, 350 milliseconds. Total number of tests run: 1 Suites: completed 2, aborted 0 Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0 All tests passed. {code} You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6059.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6059 commit 58668606cf81a5d72ea9bc080047d21a513b885f Author: tedyu yuzhih...@gmail.com Date: 2015-05-11T17:09:05Z SPARK-7355 FlakyTest - o.a.s.DriverSuite --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6059#issuecomment-101007216 From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32407/consoleFull : {code} [info] DriverSuite: [info] - driver should exit after finishing without cleanup (SPARK-530) (11 seconds, 740 milliseconds) {code} The test passed in the full test suite run. FYI --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6059#issuecomment-101024546 I am running test suite locally to try to reproduce the test failure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6059#issuecomment-101073092 Ran test suite 3 times where DriverSuite passed every time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Reference fasterxml.jackson.version in sql/cor...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6031#issuecomment-100551775 We do have green builds now: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2115/HADOOP_PROFILE=hadoop-2.4,label=centos/console --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6028 Upgrade version of jackson-databind in sql/core/pom.xml Currently version of jackson-databind in sql/core/pom.xml is 2.3.0 This is older than the version specified in root pom.xml This PR upgrades the version in sql/core/pom.xml so that they're consistent. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6028.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6028 commit 28c83946a02751f368d8dff5f4f89e5e7164deb0 Author: tedyu yuzhih...@gmail.com Date: 2015-05-09T14:57:48Z Upgrade version of jackson-databind in sql/core/pom.xml --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6028#issuecomment-100501599 This PR should get rid of the following test failure: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/lastCompletedBuild/testReport/test.org.apache.spark.sql/JavaUDFSuite/udf2Test/ java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/ jackson/databind/PropertyName;ZZZ)V at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector. com$fasterxml$jackson$module$scala$introspect$ScalaPropertiesCollector$$_addField(ScalaPropertiesCollector.scala:109) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:100) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/6028#issuecomment-100501682 @rxin Please take a look --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Reference fasterxml.jackson.version in sql/cor...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/6031 Reference fasterxml.jackson.version in sql/core/pom.xml You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6031.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6031 commit 28c83946a02751f368d8dff5f4f89e5e7164deb0 Author: tedyu yuzhih...@gmail.com Date: 2015-05-09T14:57:48Z Upgrade version of jackson-databind in sql/core/pom.xml commit ff2a44ffc8f922905ef629507970b6dec6a29790 Author: tedyu yuzhih...@gmail.com Date: 2015-05-09T18:00:08Z Merge branch 'master' of github.com:apache/spark commit 5c2580cae3280efd12ca16d969637b068f82e3b9 Author: tedyu yuzhih...@gmail.com Date: 2015-05-09T18:01:21Z Reference fasterxml.jackson.version in sql/core/pom.xml --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7450 Use UNSAFE.getLong() to speed up Bi...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5897#issuecomment-100050095 bq. I'm really pedantic I like that :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7450 Use UNSAFE.getLong() to speed up Bi...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5897#issuecomment-100015784 SPARK-7450 has been filed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5897#issuecomment-99621943 [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:769: not found: type HiveCompatibilitySuite [error] extends HiveCompatibilitySuite with BeforeAndAfter { [error] ^ [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:828: value testCases is not a member of org.scalatest.BeforeAndAfter [error] override def testCases: Seq[(String, File)] = super.testCases.filter { [error] ^ I don't think the above error was related to my change. BitSetSuite passed: [info] Test org.apache.spark.unsafe.bitset.BitSetSuite.basicOps started [info] Test org.apache.spark.unsafe.bitset.BitSetSuite.traversal started [info] Test run finished: 0 failed, 0 ignored, 2 total, 0.012s --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/5897#discussion_r29638709 --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) { * Returns {@code true} if any bit is set. */ public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) { -for (int i = 0; i <= bitSetWidthInBytes; i++) { +int widthInLong = (int)(bitSetWidthInBytes / SIZE_OF_LONG); +for (int i = 0; i <= widthInLong; i++) { + if (PlatformDependent.UNSAFE.getLong(baseObject, baseOffset + i) != 0) { +return true; + } +} +for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) { --- End diff -- I don't think so. The second loop is for the remaining bytes :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5897#issuecomment-98897971 From failed test: bq. [error] oro#oro;2.0.8!oro.jar origin location must be absolute: file:/home/jenkins/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar Pretty sure the above is not caused by my change --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/5897#discussion_r29638486 --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) { * Returns {@code true} if any bit is set. */ public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) { -for (int i = 0; i <= bitSetWidthInBytes; i++) { +long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG; +for (long i = 0; i <= widthInLong; i++) { --- End diff -- Addressed the above comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/5897#discussion_r29639524 --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) { * Returns {@code true} if any bit is set. */ public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) { -for (int i = 0; i <= bitSetWidthInBytes; i++) { +int widthInLong = (int)(bitSetWidthInBytes / SIZE_OF_LONG); +for (int i = 0; i <= widthInLong; i++) { + if (PlatformDependent.UNSAFE.getLong(baseObject, baseOffset + i) != 0) { +return true; + } +} +for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) { --- End diff -- That case is covered by the first loop. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
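To make the two-loop structure concrete, here is a self-contained sketch of the same word-at-a-time idea using a plain byte array and ByteBuffer instead of Unsafe (illustrative only): scan whole 64-bit words first, then check the remaining tail bytes one by one.
{code}
import java.nio.ByteBuffer

object AnySetSketch extends App {
  def anySet(bytes: Array[Byte]): Boolean = {
    val buf = ByteBuffer.wrap(bytes)
    val words = bytes.length / 8
    var i = 0
    while (i < words) {            // whole 64-bit words, read 8 bytes at a time
      if (buf.getLong(i * 8) != 0L) return true
      i += 1
    }
    var j = words * 8
    while (j < bytes.length) {     // remaining tail bytes, checked individually
      if (bytes(j) != 0) return true
      j += 1
    }
    false
  }

  val bits = new Array[Byte](19)
  println(anySet(bits))            // false: nothing set
  bits(17) = 4                     // set a bit in the tail region
  println(anySet(bits))            // true: found by the byte-wise tail loop
}
{code}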
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/5897 Use UNSAFE.getLong() to speed up BitSetMethods#anySet() @JoshRosen Please take a look You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5897.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5897 commit 3e9b6919ccb8ccf53b5bcb9fe183a1b7fad0e9ef Author: tedyu yuzhih...@gmail.com Date: 2015-05-04T23:59:02Z Use UNSAFE.getLong() to speed up BitSetMethods#anySet() --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5897#issuecomment-98913797 From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31809/testReport/junit/org.apache.spark.deploy/SparkSubmitSuite/includes_jars_passed_in_through___jars/ : sbt.ForkMain$ForkError: The code passed to failAfter did not complete within 60 seconds. at org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249) at org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249) I think the above was not related to my change. bq. This patch adds the following public classes (experimental) This change doesn't add any public class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set null as default value for TaskContextImpl#...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/5874 Set null as default value for TaskContextImpl#taskMemoryManager @JoshRosen Please take a look. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5874.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5874 commit 4627733ebb5c3b273b607f90e0042f8d47b97cbd Author: tedyu yuzhih...@gmail.com Date: 2015-05-03T21:41:27Z Set null as default value for TaskContextImpl#taskMemoryManager --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set null as default value for TaskContextImpl#...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5874#issuecomment-98562614 From what I can tell (e.g. JavaAPISuite), null taskMemoryManager is passed in existing tests. This doesn't seem to increase the chance of NPE. I agree with Josh that tests should make use of this default value. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set null as default value for TaskContextImpl#...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5874#issuecomment-98568325 I found two places where taskMemoryManager is used: DAGScheduler#runLocallyWithinThread() Executor#run() taskMemoryManager is constructed in both places. See if I missed any reference. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set null as default value for TaskContextImpl#...
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/5874 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set null as default value for TaskContextImpl#...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5874#issuecomment-98571954 Right. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/5725#issuecomment-97511381 **[sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java, line 133 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo56lmUPtN4oJEAxApu)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java#L133)): Please include writtenLength, unsafeRow.length and javaRow in the assertion so that we have more information when debugging. --- **[sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java, line 148 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo57pOAjJQa1nPxxd2y)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L148)): Condition doesn't match assertion message w.r.t. <= --- **[unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java, line 23 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5AmeniFC0_bDQdiyb)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java#L23)): Declare this class static ? --- **[unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java, line 26 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5B0VcBmjKX4TnrZD7)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java#L26)): nit: in-heap -> on-heap --- **[unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java, line 24 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5BT5TTxBoPNV_juFI)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java#L24)): https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html doesn't serve your needs ? --- **[unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java, line 75 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5C4Ty7HYh-hvVPSEb)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java#L75)): Maybe use something similar to http://www.docjar.com/docs/api/sun/misc/Unsafe.html#getInt(Object, long) for speedup ? --- --- Comments from the [review on Reviewable.io](https://reviewable.io:443/reviews/apache/spark/5725) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/5704#discussion_r29382639 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -281,21 +285,30 @@ private[spark] class ExecutorAllocationManager( */ private def addExecutors(maxNumExecutorsNeeded: Int): Int = { // Do not request more executors if it would put our target over the upper bound -val currentTarget = targetNumExecutors -if (currentTarget >= maxNumExecutors) { - logDebug(s"Not adding executors because there are already ${executorIds.size} " + -s"registered and $numExecutorsPending pending executor(s) (limit $maxNumExecutors)") +if (numExecutorsTarget >= maxNumExecutors) { + val numExecutorsPending = numExecutorsTarget - executorIds.size + logDebug(s"Not adding executors because there are already ${executorIds.size} registered" + +s"and ${numExecutorsPending} pending executor(s) (limit $maxNumExecutors)") --- End diff -- nit: insert a space before 'and' --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/5673 SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat.py You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5673.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5673 commit 18d172a95710c85bb5e2afea83f2c41889d71597 Author: tedyu yuzhih...@gmail.com Date: 2015-04-24T01:12:15Z SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat.py --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-6085 Increase default value for memory o...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/4836 SPARK-6085 Increase default value for memory overhead You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4836.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4836 commit 1fdd4df059fd013da56619c884e1e3a403605beb Author: tedyu yuzhih...@gmail.com Date: 2015-02-28T21:15:46Z SPARK-6085 Increase default value for memory overhead --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-6085 Increase default value for memory o...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/4836#issuecomment-76551769 Thanks for the reminder, updated accordingly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-6045 RecordWriter should be checked agai...
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/4794 SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4794.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4794 commit 2d8d4b1fb61f5aa73bbd824e429e4f388bc84511 Author: tedyu yuzhih...@gmail.com Date: 2015-02-26T20:57:15Z SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/4021#discussion_r25209770 --- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala --- @@ -320,7 +334,13 @@ private[spark] object Accumulators { def add(values: Map[Long, Any]): Unit = synchronized { for ((id, value) <- values) { if (originals.contains(id)) { -originals(id).asInstanceOf[Accumulable[Any, Any]] ++= value +// Since we are now storing weak references, we must check whether the underlying data +// is valid. +originals(id).get match { + case Some(accum) => accum.asInstanceOf[Accumulable[Any, Any]] ++= value + case None => +throw new IllegalAccessError("Attempted to access garbage collected Accumulator.") --- End diff -- If Accumulator is garbage collected, should we log and continue ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
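On the log-and-continue suggestion, a small illustrative sketch (using scala.ref.WeakReference and a plain mutable map, not the actual Accumulators registry) of skipping a cleared reference instead of throwing:
{code}
import scala.ref.WeakReference

object WeakRefSketch extends App {
  val registry = scala.collection.mutable.Map[Long, WeakReference[StringBuilder]]()

  // Apply an update only if the weakly referenced target is still alive; otherwise log and skip.
  def add(id: Long, value: String): Unit =
    registry.get(id).flatMap(_.get) match {
      case Some(sb) => sb.append(value)
      case None     => println(s"Accumulator $id was garbage collected; skipping update")
    }

  val sb = new StringBuilder
  registry(1L) = WeakReference(sb)
  add(1L, "x")                 // applied
  add(2L, "y")                 // unknown or collected id: logged and skipped
  println(sb.toString)         // prints: x
}
{code}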
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-75666737 Not yet. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/3115#issuecomment-63339550 Thanks Sean for chiming in. bq. Most people wouldn't depend on the server Mapreduce related classes, e.g. TableMapReduceUtil, are in hbase-server --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Exclude dependency on hbase-annotations module
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/3286#issuecomment-63357810 @pwendell I logged SPARK-4455 Let me know if I need to create another pull request due to the merge --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-4455 Exclude dependency on hbase-annotat...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/3286#issuecomment-63381658 These annotations are used to indicate the audience / stability of HBase APIs. They're not needed by HBase clients. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
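For readers who build against HBase with sbt rather than Maven, the same kind of exclusion can be expressed in the build definition. This is only an illustrative sketch: the artifact (hbase-client) and version are assumptions, so apply the exclusion to whichever HBase module actually drags hbase-annotations into your build.

    // build.sbt sketch (assumed artifact and version): keep the HBase client jars
    // but drop the audience/stability annotations module, which clients do not need.
    libraryDependencies += ("org.apache.hbase" % "hbase-client" % "0.98.4-hadoop2")
      .exclude("org.apache.hbase", "hbase-annotations")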
[GitHub] spark pull request: Exclude dependency on hbase-annotations module
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/3286 Exclude dependency on hbase-annotations module @pwendell Please take a look You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3286.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3286 commit 2f28b088cdfa99fd68989e600c3b6bc856845c80 Author: tedyu yuzhih...@gmail.com Date: 2014-11-15T15:22:46Z Exclude dependency on hbase-annotations module --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/1893#issuecomment-61847146 I will create another pull request since my local workspace has become stale. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu closed the pull request at: https://github.com/apache/spark/pull/1893 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/3115 SPARK-1297 Upgrade HBase dependency to 0.98 @pwendell @rxin Please take a look You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3115 commit 2b079c8c5b15643ec85daeb0b9e360d26fdec0e9 Author: tedyu yuzhih...@gmail.com Date: 2014-11-05T17:27:59Z SPARK-1297 Upgrade HBase dependency to 0.98 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/3115#issuecomment-61866705
HBase 0.98 has been declared a stable release. Since HBase 0.94 is not modularized, compilation against 0.94 or earlier releases wouldn't be supported.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
GitHub user tedyu opened a pull request: https://github.com/apache/spark/pull/1893 SPARK-1297 Upgrade HBase dependency to 0.98
Two profiles are added to examples/pom.xml:
- hbase-hadoop1 (default)
- hbase-hadoop2
I verified that compilation passes with either profile active.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1893.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1893 commit 70fb7b4ea8fd7647e4a4ddca4df71521b749521c Author: tedyu yuzhih...@gmail.com Date: 2014-08-11T17:21:13Z SPARK-1297 Upgrade HBase dependency to 0.98
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/1893#discussion_r16066911
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
         </dependency>
       </dependencies>
     </profile>
+    <profile>
+      <id>hbase-hadoop2</id>
+      <activation>
+        <property>
+          <name>hbase.profile</name>
+          <value>hadoop2</value>
+        </property>
+      </activation>
+      <properties>
+        <protobuf.version>2.5.0</protobuf.version>
--- End diff --
Will drop in next PR
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/1893#discussion_r16066969
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
         </dependency>
       </dependencies>
     </profile>
+    <profile>
+      <id>hbase-hadoop2</id>
+      <activation>
+        <property>
+          <name>hbase.profile</name>
+          <value>hadoop2</value>
+        </property>
+      </activation>
+      <properties>
+        <protobuf.version>2.5.0</protobuf.version>
+        <hbase.version>0.98.4-hadoop2</hbase.version>
+      </properties>
+      <dependencyManagement>
+        <dependencies>
+        </dependencies>
+      </dependencyManagement>
+    </profile>
+    <profile>
+      <id>hbase-hadoop1</id>
+      <activation>
+        <property>
+          <name>!hbase.profile</name>
+        </property>
+      </activation>
+      <properties>
+        <hbase.version>0.98.4-hadoop1</hbase.version>
--- End diff --
Can this be done when the need comes?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/1893#discussion_r16067144
--- Diff: examples/pom.xml ---
@@ -110,36 +143,52 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase</artifactId>
-      <version>${hbase.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>asm</groupId>
-          <artifactId>asm</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jboss.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>io.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>commons-logging</groupId>
-          <artifactId>commons-logging</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jruby</groupId>
-          <artifactId>jruby-complete</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-server</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-testing-util</artifactId>
+      <version>${hbase.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.jruby</groupId>
+          <artifactId>jruby-complete</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-protocol</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-common</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-client</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
--- End diff --
hbase-server is needed. Hive hbase-handler does the same.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/1893#discussion_r16068521
--- Diff: examples/pom.xml ---
@@ -110,36 +143,52 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase</artifactId>
-      <version>${hbase.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>asm</groupId>
-          <artifactId>asm</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jboss.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>io.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>commons-logging</groupId>
-          <artifactId>commons-logging</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jruby</groupId>
-          <artifactId>jruby-complete</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-testing-util</artifactId>
+      <version>${hbase.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.jruby</groupId>
+          <artifactId>jruby-complete</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-protocol</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-common</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-client</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
--- End diff --
Added exclusions.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/1893#discussion_r16068638
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
         </dependency>
       </dependencies>
     </profile>
+    <profile>
+      <id>hbase-hadoop2</id>
+      <activation>
+        <property>
+          <name>hbase.profile</name>
+          <value>hadoop2</value>
+        </property>
+      </activation>
+      <properties>
+        <protobuf.version>2.5.0</protobuf.version>
+        <hbase.version>0.98.4-hadoop2</hbase.version>
+      </properties>
+      <dependencyManagement>
+        <dependencies>
+        </dependencies>
+      </dependencyManagement>
+    </profile>
+    <profile>
+      <id>hbase-hadoop1</id>
+      <activation>
+        <property>
+          <name>!hbase.profile</name>
+        </property>
+      </activation>
+      <properties>
+        <hbase.version>0.98.4-hadoop1</hbase.version>
--- End diff --
When hbase.version is specified on the command line, that should be effective, right?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-1127 Add spark-hbase.
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862801
--- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/HBaseUtils.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.nosql.hbase
+
+import org.apache.hadoop.io.Text
+import org.apache.spark.rdd.RDD
+
+/**
+ * A public object that provide HBase supports.
+ * You could save RDD into HBase through [[org.apache.spark.nosql.hbase.HBaseUtils.saveAsHBaseTable]] method.
+ */
+object HBaseUtils {
+
+  /**
+   * Save [[org.apache.spark.rdd.RDD[Text]]] as a HBase table
+   * @param rdd [[org.apache.spark.rdd.RDD[Text]]]
+   * @param zkHost the zookeeper hosts. e.g. 10.232.98.10,10.232.98.11,10.232.98.12
+   * @param zkPort the zookeeper client listening port. e.g. 2181
+   * @param zkNode the zookeeper znode of HBase. e.g. hbase-apache
+   * @param table the name of table which we save records
+   * @param rowkeyType the type of rowkey. [[org.apache.spark.nosql.hbase.HBaseType]]
+   * @param columns the column list. [[org.apache.spark.nosql.hbase.HBaseColumn]]
+   * @param delimiter the delimiter which used to split record into fields
+   */
+  def saveAsHBaseTable(rdd: RDD[Text], zkHost: String, zkPort: String, zkNode: String,
+      table: String, rowkeyType: String, columns: List[HBaseColumn], delimiter: Char) {
+    val conf = new HBaseConf(zkHost, zkPort, zkNode, table, rowkeyType, columns, delimiter)
+
+    def writeToHBase(iter: Iterator[Text]) {
+      val writer = new SparkHBaseWriter(conf)
+
+      try {
+        writer.init()
+        while (iter.hasNext) {
+          val record = iter.next()
+          writer.write(record)
+        }
+      } finally {
+        writer.close()
+      }
--- End diff --
See my comment in the close() method below. If init() throws an exception, htable would be null. With an additional null check, writer.close() should be a no-op.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
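A minimal, self-contained sketch of the null-guarded close() suggested in this comment, assuming a writer whose init() can throw before the underlying table handle exists. The class and member names are illustrative, not the actual SparkHBaseWriter source.

    import java.io.Closeable

    class GuardedWriterSketch(openTable: () => Closeable) {
      private var htable: Closeable = null

      def init(): Unit = {
        htable = openTable() // may throw, leaving htable null
      }

      def close(): Unit = {
        // With this null check, calling close() from a finally block is a safe
        // no-op when init() failed before the table was opened.
        if (htable != null) {
          htable.close()
          htable = null
        }
      }
    }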