[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82730696

Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
```diff
@@ -138,13 +138,16 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * and the second item is a sequence of (shuffle block id, shuffle block size) tuples
    * describing the shuffle blocks that are stored at that block manager.
    */
-  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
+  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int,
+      mapid: Int = -1)
```

It's better to use a `Seq[Int]` so that many map outputs can be fetched in one call.
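A minimal sketch of the reviewer's `Seq[Int]` suggestion; the object and signature below are hypothetical stand-ins, not the real `MapOutputTracker` API, and an empty `mapIds` is assumed to mean "fetch all maps", preserving the old behavior:

```scala
// Hypothetical stand-in, not the actual MapOutputTracker API.
object FetchSketch {
  def getMapSizesByExecutorId(
      shuffleId: Int,
      startPartition: Int,
      endPartition: Int,
      mapIds: Seq[Int] = Seq.empty): Unit = {
    // An empty mapIds sequence stands for "all maps", mirroring the old behavior.
    val which = if (mapIds.isEmpty) "all maps" else s"maps ${mapIds.mkString(", ")}"
    println(s"shuffle $shuffleId: fetching $which for partitions " +
      s"[$startPartition, $endPartition)")
  }

  def main(args: Array[String]): Unit = {
    // Fetch several skewed map outputs in one call instead of one call per map id.
    getMapSizesByExecutorId(0, 0, 1, Seq(3, 7, 9))
  }
}
```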
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82728585

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala
```diff
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.Arrays
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+
+class SkewCoalescedPartitioner(
+val parent: Partitioner,
```

Nit: code format.
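The nit points at the unindented constructor parameter. A sketch of what conventional Spark formatting might look like here, assuming spark-core on the classpath; the second parameter and the method bodies are simplified stand-ins, not the PR's actual class:

```scala
import org.apache.spark.Partitioner

// Constructor parameters indented four spaces, per the Spark Scala style guide.
class SkewCoalescedPartitioner(
    val parent: Partitioner,
    val partitionStartIndices: Array[Int]) // hypothetical second parameter
  extends Partitioner {

  override def numPartitions: Int = partitionStartIndices.length

  // Simplified: delegate to the parent partitioner.
  override def getPartition(key: Any): Int = parent.getPartition(key)
}
```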
[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14887

If there are bad disks among the local-dirs, the `NodeManager` will not pass those bad disks on to the Spark executor, so it's not necessary to check them.
[GitHub] spark issue #14530: [SPARK-16868][Web Ui] Fix executor be both dead and aliv...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14530

I will re-run this case and dig into why the executor registers twice.
[GitHub] spark issue #14530: [SPARK-16868][Web Ui] Fix executor be both dead and aliv...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14530

\cc @srowen
[GitHub] spark issue #14534: [SPARK-16941]Use concurrentHashMap instead of scala Map ...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14534

Any other comments?
[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/14534#discussion_r74028969

Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
```diff
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
     .getSuperField[JMap[OperationHandle, Operation]](this, "handleToOperation")
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
```

`sessionToActivePool` and `sessionToContexts` are used by `SparkSQLSessionManager` in its `openSession` and `closeSession` methods. To make these fields private, new accessor functions would have to be added here.
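A sketch of what the suggested encapsulation would entail; the accessor names are hypothetical, and the String types stand in for the real `SessionHandle` and `SQLContext`:

```scala
import java.util.concurrent.ConcurrentHashMap

object OperationManagerSketch {
  // Private map plus the extra accessors that SparkSQLSessionManager's
  // openSession/closeSession would then have to call instead.
  private val sessionToActivePool = new ConcurrentHashMap[String, String]()

  def setActivePool(session: String, pool: String): Unit =
    sessionToActivePool.put(session, pool)

  def removeSession(session: String): Unit =
    sessionToActivePool.remove(session)
}
```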
[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/14534#discussion_r74023262

Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
```diff
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
     .getSuperField[JMap[OperationHandle, Operation]](this, "handleToOperation")
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
```

The whole class is already private; is it necessary to make the fields private as well?
[GitHub] spark issue #14534: [SPARK-16941]Use concurrentHashMap instead of scala Map ...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14534

cc/ @srowen Is this OK?
[GitHub] spark pull request #14530: [SPARK-16868][Web Ui] Fix executor be both dead a...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/14530#discussion_r73827019

Diff: core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala
```diff
@@ -77,6 +77,18 @@ class StorageStatusListener(conf: SparkConf) extends SparkListener {
       val maxMem = blockManagerAdded.maxMem
       val storageStatus = new StorageStatus(blockManagerId, maxMem)
       executorIdToStorageStatus(executorId) = storageStatus
+
+      // Try to remove the dead storage status if same executor register the block manger twice.
+      removeDeadExecutorStorageStatus(executorId)
+    }
   }
+
+  private def removeDeadExecutorStorageStatus(executorId: String): Unit = {
+    deadExecutorStorageStatus.zipWithIndex.foreach { case (status, index) =>
```

`retain` seems to be a method in `MapLike`, but I can't find any similar method in `ListBuffer`.
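Since `ListBuffer` has no `retain`, one possible in-place equivalent is `--=` with a filtered snapshot. A hypothetical illustration, not the PR's actual code:

```scala
import scala.collection.mutable.ListBuffer

object RetainSketch {
  // Simplified stand-in for the listener's state.
  case class StorageStatus(executorId: String)

  val deadExecutorStorageStatus = ListBuffer(
    StorageStatus("1"), StorageStatus("2"), StorageStatus("1"))

  // `filter` first builds a snapshot of the matching elements, then `--=`
  // removes them from the buffer in place, which achieves retain's effect.
  def removeDeadExecutorStorageStatus(executorId: String): Unit = {
    deadExecutorStorageStatus --=
      deadExecutorStorageStatus.filter(_.executorId == executorId)
  }

  def main(args: Array[String]): Unit = {
    removeDeadExecutorStorageStatus("1")
    println(deadExecutorStorageStatus) // ListBuffer(StorageStatus(2))
  }
}
```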
[GitHub] spark pull request #14534: [SPARK-16941]Add SynchronizedMap trait with Map i...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/14534

[SPARK-16941] Add SynchronizedMap trait with Map in SparkSQLOperationManager.

## What changes were proposed in this pull request?
ThriftServer has a thread-safety problem in **SparkSQLOperationManager**. Add the SynchronizedMap trait to the maps in it to avoid this problem. Details in [SPARK-16941](https://issues.apache.org/jira/browse/SPARK-16941)

## How was this patch tested?
NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-16941

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14534.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14534

commit 4af58bc3c9e3ff436e6258aff96a663cf55aa8ba
Author: huangzhaowei
Date: 2016-08-08T04:06:17Z

    Add SynchronizedMap trait with Map in SparkSQLOperationManager to avoid concurrency problem.
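A sketch of the thread-safety concern, using the `ConcurrentHashMap` that the diff comments earlier in this digest converged on (Scala's `SynchronizedMap` trait is deprecated); the String types stand in for the real `SessionHandle` and `SQLContext`:

```scala
import java.util.concurrent.ConcurrentHashMap

object ThreadSafeMapSketch {
  // A plain scala.collection.mutable.Map is not safe under the Thrift
  // server's concurrent openSession/closeSession calls; ConcurrentHashMap is.
  val sessionToActivePool = new ConcurrentHashMap[String, String]()

  def main(args: Array[String]): Unit = {
    val threads = (1 to 8).map { i =>
      new Thread(new Runnable {
        override def run(): Unit = sessionToActivePool.put(s"session-$i", "default")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(sessionToActivePool.size()) // 8, with no lost updates
  }
}
```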
[GitHub] spark pull request #14530: [SPARK-16868][Web Ui] Fix executor be both dead a...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/14530

[SPARK-16868][Web Ui] Fix executor be both dead and alive on executor ui.

## What changes were proposed in this pull request?
When the Spark application is under heavy pressure, the executor may register itself with the driver's block manager twice (because of heartbeats), and the executor then shows up as in the picture:

![image](https://cloud.githubusercontent.com/assets/7404824/17467245/c1359094-5d4e-11e6-843a-f6d6347e1bf6.png)

## How was this patch tested?
NA

Details in: [SPARK-16868](https://issues.apache.org/jira/browse/SPARK-16868)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-16868

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14530.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14530

commit 6fe4d13fb743f9f3ca5808ba3a7c7c6923e45d0a
Author: huangzhaowei
Date: 2016-08-03T08:37:17Z

    Try to remove dead storage status on BlockManagerAdded event to avoid duplicate executor in WebUI.

commit 85b385f47c0751549befc00a31bb554e24443932
Author: huangzhaowei
Date: 2016-08-05T02:04:22Z

    Merge branch 'master' into SPARK-16868
[GitHub] spark pull request: [SPARK-14679] [UI] Fix UI DAG visualization OO...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/12437#issuecomment-211227098

@rdblue Can this PR fix a case like this:

```java
2016-02-24 15:40:20,260 | ERROR | [qtp1927776715-4120] | Failed to make dot file of stage 619 | org.apache.spark.Logging$class.logError(Logging.scala:96)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
```
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/8942
[GitHub] spark pull request: [SPARK-12523][YARN]Support long-running of the...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/10645#issuecomment-188588644

@tgravescs we have run `spark on hbase` beyond 7 days. It worked well, but I did not test with the hive `metastore`, which is a similar case.
[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/11360

[Minor][SPARK-13482][Configuration] Make the configuration named in TransportConf consistent.

`spark.storage.memoryMapThreshold` has two kinds of values: one is 2*1024*1024 as an integer, and the other is '2m' as a string. "2m" is the form recommended in the documentation, but it goes wrong when the code reaches `TransportConf#memoryMapBytes`.

[Jira](https://issues.apache.org/jira/browse/SPARK-13482)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-13482

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11360.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11360

commit f8367ee7f9685503b8ef495b1cd34047e4926af4
Author: huangzhaowei
Date: 2016-02-25T03:13:16Z

    Make consistency of the configuraiton named in TransportConf.
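A sketch of the inconsistency, using a hypothetical config reader rather than Spark's actual `TransportConf`: reading the value as an Int works for `2097152` but fails for the documented `2m` form, so a suffix-aware parser is needed:

```scala
object MemoryMapThresholdSketch {
  // Stand-in for the real SparkConf; the key name is the real one.
  val conf = Map("spark.storage.memoryMapThreshold" -> "2m")

  // The consistent fix: parse the size suffix so both forms are accepted.
  // A plain .toInt on the raw value would throw NumberFormatException for "2m".
  def memoryMapBytes(): Long = {
    val v = conf("spark.storage.memoryMapThreshold").toLowerCase
    if (v.endsWith("m")) v.dropRight(1).toLong * 1024 * 1024
    else v.toLong
  }

  def main(args: Array[String]): Unit = {
    println(memoryMapBytes()) // 2097152
  }
}
```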
[GitHub] spark pull request: [Streaming][UI][SPARK-12672]Use the uiRoot fun...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/10617

[Streaming][UI][SPARK-12672] Use the uiRoot function instead of default root path to gain the streaming batch url.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-12672

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10617.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10617

commit 70a12b68f157d5f3175941cca8624fa32e702f65
Author: huangzhaowei
Date: 2016-01-06T08:13:45Z

    Use the uiRoot function instead of default root path to gain the streaming batch url.
[GitHub] spark pull request: [SPARK-12316] Wait a minutes to avoid cycle ca...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/10475#issuecomment-168105166

This only occurs in a cluster, but it is easy to reproduce there:

1. Start up a yarn-client Spark application.
2. Remove the staging dir after the AM has finished writing the token to HDFS but before the driver has read it.
[GitHub] spark pull request: [SPARK-12316] Wait a minutes to avoid cycle ca...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/10475

[SPARK-12316] Wait a minute to avoid cyclic calls.

When the application ends, the AM cleans the staging dir. But if the driver is then triggered to update the delegation token, it can't find the right token file and endlessly calls the method `updateCredentialsIfRequired` in a cycle, which leads to a driver StackOverflowError.

https://issues.apache.org/jira/browse/SPARK-12316

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-12316

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10475.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10475

commit b1ba56be4dba90933c5a17dfd875f6a9d9f74b6e
Author: huangzhaowei
Date: 2015-12-25T07:18:12Z

    Wait a minutes to avoid cycle calling.
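A sketch of the idea in the PR title, with a hypothetical scheduler rather than the actual `AMDelegationTokenRenewer` code: when the credentials file is missing, re-schedule the check after a delay instead of recursing immediately, so a missing file cannot drive the stack to overflow:

```scala
import java.util.concurrent.{Executors, TimeUnit}

object TokenRenewSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def updateCredentialsIfRequired(): Unit = {
    val credentialsFound = false // stand-in for the real HDFS lookup
    if (!credentialsFound) {
      // Re-check after a minute rather than calling ourselves directly;
      // each retry runs on a fresh scheduler invocation, so the stack
      // never grows.
      scheduler.schedule(new Runnable {
        override def run(): Unit = updateCredentialsIfRequired()
      }, 1, TimeUnit.MINUTES)
    }
  }
}
```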
[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8918#issuecomment-158258804

OK, I'll close it.
[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/8918
[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/9056#issuecomment-153281531

I have modified the code as you commented, @chenghao-intel, and also added a simple test case for it, @JoshRosen.
[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/9056#issuecomment-149757304

It is the same issue.
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149442226

We had considered reopening the file. That way, we would have had to handle the synchronization problem between the event-log producer and consumer, with more code. Later, I found this way, and it's cleaner.
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149439492

I'm not very clear on how to use `doAs` for the `EventLoggingListener`. You can open a PR and I will help test it.
[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8918#issuecomment-148879845

@andrewor14 as I described in the [JIRA](https://issues.apache.org/jira/browse/SPARK-10766), in yarn-cluster mode it's hard to set the class path of the client process. But if I want to use hbase, I have to put the hbase jars on the class path of this process. BTW, I think Spark users may want to do something in this process, so I think it's better to enhance the configuration of the client.
[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/9026#discussion_r42205797

Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
```diff
@@ -1272,11 +1272,24 @@ object Client extends Logging {
     val mirror = universe.runtimeMirror(getClass.getClassLoader)
     try {
-      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
-      val hive = hiveClass.getMethod("get").invoke(null)
-
-      val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
       val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
+      val hiveConf = hiveConfClass.newInstance()
+
+      // Set metastore to be a local temp directory to avoid conflict of the `metaStore client`
+      // in `HiveContext` which will use the same derby dataBase by default.
+      val hiveConfSet = (param: String, value: String) => hiveConfClass
+        .getMethod("set", classOf[Unit])
+        .invoke(hiveConf, param, value)
+      val tempDir = Utils.createTempDir()
+      val localMetastore = new File(tempDir, "metastore")
+      hiveConfSet("hive.metastore.warehouse.dir", localMetastore.toURI.toString)
+      hiveConfSet("javax.jdo.option.ConnectionURL",
+        s"jdbc:derby:;databaseName=${localMetastore.getAbsolutePath};create=true")
+      hiveConfSet("datanucleus.rdbms.datastoreAdapterClassName",
+        "org.datanucleus.store.rdbms.adapter.DerbyAdapter")
+
+      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
+      val hive = hiveClass.getMethod("get").invoke(null, hiveConf.asInstanceOf[Object])
```

Good idea.
[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/9026#issuecomment-147660393

Actually there are two metastores. In hive-1.2.1, when we use `metastore.Hive`, it creates the metastore in a static code block. Since Spark has two class loaders (the main class loader and the hive metastore class loader), there will be two metastores.
[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/9026#issuecomment-147614244

@srowen In this issue there is only one `HiveContext`, but there are two `metastore.Hive` instances in two different class loaders. And in the implementation of `metastore.Hive`, each database instance is created when the class is loaded. So we have to set the configuration `javax.jdo.option.ConnectionURL` to a temp dir to avoid the problem I mentioned. This logic actually follows the implementation of [SparkSQLCLIDriver](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L84).
[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/9056

[SPARK-11043][SQL] BugFix: Set the operator log in the thrift server.

In hive 1.2.1, `SessionManager` sets the `operationLog` if the configuration `hive.server2.logging.operation.enabled` is true. But Spark did not adapt to this change, so whether the configuration is enabled or not, the Spark thrift server always logs the warning message. PS: if `hive.server2.logging.operation.enabled` is false, it should log the warning message (the same as the hive thrift server).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-11043

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9056

commit 74b2a46d269ef91857f0d3aed203e171dad7eef1
Author: huangzhaowei
Date: 2015-10-10T02:22:08Z

    [SPARK-11043][SQL]BugFix:Set the operator log in the thrift server.

commit eeb04490198052c4e013bf4bdcf68e77eac5eea8
Author: huangzhaowei
Date: 2015-10-10T02:31:04Z

    Fix the code style.
[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/9026#issuecomment-147023274

/cc @marmbrus @liancheng
[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/9026

[SPARK-11000][YARN] Bug fix: Derby has booted the database twice in yarn security mode.

[obtainTokenForHiveMetastore](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1267) in yarn.Client.scala will init the `Hive`. It creates a connection to the database, and the metastore client in `HiveContext` will also create a connection to the database. If the default Derby is used, this goes wrong. So I specialized the configuration of `javax.jdo.option.ConnectionURL` in `obtainTokenForHiveMetastore` to avoid this issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-11000

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9026.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9026

commit 0fab8c74977927be9a505025754b39fcbef9d614
Author: huangzhaowei
Date: 2015-10-08T07:38:10Z

    [SPARK-11000][YARN]Bug fix: Derby have booted the database twice in yarn security mode.
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-146425804

retest this please
[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8918#issuecomment-146390642

retest this please
[GitHub] spark pull request: [SPARK-10786][SQL]Take the whole statement to ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8895#issuecomment-146390491

@liancheng Can you take a look at this small change?
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/8867
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-146390416

Yeah @tgravescs I'm running in yarn-client mode. I'm sure that `HDFS_DELEGATION_TOKEN token 2339 for spark` is the original token obtained by the driver, but I don't know which valid token the event-log writer is using. I set `dfs.namenode.delegation.token.max-lifetime` to 5 minutes. In our test, the event log works fine after logging in again.
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-144272777

@harishreedharan The event log will still be stopped by the `token expired` exception. The event log is a long-running output stream, and #8867 can't update its inner token.

```
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:153)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:153)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:153)
    at org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:176)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:32)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:82)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1217)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:66)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2339 for spark) can't be found in cache
    at org.apache.hadoop.ipc.Client.call(Client.java:1511)
    at org.apache.hadoop.ipc.Client.call(Client.java:1442)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:416)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1652)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1453)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:579)
```
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-144262717

@tgravescs @harishreedharan this fix will still lose the event log, so maybe it's not the best approach; we have raised a new [approach](https://github.com/apache/spark/pull/8942) to resolve this issue.
[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/8942

[SPARK-10473][YARN] Login again in the driver to avoid losing events.

As discussed with @tgravescs and @harishreedharan at [#8867](https://github.com/apache/spark/pull/8867#issuecomment-142970395), if the `SaslRpcClient`'s authentication is *TOKEN*, it will hit the `token expired` exception. But if the authentication is *KERBEROS*, it renews the token automatically. This change switches the authentication from *TOKEN* to *KERBEROS*.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-10473

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8942.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8942

commit fd1f73531514865ecf0b632af628650b0b6f1983
Author: huangzhaowei
Date: 2015-09-30T02:03:00Z

    [SPARK-10473][YARN]Login again in the driver to avoid the events lossing.
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/8867#discussion_r40750990

Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
```diff
@@ -544,6 +545,7 @@ private[spark] class Client(
     logInfo(s"Credentials file set to: $credentialsFile")
     val renewalInterval = getTokenRenewalInterval(stagingDirPath)
     sparkConf.set("spark.yarn.token.renewal.interval", renewalInterval.toString)
+    SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(sparkConf)
```

This code changed the configuration `spark.yarn.credentials.file`, and that configuration is used in `DelegationTokenUpdate`, so the `start` has to be placed after this.
[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/7889
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-143173483

@tgravescs I had noticed the code `UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)`. After this login, `yarn.Client` changes *KERBEROS* into *TOKEN* for the purpose of setting the token for the AM:

```scala
/** Set up security tokens for launching our ApplicationMaster container. */
private def setupSecurityToken(amContainer: ContainerLaunchContext): Unit = {
  val dob = new DataOutputBuffer
  credentials.writeTokenStorageToStream(dob)
  amContainer.setTokens(ByteBuffer.wrap(dob.getData))
}
```

After this, the `Client` will use *TOKEN* in the RPC connection. If I log in again with the keytab after this, the `SaslRpcClient` will use *KERBEROS* again, which avoids the token expired exception. I have tested recent Spark and it still throws this exception.
[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/8918

[SPARK-10766][SPARK-SUBMIT] Add some configurations for the client process in cluster mode.

Add these four configurations for the client, only in cluster mode:

* `spark.client.memory` property and `--client-memory` CLI option
* `spark.client.extraClassPath` property and `--client-class-path` CLI option
* `spark.client.extraJavaOptions` property and `--client-java-options` CLI option
* `spark.client.extraLibraryPath` property and `--client-library-path` CLI option

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-10766

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8918.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8918

commit 94a707ae2fcb4d41718e97160ec905876f716193
Author: huangzhaowei
Date: 2015-09-25T09:21:21Z

    [SPARK-10766][SPARK-SUBMIT]Add some configuration for the client process in cluster mode.
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-142970395

@tgravescs I wrote a simple `DFSClient` application that continuously writes strings into HDFS, and it can run past the configured `dfs.namenode.delegation.token.max-lifetime`. So I turned on DEBUG logging and found a pattern. If **KERBEROS** is used to gain the authority of the `NameNode`, the application can run past that limit:

> 15/09/24 19:53:38 DEBUG SaslRpcClient: Use **KERBEROS** authentication for protocol ClientNamenodeProtocolPB

But if **TOKEN** is used, the application may exit with a *token expired exception*:

> 15/09/24 19:53:58 DEBUG SaslRpcClient: Use **TOKEN** authentication for protocol ClientNamenodeProtocolPB

Spark is using the *Token*. One way to resolve this issue is to log in with the keytab again; the mode of the `SaslRpcClient` then changes to *KERBEROS*.
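A sketch of the proposed re-login; `UserGroupInformation.loginUserFromKeytab` is the real Hadoop API (assuming hadoop-common on the classpath), while the principal and keytab path below are placeholders:

```scala
import org.apache.hadoop.security.UserGroupInformation

object ReloginSketch {
  def main(args: Array[String]): Unit = {
    // After this call, subsequent SASL RPC connections authenticate with
    // KERBEROS rather than the expirable delegation TOKEN.
    UserGroupInformation.loginUserFromKeytab(
      "spark/host@EXAMPLE.COM",             // placeholder principal
      "/etc/security/keytabs/spark.keytab") // placeholder keytab path
  }
}
```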
[GitHub] spark pull request: [SPARK-10786][SQL]Take the whole statement to ...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/8895

[SPARK-10786][SQL] Take the whole statement to generate the CommandProcessor

In the current implementation of `SparkSQLCLIDriver.scala`:

    val proc: CommandProcessor = CommandProcessorFactory.get(Array(tokens(0)), hconf)

`CommandProcessorFactory` only takes the first token of the statement, which makes it hard to differentiate the statements `delete jar xxx` and `delete from xxx`. So maybe it's better to pass the whole statement into the `CommandProcessorFactory`. And [HiveCommand](https://github.com/SaintBacchus/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java#L76) already has special handling for these two statements:

```java
if (command.length > 1 && "from".equalsIgnoreCase(command[1])) {
  // special handling for SQL "delete from <table> where..."
  return null;
}
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-10786

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8895.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8895

commit d44672e3c8cf068c899392a870efa86e274bfde3
Author: huangzhaowei
Date: 2015-09-24T02:16:00Z

    [SPARK-10786][SQL]Take the whole statement to generate the CommandProcessor
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-142516215

@harishreedharan I set `fs.hdfs.impl.disable.cache` to avoid the cache mechanism in Hadoop. I tested in yarn-client mode: with this PR applied the application is OK, and with it removed the application goes down.
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-142468611

@harishreedharan @tgravescs Hadoop RPC actually does re-login with the keytab, but the token can only persist for 7 days by default, so it must be updated. The test steps are below:

1. Shorten the configurations `dfs.namenode.delegation.token.max-lifetime` and `dfs.namenode.delegation.token.renew-interval`, to maybe 10 minutes.
2. Start a `spark-shell` or `spark-sql`.
3. After 15 minutes, execute a job.

Then the application will fail with a token expired exception.
[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/8867

[SPARK-10755][YARN] Make the driver also update the token for long-running applications

In yarn-client mode, the driver writes the event logs into HDFS and gets the partition information from HDFS, so it's necessary to update the token from the `AMDelegationTokenRenewer`. In yarn-cluster mode, the driver is co-located with the AM and the token is updated by the AM. But it's still better to update the token for the client process, since the client wants to delete the staging dir and would otherwise use an expired token.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SPARK-10755

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8867

commit 00fd0bc4cd2d6b31ba197629fbe1e9e07a2497bc
Author: huangzhaowei
Date: 2015-09-22T11:00:47Z

    [SPARK_10755][YARN]Set driver also update the token for long-running application.
[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7889#issuecomment-131079850 @zsxwing Hive has a configuration named `hive.server2.thrift.max.worker.threads` which already limits the concurrency. But my problem was not caused by `trimSessionIfNecessary`. Under high concurrency, `onStatementStart` can be executed before `onSessionCreated`, which causes this problem. As this patch has conflicts with master, I will test it again.
[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7889#issuecomment-127141862 @liancheng @tianyi reviewed this patch before; can you take some time to review it again?
[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7889 [SPARK-8839][SQL]High concurrence will also cause the `key not found` error in HiveThriftServer2 This PR is related to [7239](https://github.com/apache/spark/pull/7239). The error shows up in a high-concurrency scenario: when about 500 clients connect to the server at the same time, the method `onStatementStart` is executed before `onSessionCreated` with roughly 10% probability. So it's better to add a wait for the session to be built up. You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark KeyNotFound Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7889.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7889 commit 25b3c1d568d7b99de956b9f310bb2fb846403fe1 Author: huangzhaowei Date: 2015-08-03T06:33:34Z Resolved another reason for SPARK-8839
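A toy sketch of the kind of guard described, assuming a simplified listener (these names are illustrative, not the actual HiveThriftServer2 listener code): if the statement event can arrive before its session event, the handler waits briefly for the session entry to appear instead of failing with `key not found`.

```scala
import scala.collection.mutable

// Hypothetical, simplified listener state; not the actual HiveThriftServer2 code.
class SessionListener {
  private val sessionList = new mutable.HashMap[String, String]()

  def onSessionCreated(sessionId: String, userName: String): Unit = synchronized {
    sessionList(sessionId) = userName
    notifyAll() // wake any statement handler waiting for this session
  }

  def onStatementStart(sessionId: String, timeoutMs: Long = 5000L): String = synchronized {
    val deadline = System.currentTimeMillis() + timeoutMs
    // Under high concurrency the statement event can arrive first,
    // so wait for the session entry instead of throwing `key not found`.
    while (!sessionList.contains(sessionId)) {
      val remaining = deadline - System.currentTimeMillis()
      if (remaining <= 0) throw new NoSuchElementException(s"key not found: $sessionId")
      wait(remaining)
    }
    sessionList(sessionId)
  }
}
```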
[GitHub] spark pull request: [SPARK-8592] [CORE] CoarseGrainedExecutorBacke...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7110#issuecomment-123997780 I've met this problem too. Do you have any updated information, @xuchenCN @darkcrawler01?
[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/7442
[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/7442#discussion_r34973017

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---

@@ -906,12 +908,16 @@ abstract class DStream[T: ClassTag] (
   /**
    * Save each RDD in this DStream as at text file, using string representation
    * of elements. The file name at each batch interval is generated based on
-   * `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
+   * `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix". If the `CompressionCodec`
+   * is defined, it will use specific `CompressionCodec` to compress the text.
    */
-  def saveAsTextFiles(prefix: String, suffix: String = ""): Unit = ssc.withScope {

--- End diff --

Do you mean there is no need to change this API, and we should leave this to users themselves?
[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7442#issuecomment-122785045 retest this please
[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7442#issuecomment-121941467 @tdas @srowen Can you review this patch?
[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7442 [SPARK-9091][STREAMING]Add the CompressionCodec to the saveAsTextFiles interface in DStream. Add the `CompressionCodec` to the `saveAsTextFiles` interface. To stay compatible with the old interface, an `Option` is used to adapt the code. [Jira Address](https://issues.apache.org/jira/browse/SPARK-9091) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-9091 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7442.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7442 commit f60c25e37f114feda952daf54c37c2d4b7290795 Author: huangzhaowei Date: 2015-07-16T10:57:17Z [SPARK-9091][STREAMING]Add the CompressionCodec to the saveAsTextFiles interface.
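A sketch of what such an `Option`-based, source-compatible signature could look like (illustrative only; the PR's exact code is in the linked patch): old call sites compile unchanged, and new ones can opt into compression.

```scala
import org.apache.hadoop.io.compress.{CompressionCodec, GzipCodec}

object SaveApiSketch {
  // Illustrative sketch, not the PR's exact code: the codec parameter
  // defaults to None, so existing callers are unaffected.
  def saveAsTextFiles(
      prefix: String,
      suffix: String = "",
      codec: Option[Class[_ <: CompressionCodec]] = None): Unit = {
    codec match {
      case Some(c) => println(s"saving $prefix-TIME_IN_MS$suffix with ${c.getSimpleName}")
      case None    => println(s"saving $prefix-TIME_IN_MS$suffix uncompressed")
    }
  }

  def main(args: Array[String]): Unit = {
    saveAsTextFiles("out")                                  // old interface, unchanged
    saveAsTextFiles("out", ".gz", Some(classOf[GzipCodec])) // new compressed variant
  }
}
```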
[GitHub] spark pull request: [SPARK-8974] The thread of spark-dynamic-execu...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/7352#discussion_r34409077

--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---

@@ -211,7 +212,16 @@ private[spark] class ExecutorAllocationManager(
     listenerBus.addListener(listener)

     val scheduleTask = new Runnable() {
-      override def run(): Unit = Utils.logUncaughtExceptions(schedule())
+      override def run(): Unit = {

--- End diff --

It's all the same code.
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-120199090 @liancheng Can you merge it into master if it's OK?
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/7158
[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7218#issuecomment-119857376 @harishreedharan Can you also review this PR, please?
[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7255#issuecomment-119855467 @harishreedharan I tested your PR with my issue, and it actually works. But I suspect a few users may start up a `SparkContext` directly, bypassing `SparkSubmit`; can they still use this?
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119842176 @tianyi Thanks for the review and comments; I have removed it.
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119780393 @tianyi It reduces memory a little if no new client comes soon.
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119425235 @tianyi In my solution, all the unfinished sessions are kept in memory. If we don't check after a session finishes, we have to wait for a new client to trigger this check.
> do the checking work when a session opened or an execution started
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119400914 @liancheng I have updated the description. For now I don't know why the session number exceeds the client number; do you have any idea? If we can't avoid this mechanism in the Spark code, my modification may be a temporary solution.
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119391653 @liancheng Maybe I misread this issue, but it actually exists. The deeper reason I didn't mention is that even if there are only 200 connections at the same time, the session count may be 300 or above. So as long as we keep `retainedStatements`, this issue will always exist.
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7158#issuecomment-119220142 @tgravescs Checkpoint recovery does three things:
1. Read the checkpoint file.
2. Deserialize the checkpoint file and get the properties.
3. Initialize the `SparkContext`.
The issue is in step one; the code in `Client.scala` only runs in step three.
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7158#issuecomment-119210684 @tgravescs I reported this issue. Can you also take a look?
[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7255#issuecomment-119175926 @harishreedharan I tested it out; your patch does not fix my issue. Run this command twice (in both runs the principal's ticket is expired):
```
bin/spark-submit --class xx.KafkaWordCount --master yarn-client --principal spark/hadoop.hadoop@hadoop.com --keytab spark.keytab
```
The second time it will throw this exception:
```
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
```
WordCount code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = StreamingContext.getOrCreate("checkpoint", () => wordCountFunction(args))
ssc.start()
ssc.awaitTermination()

def wordCountFunction(args: Array[String]): StreamingContext = {
  val Array(zkQuorum, group, topics, numThreads) = args
  val sparkConf = new SparkConf().setAppName("KafkaWordCount")
  val ssc = new StreamingContext(sparkConf, Seconds(5))
  ssc.checkpoint("checkpoint")
  val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
  val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
  val words = lines.flatMap(_.split(" "))
  val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _).transform(x => x.sortByKey())
  wordCounts.print()
  ssc
}
```
[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7255#issuecomment-119130485 @harishreedharan I think it's not the same problem I reported. My issue is that `Streaming` reads the checkpoint file before it starts up a `SparkContext`, so in `Yarn-Client` mode we have to log in before initializing the `SparkContext`. Will you take a look at my issue again if you have time?
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
GitHub user SaintBacchus reopened a pull request: https://github.com/apache/spark/pull/7158 [SPARK-8755][Streaming]Login user before reading the checkpoint file in hdfs. If the user sets `spark.yarn.principal` and `spark.yarn.keytab`, he does not need to `kinit` on the client machine. But when the application is recovered from a checkpoint file, he still has to `kinit`, because the checkpoint logic does not use these configurations before it uses a DFSClient to fetch the checkpoint file. There is a small problem: `UserGroupInformation.loginUserFromKeytab` will be called twice in a checkpointed application. This is ignored in this PR. [Jira Address](https://issues.apache.org/jira/browse/SPARK-8755) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8755 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7158 commit 9ddd5b4bf5ca8c0759d411ac44e3ea02a578d1ba Author: huangzhaowei Date: 2015-07-01T12:18:50Z [SPARK-8755][Streaming]Login user before reading hdfs file. commit be0df01ef86af835a64f4c69af4a5607c3c6f5a9 Author: huangzhaowei Date: 2015-07-01T14:15:55Z Modify some code style.
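A minimal sketch of the ordering this PR argues for (illustrative, not the PR's exact code; the object and method names are made up): log in from the keytab before the first HDFS read, so the DFSClient that fetches the checkpoint already has valid credentials.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf

object CheckpointLoginSketch {
  def readCheckpointBytes(sparkConf: SparkConf, checkpointPath: String): Array[Byte] = {
    // Log in *before* the first HDFS access; otherwise the DFSClient created
    // to fetch the checkpoint has no credentials on a kerberized cluster.
    for {
      principal <- sparkConf.getOption("spark.yarn.principal")
      keytab <- sparkConf.getOption("spark.yarn.keytab")
    } UserGroupInformation.loginUserFromKeytab(principal, keytab)

    val path = new Path(checkpointPath)
    val fs = FileSystem.get(path.toUri, new Configuration())
    val in = fs.open(path)
    try {
      val out = new java.io.ByteArrayOutputStream()
      org.apache.hadoop.io.IOUtils.copyBytes(in, out, 4096, false)
      out.toByteArray
    } finally in.close()
  }
}
```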
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119120336
> add a `filter` before `take`

That's a better idea, @tianyi. I have modified the implementation (a sketch follows this list):
1. If there are hundreds of connections, we keep them in memory.
2. When each session finishes, we trigger `trimSessionIfNecessary` to remove the finished sessions and keep the list size below `retainedStatements`.
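A toy sketch of that trimming idea (names simplified and hypothetical; not the actual HiveThriftServer2 listener code): filter to finished sessions first, then take the oldest of those, so an in-flight session is never evicted.

```scala
import scala.collection.mutable

// Hypothetical, simplified records; not the actual listener code.
case class SessionInfo(id: String, startTime: Long, var finishTime: Long = -1L) {
  def finished: Boolean = finishTime >= 0
}

class SessionStore(retainedStatements: Int) {
  private val sessions = new mutable.LinkedHashMap[String, SessionInfo]()

  def onSessionCreated(id: String): Unit = synchronized {
    sessions(id) = SessionInfo(id, System.currentTimeMillis())
  }

  def onSessionClosed(id: String): Unit = synchronized {
    sessions.get(id).foreach(_.finishTime = System.currentTimeMillis())
    trimSessionIfNecessary()
  }

  // Filter before take: only *finished* sessions are eviction candidates,
  // so trimming can never drop a session that is still in use.
  private def trimSessionIfNecessary(): Unit = {
    if (sessions.size > retainedStatements) {
      sessions.values.filter(_.finished).toSeq
        .sortBy(_.finishTime)
        .take(sessions.size - retainedStatements)
        .foreach(s => sessions.remove(s.id))
    }
  }
}
```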
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7239#issuecomment-119067960 Hi, @liancheng will you take a look at this issue?
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7158#issuecomment-119067831 OK, I'm closing this PR now.
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/7158
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7158#issuecomment-119043741 @harishreedharan this is an issue only in Client mode. Will your PR cover this issue?
[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7239 [SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter it's finished or not. The [code](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L220) in HiveThriftServer2 uses `take` to pick the elements to be removed. In the Scala [doc](http://www.scala-lang.org/api/2.10.4/#scala.collection.IterableLike), `take` carries a note:
> Note: might return different results for different runs, unless the underlying collection type is ordered.
So `take` does not necessarily return the first elements in the list, and it may remove sessions and executions which are still in use. This PR adds a check before removing them, but the solution means all unfinished executions are kept in memory. [Jira Address](https://issues.apache.org/jira/browse/SPARK-8839) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8839 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7239.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7239 commit 9d5ceb8f980830a4a6be6c09bacbae9f005f734d Author: huangzhaowei Date: 2015-07-06T11:49:39Z [SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter it's finished or not.
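A quick illustration of the cited `take` caveat, assuming the underlying collection is an unordered map (the listener's actual data structure may differ):

```scala
import scala.collection.mutable

object TakeOrderDemo {
  def main(args: Array[String]): Unit = {
    // HashMap iteration order is unspecified, so `take(2)` can return
    // arbitrary entries, possibly ones that are still RUNNING.
    val executions = mutable.HashMap(
      "exec-1" -> "RUNNING", "exec-2" -> "FINISHED",
      "exec-3" -> "RUNNING", "exec-4" -> "FINISHED")
    println(executions.take(2))

    // Filtering to finished entries first makes eviction safe.
    println(executions.filter(_._2 == "FINISHED").take(2))
  }
}
```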
[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7218#issuecomment-118501476
> adding yet more config and API surface area unless there's a clear need

@srowen Do you mean it's not necessary to add this config? My point is that users may not want to hard-code the checkpoint directory, and without this they have to implement such a config themselves.
[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7218 [SPARK-8820][Streaming] Add a configuration to set checkpoint dir. Add a configuration to set the checkpoint directory, for users' convenience. [Jira Address](https://issues.apache.org/jira/browse/SPARK-8820) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8820 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7218.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7218 commit dd0acc15a093970d1e035f621adaa95885efae99 Author: huangzhaowei Date: 2015-07-04T04:02:53Z [SPARK-8820][Streaming] Add a configuration to set checkpoint dir.
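A sketch of how such a setting might be consumed (the key name `spark.streaming.checkpoint.dir` is hypothetical, not necessarily the one the PR adds):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointConfSketch {
  def createContext(conf: SparkConf): StreamingContext = {
    val ssc = new StreamingContext(conf, Seconds(5))
    // Hypothetical key: when present, set the checkpoint dir from config
    // instead of hard-coding it in application code.
    conf.getOption("spark.streaming.checkpoint.dir").foreach(ssc.checkpoint)
    ssc
  }
}
```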
[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7209#issuecomment-118321204 LGTM
[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7158 [SPARK-8755][Streaming]Login user before reading the checkpoint file in hdfs. If the user sets `spark.yarn.principal` and `spark.yarn.keytab`, he does not need to `kinit` on the client machine. But when the application is recovered from a checkpoint file, he still has to `kinit`, because the checkpoint logic does not use these configurations before it uses a DFSClient to fetch the checkpoint file. But there is one problem: `UserGroupInformation.loginUserFromKeytab` will be called twice in a checkpointed application. [Jira](https://issues.apache.org/jira/browse/SPARK-8755) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8755 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7158 commit 9ddd5b4bf5ca8c0759d411ac44e3ea02a578d1ba Author: huangzhaowei Date: 2015-07-01T12:18:50Z [SPARK-8755][Streaming]Login user before reading hdfs file.
[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7066#issuecomment-116934534
> We modify YarnClientSchedulerBackend#start to call super.start() after we have submitted the application

@andrewor14 This modification is much more suitable for this problem. But if users set configurations in other deploy modes, they still have to be cautious about this problem.
[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/7069#discussion_r33536245

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---

@@ -334,6 +334,17 @@ class SparkHadoopUtil extends Logging {
    * Stop the thread that does the delegation token updates.
    */
   private[spark] def stopExecutorDelegationTokenRenewer() {}
+
+  /**
+   * Disable the hadoop fs cache mechanism, otherwise DFSClient will use old token to connect nn.
+   */
+  private[spark]
+  def getConfBypassingFSCache(hadoopConf: Configuration, path: Path): Configuration = {

--- End diff --

I kept this function name since it is not a general method; it only refreshes the `cache` configuration.
[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/6662#issuecomment-116911098 OK
[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/6662
[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/7008#discussion_r33534263

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---

@@ -44,11 +44,19 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time)
   val sparkConfPairs = ssc.conf.getAll

   def createSparkConf(): SparkConf = {
+    val reloadConfs = List(
+      "spark.master",
+      "spark.yarn.keytab",
+      "spark.yarn.principal")
+
     val newSparkConf = new SparkConf(loadDefaults = false).setAll(sparkConfPairs)
       .remove("spark.driver.host")
       .remove("spark.driver.port")
-    val newMasterOption = new SparkConf(loadDefaults = true).getOption("spark.master")
-    newMasterOption.foreach { newMaster => newSparkConf.setMaster(newMaster) }
+    val newReloadConf = new SparkConf(loadDefaults = true)
+    reloadConfs.foreach { conf =>
+      newReloadConf.getOption(conf)
+        .foreach(confValue => newSparkConf.set(conf, confValue))

--- End diff --

Modified the code style.
[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...
Github user SaintBacchus commented on a diff in the pull request: https://github.com/apache/spark/pull/7069#discussion_r33421898

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---

@@ -334,6 +334,16 @@ class SparkHadoopUtil extends Logging {
    * Stop the thread that does the delegation token updates.
    */
   private[spark] def stopExecutorDelegationTokenRenewer() {}
+
+  /**
+   * Disable the hadoop fs cache mechanism, otherwise DFSClient will use old token to connect nn.
+   */
+  private[spark] def getDiscachedConf(hadoopConf: Configuration, path: Path): Configuration = {
+    val newConf = new Configuration(hadoopConf)
+    val confKey = s"fs.${path.toUri.getScheme}.impl.disable.cache"

--- End diff --

OK, renamed it to `getConfBypassingFSCache`.
[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7069 [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection. If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will use a cached `DFSClient` that holds the old token. So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid token expiry. [Jira](https://issues.apache.org/jira/browse/SPARK-8688) You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8688 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7069.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7069 commit cf776a14725940e888ec187d210b74e1cc24c191 Author: huangzhaowei Date: 2015-06-28T08:19:17Z [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
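Pieced together from the diff fragments quoted in the review comments above, the helper plausibly looks like the following; the closing lines of the body are my reconstruction, not the verbatim patch.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object FsCacheBypass {
  // Disable the Hadoop FS cache for the given path's scheme, so a fresh
  // DFSClient (carrying current tokens) is created instead of a cached one.
  def getConfBypassingFSCache(hadoopConf: Configuration, path: Path): Configuration = {
    val newConf = new Configuration(hadoopConf)
    val confKey = s"fs.${path.toUri.getScheme}.impl.disable.cache"
    newConf.setBoolean(confKey, true) // reconstructed: the quoted diff cuts off before this line
    newConf
  }
}
```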
[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7066 [SPARK-8687][YARN]Fix bug: Executor can't fetch the new set configuration Spark initializes the properties in `CoarseGrainedSchedulerBackend.start`:
```scala
// TODO (prashant) send conf instead of properties
driverEndpoint = rpcEnv.setupEndpoint(
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
```
Then the YARN logic sets some configurations, but they are not updated in this `properties`, so the `Executor` won't receive them. You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8687 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7066.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7066 commit e4dd9a8660c642c08f32a92c57199f0e1ba64b82 Author: huangzhaowei Date: 2015-06-28T07:17:57Z [SPARK-8687][YARN]Fix bug: Executor can't fetch the new set configuration.
[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/7008#issuecomment-115148516 @harishreedharan You are right, the principal is all the same. I considered them as paired configurations, so I added it to the list. :smile:
[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/7008 [SPARK-8619][Streaming]Don't recover keytab and principal configuration within Streaming checkpoint [Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786) changes these configurations, which causes the problem that the Streaming recovery logic can't find the local keytab file (since the configuration was changed):
```scala
sparkConf.set("spark.yarn.keytab", keytabFileName)
sparkConf.set("spark.yarn.principal", args.principal)
```
The problem is described in the [Jira](https://issues.apache.org/jira/browse/SPARK-8619). You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8619 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7008.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7008 commit 0d8f800c742a78870f8ab76232ed8bb18684b84e Author: huangzhaowei Date: 2015-06-25T02:27:55Z Don't recover keytab and principal configuration within Streaming checkpoint.
[GitHub] spark pull request: [SPARK-8367][Streaming]Add a limit for 'spark....
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/6818#issuecomment-111986910 @jerryshao I think that, given the data loss bug, we can call zero an illegal setting.
[GitHub] spark pull request: [SPARK-8367][Streaming]Add a limit for 'spark....
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/6818 [SPARK-8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug. The bug was reported in the JIRA [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367). The resolution is to limit the configuration `spark.streaming.blockInterval` to a positive number. You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8367 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6818.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6818 commit 3d17796d88c35294dde8d4f9ffad00dda98bd631 Author: huangzhaowei Date: 2015-06-15T02:41:36Z [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug.
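A minimal sketch of the kind of guard this implies (illustrative; the PR's actual check may sit elsewhere, e.g. in the block generator). The `200ms` default shown is Spark's documented default for this setting.

```scala
import org.apache.spark.SparkConf

object BlockIntervalCheck {
  def blockIntervalMs(conf: SparkConf): Long = {
    val intervalMs = conf.getTimeAsMs("spark.streaming.blockInterval", "200ms")
    // A zero or negative interval can silently lose received data,
    // so reject it up front instead of failing later at runtime.
    require(intervalMs > 0,
      s"'spark.streaming.blockInterval' must be positive, got $intervalMs ms")
    intervalMs
  }
}
```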
[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/6662#issuecomment-111004513 @andrewor14 Did I describe the scenario clearly? Can you review it again?
[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/6662#issuecomment-109503080 @andrewor14 @vanzin I drew a simple call stack, like this: ![image](https://cloud.githubusercontent.com/assets/7404824/8017792/df6f4cf8-0c32-11e5-90ff-7192d30b8d3f.png) When the `doRequestTotalExecutors` logic happens, it resets the application's total executors. But there is a problem: if at that moment another executor has also gone down, Spark will never bring it up again. This simple scenario reproduces the issue: there are 2 applications, each wanting 2 executors, so 4 CPU cores are wanted in total (every executor wants one core). But the RM only has 3 cores, so the first application (A) gets 2 cores and the second application (B) gets only one core, waiting for A to release its cores. Then kill one of A's executors: B will bring up its missing executor and let A wait for the resource. After the timeout logic occurs in A, B finishes its job and releases its resources. As expected, A should now bring up its other executor again, but in fact that never happens. A may be a Streaming application.
[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...
GitHub user SaintBacchus opened a pull request: https://github.com/apache/spark/pull/6662 [SPARK-8119][Scheduler]Do not let Spark set total executors when executor fails `DynamicAllocation` sets the executor total to a small number when it wants to kill some executors. But in the non-DynamicAllocation scenario, Spark also sets the executor total. This causes the following problem: sometimes when an executor fails, no replacement executor will ever be brought up by Spark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/SaintBacchus/spark SPARK-8119 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6662.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6662 commit 610c390d243a39718ccf7c506c9e5a37784cc65f Author: huangzhaowei Date: 2015-06-05T02:03:48Z [SPARK-8119][Scheduler]Do not let Spark set total executors when executor fails
[GitHub] spark pull request: [SPARK-6464][Core]Add a function named 'proces...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/5152
[GitHub] spark pull request: [SPARK-6584][CORE]Provide ExecutorPrefixTaskLo...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/5240