[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome closed the pull request at: https://github.com/apache/spark/pull/9168
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-150403538 Let's wait for the HDFS JIRA https://issues.apache.org/jira/browse/HDFS-9276.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on a diff in the pull request: https://github.com/apache/spark/pull/9168#discussion_r42826304
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -130,6 +130,21 @@ class SparkHadoopUtil extends Logging {
     UserGroupInformation.loginUserFromKeytab(principalName, keytabFilename)
   }

+  def addCredentialsToCurrentUser(credentials: Credentials, freshHadoopConf: Configuration): Unit = {
+    UserGroupInformation.getCurrentUser.addCredentials(credentials)
+
+    // HACK:
+    // In HA mode, FileSystem.addDelegationTokens only returns a token for the HA
+    // NameNode. The HDFS client derives private tokens for each NameNode from the HA
+    // token and uses these private tokens to communicate with the individual NameNodes.
+    // If Spark only updates the token for the HA NameNode, the HDFS client keeps using
+    // the old private tokens, which causes a token-expired error.
+    // So:
+    // We create a new HDFS client, so that it regenerates and updates the private
+    // tokens for each NameNode.
+    FileSystem.get(freshHadoopConf).close()
--- End diff --
Good idea. I will refactor the patch after HDFS-9276 is fixed.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-150077142 The Hadoop client version I'm using is 2.5.0-cdh5.2.0, which is packaged in the Spark assembly jar. I've updated the code to use the hadoop-1-compatible API. Please review the patch.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149814342 I've tested with both version 1.4.1 and 1.5.1. This patch works.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149788293 Hi all, I have updated the patch so that it only uses Hadoop's public stable API. I will submit a patch to Hadoop. This patch is just a workaround and can be removed once the bug is fixed in Hadoop.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149765972 In non-HA mode there is only one token for the NameNode, so this bug does not occur.
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on a diff in the pull request: https://github.com/apache/spark/pull/9168#discussion_r42450902
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala ---
@@ -177,6 +177,7 @@ private[yarn] class AMDelegationTokenRenewer(
     })
     // Add the temp credentials back to the original ones.
     UserGroupInformation.getCurrentUser.addCredentials(tempCreds)
+    SparkHadoopUtil.get.updateCurrentUserHDFSDelegationToken()
--- End diff --
In HA mode, there are three tokens:
1. the HA token
2. the namenode1 token
3. the namenode2 token
Spark only updates the HA token. HAUtil.cloneDelegationTokenForLogicalUri copies the HA token to the per-NameNode tokens.
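A quick way to see the three tokens described above is to dump the current user's credentials. The following is a standalone illustration (not part of the patch), using only public Hadoop APIs:

import org.apache.hadoop.security.UserGroupInformation
import scala.collection.JavaConverters._

// Minimal sketch: print every delegation token held by the current user.
// In HA mode you would expect the logical "ha-hdfs:<nameservice>" token plus
// one private token per NameNode address.
object DumpTokens {
  def main(args: Array[String]): Unit = {
    val creds = UserGroupInformation.getCurrentUser.getCredentials
    creds.getAllTokens.asScala.foreach { token =>
      println(s"kind=${token.getKind}, service=${token.getService}")
    }
  }
}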
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149398746 There are several solutions, and all of them work:
1. Set dfs.namenode.delegation.token.max-lifetime to a large value.
2. Pass the configuration --conf spark.hadoop.fs.hdfs.impl.disable.cache=true (see the sketch below).
3. Apply the patch I provide.
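For reference, option 2 could also be applied programmatically, since spark.hadoop.* properties are copied into the Hadoop Configuration that Spark builds. A hedged sketch (the property name comes from the comment above; everything else is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: disable the FileSystem cache so every FileSystem.get call
// builds a fresh client that picks up the latest delegation tokens.
// This trades extra client setup cost for correctness under token renewal.
val conf = new SparkConf()
  .setAppName("token-renewal-workaround")
  .set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
val sc = new SparkContext(conf)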
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149398132 In my opinion, the reason is:
1. The Spark AM obtains an HDFS delegation token and adds it to the current user's credentials. This token looks like: token1: "ha-hdfs:hadoop-namenode" -> "Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop-namenode, Ident: (HDFS_DELEGATION_TOKEN token 328709 for test)".
2. DFSClient generates another two tokens, one for each NameNode: token2: "ha-hdfs://xxx.xxx.xxx.xxx:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: xxx.xxx.xxx.xxx:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)" and token3: "ha-hdfs://yyy:yyy:yyy:yyy:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: yyy:yyy:yyy:yyy:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)".
3. DFSClient does not regenerate token2 and token3 automatically when Spark updates token1, yet it only uses token2 and token3 to communicate with the two NameNodes.
4. FileSystem has a cache, so calling FileSystem.get returns a cached DFSClient that still holds the old tokens. Spark only updates token1, but DFSClient uses token2 and token3.
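Put together, the workaround in this PR amounts to the following. This is a simplified sketch of the idea only (the real change lives in SparkHadoopUtil, and freshHadoopConf stands for a Configuration that bypasses the FileSystem cache):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Simplified sketch of the workaround described above.
// 1. Merge the renewed HA token (token1) into the current user's credentials.
// 2. Open a fresh, non-cached FileSystem so the HDFS client re-derives the
//    per-NameNode private tokens (token2/token3) from the new HA token, then
//    close it again; only the side effect is needed.
def refreshHdfsTokens(renewed: Credentials, freshHadoopConf: Configuration): Unit = {
  UserGroupInformation.getCurrentUser.addCredentials(renewed)
  FileSystem.get(freshHadoopConf).close()
}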
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149394322 The scenario is as follows:
1. Kerberos is enabled.
2. NameNode HA is enabled.
3. To test token expiry, I changed the NameNode configuration: dfs.namenode.delegation.token.max-lifetime = 40min, dfs.namenode.delegation.key.update-interval = 20min, dfs.namenode.delegation.token.renew-interval = 20min.
4. The Spark test application writes an HDFS file every minute.
5. Yarn Cluster mode is used.
6. The --principal and --keytab arguments are used.
After running for 40 minutes, I got the error:
15/10/16 16:09:19 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 30, node153-81-74-jylt.qiyi.hadoop): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 324309 for test) is expired
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:287)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1645)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1627)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1552)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:396)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:392)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:392)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:336)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/9168
[SPARK-11182] HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode
In HA mode, DFSClient automatically generates an HDFS delegation token for each NameNode, and these tokens are not updated when Spark updates the credentials of the current user. Spark should update these tokens to avoid token-expired errors.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark SPARK11182
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9168.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9168
commit 3bbfe61c74150e0de42573eaf736629164ccfe47
Author: guliangliang
Date: 2015-10-19T09:45:28Z
[SPARK-11182] HDFS Delegation Token will be expired when calling UserGroupInformation.getCurrentUser.addCredentials in HA mode
Change-Id: Ia1833198ef694dfbc5b560bddd1eef226012787b
[GitHub] spark pull request: [SPARK-8030] Spelling Mistake: 'fetchHcfsFile'...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/6575#issuecomment-107824219 @andrewor14 cool, I will close this PR.
[GitHub] spark pull request: [SPARK-8030] Spelling Mistake: 'fetchHcfsFile'...
Github user marsishandsome closed the pull request at: https://github.com/apache/spark/pull/6575
[GitHub] spark pull request: [Spark 8030] Spelling Mistake: 'fetchHcfsFile'...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/6575
[Spark 8030] Spelling Mistake: 'fetchHcfsFile' should be 'fetchHdfsFile'
Spelling mistake in org.apache.spark.util.Utils: 'fetchHcfsFile' should be 'fetchHdfsFile'.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark8030
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6575.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6575
commit 4f7ea5752ee0d90801f99bfc747271e2a98e86e2
Author: guliangliang
Date: 2015-06-02T05:52:18Z
[Spark 8030] Spelling Mistake: 'fetchHcfsFile' should be 'fetchHdfsFile'
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome closed the pull request at: https://github.com/apache/spark/pull/5095
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-90516225 @srowen This change is not needed. As SPARK-5331 says, Spark provides many services, both internal and external. Spark should provide a way for users to specify the hostname or IP address.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-90307943 @yongjiaw Thanks for your advice. As a workaround I set the hostname to the IP address and it works. I like the solution in SPARK-5113.
[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...
Github user marsishandsome closed the pull request at: https://github.com/apache/spark/pull/5156
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-86236076 @andrewor14 Yes, it works in my network. Currently Spark can only discover the machine's hostname and use it to communicate with others. I think Spark should offer another option: use the IP address instead of the hostname (either discovered by Spark or specified by the user).
[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5156#issuecomment-85748759 @srowen Please review.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-85533219 @tgravescs Yes, DNS in our network is not properly configured. The YARN node cannot connect to the client by the client machine's hostname. My idea is to let the YARN node use the client's IP address to connect to the client machine.
[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5156#issuecomment-85524442 @srowen I've pushed another commit that keeps the original order of the classpath. A helper function "addClassPath()" is used to avoid code duplication. I'm happy to modify other code to use "addClassPath" if you think that's OK.
[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/5156
[SPARK-6491] Spark will put the current working dir to the CLASSPATH
When running "bin/compute-classpath.sh", the output is:
:/spark/conf:/spark/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.5.0-cdh5.2.0.jar:/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/spark/lib_managed/jars/datanucleus-core-3.2.10.jar
Because of the leading ":", Java adds the current working directory to the CLASSPATH, which is not what Spark users expect. For example, if I start spark-shell from /root and a "core-site.xml" exists under /root/, Spark uses that file as the Hadoop configuration file, even though I have already set HADOOP_CONF_DIR=/etc/hadoop/conf.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark6491
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5156.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5156
commit 35c25d483b7e2f8cd7568e8e089da70b4a667122
Author: guliangliang
Date: 2015-03-24T06:48:16Z
[SPARK-6491] Spark will put the current working dir to the CLASSPATH
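To illustrate the behaviour described above (this is an illustration of how the JVM treats the classpath, not code from the patch): the leading ":" produces an empty classpath entry, and the JVM resolves an empty entry to ".", i.e. the current working directory.

// Illustration only: split a classpath with a leading ":" and show that the
// empty first entry stands for the current working directory.
object LeadingColonDemo {
  def main(args: Array[String]): Unit = {
    val classpath = ":/spark/conf:/spark/assembly.jar"
    val entries = classpath.split(":", -1).map(e => if (e.isEmpty) "." else e)
    println(entries.mkString(" | ")) // prints: . | /spark/conf | /spark/assembly.jar
  }
}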
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-85257353 @tgravescs You are right. Maybe we should provide two choices, IP and hostname, both figured out automatically by Spark.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-85256831 @tgravescs In our local network, the YARN nodes do not know the hostname of the client, so I have to set spark.driver.host to the client's IP address; the driver then uses its IP address instead of its hostname. But the driver's BlockManager still uses the hostname.
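For context, the setting referred to above can be supplied through SparkConf; a small sketch with a placeholder address (the property name is the one under discussion, the IP address is hypothetical):

import org.apache.spark.SparkConf

// Hypothetical client IP; spark.driver.host tells remote components how to reach the driver.
val conf = new SparkConf()
  .setAppName("yarn-client-app")
  .set("spark.driver.host", "192.0.2.10")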
[GitHub] spark pull request: Driver's Block Manager does not use "spark.dri...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/5095
Driver's Block Manager does not use "spark.driver.host" in Yarn-Client mode
In my cluster, the YARN nodes do not know the client's hostname, so I set "spark.driver.host" to the client's IP address. But in Yarn-Client mode the driver's Block Manager does not use "spark.driver.host"; it uses the hostname. I got the following error:
TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2, hadoop-node1538098): java.io.IOException: Failed to connect to example-hostname
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:193)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:200)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1029)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:463)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:849)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark6420
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5095.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5095
commit 2f9701d182eecc814df1730cb659fbe1622d1288
Author: guliangliang
Date: 2015-03-19T23:11:17Z
[SPARK-6420] Driver's Block Manager does not use spark.driver.host in Yarn-Client mode
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-76505657 @andrewor14 please check
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-76342809 Hi @andrewor14, is there anything I can do for this PR?
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-76342480 Thanks @jerryshao
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74231723 I've updated my implementation according to @vanzin's advice. Thanks @vanzin
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74220077 ![default](https://cloud.githubusercontent.com/assets/5574887/6184443/a59134c2-b39c-11e4-8206-f546276f80c7.PNG) For running applications, both Cores Using and Cores Requested are shown. For completed applications, only Cores Requested is shown. What do you think?
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74197007 @vanzin In my opinion, this is a typical producer-consumer problem. I'm a little confused by your approach; would you please explain it in more detail? Does your approach use a shared container? If not, how is data passed from the producer to the consumer? If yes, what is the difference between your approach and mine?
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74191436 @andrewor14 What about showing the number of cores users requested (total-executor-cores) for now? Users and administrators (at least me) may want to see this information.
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74078446 Hi @vanzin, the failed test passes in my local environment. I have no idea why it failed. Would you please check it for me?
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74074999 ![default](https://cloud.githubusercontent.com/assets/5574887/6168735/c8600e78-b302-11e4-8c3b-1e04854d0735.PNG) I'm really confused by the core number shown above.
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74072604 The problem here is that the Cores value displayed is confusing, because it depends on whether sc.stop() was called. So I chose to display the core max here.
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/4567
[SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
In Standalone mode, the number of cores shown under Completed Applications on the Master web page is always zero if sc.stop() is called, but it is correct if sc.stop() is not called. The likely reason: after sc.stop() is called, ApplicationInfo.removeExecutor is invoked, which reduces the variable coresGranted to zero, and coresGranted is what the web page displays as the number of cores.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark5771
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4567.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4567
commit cfbd97d68e322ce25882a9aa08ae665c6ba24ad0
Author: guliangliang
Date: 2015-02-12T13:21:38Z
[SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
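A simplified sketch of the accounting the description refers to (method and field names follow the description above; this is not the actual ApplicationInfo class):

// Simplified sketch: cores are granted when executors are added and given back
// when executors are removed. After sc.stop() every executor is removed, so
// coresGranted falls back to 0, which is what the Completed Applications table
// then displays.
class ApplicationCoresSketch {
  private var coresGranted = 0

  def addExecutor(cores: Int): Unit = { coresGranted += cores }
  def removeExecutor(cores: Int): Unit = { coresGranted -= cores }

  def coresShownOnWebPage: Int = coresGranted
}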
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74046962 I've updated the implementation. Two background threads are used to load the log files (see the sketch below):
1. one thread checks the file list
2. another thread fetches and parses the log files
There may be race-condition problems if a thread pool is used to fetch and parse the log files. The following points must be taken care of:
1. The threads in the pool share a common unparsed file list, which is produced by another thread.
2. The threads in the pool update a common parsed file list.
3. The unparsed file list is sorted by file update time.
4. The parsed file list is sorted by application finish time.
5. The UI thread can read the contents of both the unparsed and the parsed file list at any time.
Other reasons why I chose the two-thread implementation:
1. If a thread pool is used, the network becomes the next bottleneck.
2. It's OK for users, at least for me, if the missing metadata finishes loading within 3 hours; at least they can visit the job detail web page.
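A minimal sketch of the two-thread structure described above, assuming a blocking queue as the shared container (names and types are illustrative, not the actual history provider code):

import java.util.concurrent.LinkedBlockingQueue

// Illustrative two-thread producer-consumer sketch:
// - the "lister" thread discovers log file paths and enqueues them;
// - the "parser" thread dequeues paths, parses them, and fills in the
//   application metadata that the UI later reads.
object TwoThreadSketch {
  private val unparsed = new LinkedBlockingQueue[String]()

  def main(args: Array[String]): Unit = {
    val lister = new Thread(new Runnable {
      override def run(): Unit = {
        // The real provider would scan the event-log directory here.
        Seq("app-1.log", "app-2.log").foreach(path => unparsed.put(path))
      }
    })

    val parser = new Thread(new Runnable {
      override def run(): Unit = {
        while (true) {
          val path = unparsed.take() // blocks until a path is available
          println(s"parsing $path")  // stand-in for fetching and parsing
        }
      }
    })
    parser.setDaemon(true)

    lister.start()
    parser.start()
    lister.join()
  }
}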
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74020260 @vanzin Thanks for your advice. I will improve the implementation.
[GitHub] spark pull request: [SPARK-5522] Accelerate the History Server st...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/4525
[SPARK-5522] Accelerate the History Server start
When starting the history server, all log files are fetched and parsed in order to get the applications' metadata, e.g. app name, start time, duration, etc. In our production cluster, there are 2600 log files (160G) in HDFS and it takes 3 hours to restart the history server, which is a little too long for us. It would be better if the history server could show applications with missing information during start-up and fill in the missing information after each log file is fetched and parsed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark5522
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4525.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4525
commit be5670c937d163b3c8238248c80bcf472333678f
Author: guliangliang
Date: 2015-02-11T06:45:01Z
[SPARK-5522] Accelerate the History Server start
[GitHub] spark pull request: [SPARK-5733] Error Link in Pagination of Histr...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/4523
[SPARK-5733] Error Link in Pagination of HistoryPage when showing Incomplete Applications
The links in the pagination of HistoryPage are wrong when showing incomplete applications. If "2" is clicked on the page "http://history-server:18080/?page=1&showIncomplete=true", it goes to "http://history-server:18080/?page=2" instead of "http://history-server:18080/?page=2&showIncomplete=true".
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marsishandsome/spark Spark5733
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4523.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4523
commit 9d7b5931f6f802892de6d86c6728c4d31edfe105
Author: guliangliang
Date: 2015-02-11T05:39:07Z
[SPARK-5733] Error Link in Pagination of HistoryPage when showing Incomplete Applications
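For illustration, the fix amounts to carrying the showIncomplete flag through the pagination links. A hypothetical sketch (not the actual HistoryPage code):

// Hypothetical helper: build a pagination link that preserves the
// showIncomplete flag instead of silently dropping it.
def pageLink(page: Int, showIncomplete: Boolean): String =
  s"/?page=$page&showIncomplete=$showIncomplete"

// pageLink(2, showIncomplete = true) returns "/?page=2&showIncomplete=true"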