[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-12-15 Thread marsishandsome
Github user marsishandsome closed the pull request at:

https://github.com/apache/spark/pull/9168


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-22 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-150403538
  
Let's wait for the HDFS JIRA https://issues.apache.org/jira/browse/HDFS-9276.






[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-22 Thread marsishandsome
Github user marsishandsome commented on a diff in the pull request:

https://github.com/apache/spark/pull/9168#discussion_r42826304
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -130,6 +130,21 @@ class SparkHadoopUtil extends Logging {
     UserGroupInformation.loginUserFromKeytab(principalName, keytabFilename)
   }
 
+  def addCredentialsToCurrentUser(
+      credentials: Credentials,
+      freshHadoopConf: Configuration): Unit = {
+    UserGroupInformation.getCurrentUser.addCredentials(credentials)
+
+    // HACK:
+    // In HA mode, FileSystem.addDelegationTokens only returns a token for the HA
+    // NameNode. The HDFS client generates a private token for each NameNode from
+    // the HA NameNode token and uses these private tokens to communicate with the
+    // NameNodes. If Spark only updates the token for the HA NameNode, the HDFS
+    // client keeps using the old private tokens, which causes a token expired error.
+    // So:
+    // We create a new HDFS client, so that it generates and updates the private
+    // tokens for each NameNode.
+    FileSystem.get(freshHadoopConf).close()
--- End diff --

Good idea. I will refactor the patch after HDFS-9276 is fixed.





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-21 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-150077142
  
The Hadoop client version I'm using is 2.5.0-cdh5.2.0, which is packaged in the Spark assembly jar.

I've updated the code to use the hadoop-1 compatible API.

Please review the patch.





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-21 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149814342
  
I've tested this patch with both versions 1.4.1 and 1.5.1; it works.





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-20 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149788293
  
Hi all,

I have updated the patch to use only Hadoop's public stable API.
I will submit a patch to Hadoop.
This patch is just a workaround and will be removed once the bug is fixed in Hadoop.






[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-20 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149765972
  
In non-HA mode there is only one token for the NameNode, so this bug will not occur.





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread marsishandsome
Github user marsishandsome commented on a diff in the pull request:

https://github.com/apache/spark/pull/9168#discussion_r42450902
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala ---
@@ -177,6 +177,7 @@ private[yarn] class AMDelegationTokenRenewer(
     })
     // Add the temp credentials back to the original ones.
     UserGroupInformation.getCurrentUser.addCredentials(tempCreds)
+    SparkHadoopUtil.get.updateCurrentUserHDFSDelegationToken()
--- End diff --

In HA mode, there are three tokens:
1. the HA token
2. the namenode1 token
3. the namenode2 token

Spark only updates the HA token.
HAUtil.cloneDelegationTokenForLogicalUri copies the HA token to the NameNode tokens.





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149398746
  
There are several solutions, all of which work:
1. Set dfs.namenode.delegation.token.max-lifetime to a large value.
2. Use the configuration --conf spark.hadoop.fs.hdfs.impl.disable.cache=true.
3. The patch I provided.
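Option 2 can also be applied programmatically when building the application's SparkConf: Spark copies any key with the spark.hadoop. prefix into the Hadoop Configuration it hands to HDFS clients. A minimal sketch (the app name is hypothetical):

```scala
import org.apache.spark.SparkConf

// Workaround 2: disable the HDFS FileSystem cache so that every
// FileSystem.get call creates a fresh client with up-to-date tokens.
// Keys prefixed with "spark.hadoop." are copied into the Hadoop
// Configuration, so this sets fs.hdfs.impl.disable.cache = true there.
val conf = new SparkConf()
  .setAppName("long-running-app") // hypothetical application name
  .set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
```

The trade-off is that every FileSystem.get call now constructs a fresh client, which adds some overhead but guarantees the client sees renewed tokens.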





[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149398132
  
In my opinion, the reason is:
1. The Spark AM gets an HDFS delegation token and adds it to the current user's credentials.
This token looks like:
token1: "ha-hdfs:hadoop-namenode" -> "Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop-namenode, Ident: (HDFS_DELEGATION_TOKEN token 328709 for test)"

2. The DFSClient generates another two tokens, one for each NameNode:
token2: "ha-hdfs://xxx.xxx.xxx.xxx:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: xxx.xxx.xxx.xxx:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)"
token3: "ha-hdfs://yyy:yyy:yyy:yyy:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: yyy:yyy:yyy:yyy:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)"

3. The DFSClient does not regenerate token2 and token3 automatically when Spark updates token1, yet it only ever uses token2 and token3 to communicate with the two NameNodes.

4. FileSystem has a cache, so calling FileSystem.get returns a cached DFSClient, which still holds the old tokens.
Spark only updates token1, but the DFSClient uses token2 and token3.
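The cached-client problem in step 4 is what the patch works around: forcing a brand-new DFSClient to be constructed makes it regenerate the private per-NameNode tokens from the freshly updated token1. A minimal sketch of the idea follows; how the fresh Configuration is built is not shown in the quoted diff, so disabling the per-scheme FileSystem cache here is an assumption, consistent with the fs.hdfs.impl.disable.cache workaround mentioned elsewhere in this thread:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Sketch of the workaround: after adding the renewed HA token to the
// current user, force a brand-new DFSClient to be built so that it
// regenerates the private per-NameNode tokens (token2, token3) from token1.
def addCredentialsToCurrentUser(credentials: Credentials, hadoopConf: Configuration): Unit = {
  UserGroupInformation.getCurrentUser.addCredentials(credentials)

  // Copy the configuration and disable the FileSystem cache, so that
  // FileSystem.get builds a fresh client instead of returning a cached
  // one that still holds the expired private tokens. (Assumed setup;
  // the quoted diff receives freshHadoopConf as a parameter.)
  val freshHadoopConf = new Configuration(hadoopConf)
  freshHadoopConf.setBoolean("fs.hdfs.impl.disable.cache", true)

  // Creating (and immediately closing) the new client is enough: the
  // private tokens are added to the user's credentials as a side effect
  // of constructing the client.
  FileSystem.get(freshHadoopConf).close()
}
```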






[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/9168#issuecomment-149394322
  
The scenario is as follows:
1. Kerberos is enabled.

2. NameNode HA is enabled.

3. In order to test token expiry, I changed the NameNode configuration:
dfs.namenode.delegation.token.max-lifetime = 40min
dfs.namenode.delegation.key.update-interval = 20min
dfs.namenode.delegation.token.renew-interval = 20min

4. The Spark test application writes an HDFS file every minute.

5. Yarn cluster mode is used.

6. The --principal and --keytab arguments are used.


After running for 40 minutes, I got the error:
15/10/16 16:09:19 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 30, node153-81-74-jylt.qiyi.hadoop): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 324309 for test) is expired
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:287)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1645)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1627)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1552)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:396)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:392)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:392)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:336)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)






[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/9168

[SPARK-11182] HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode

In HA mode, the DFSClient automatically generates an HDFS delegation token for each NameNode; these tokens are not updated when Spark updates the credentials for the current user.
Spark should update these tokens as well, in order to avoid token expired errors.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark SPARK11182

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9168


commit 3bbfe61c74150e0de42573eaf736629164ccfe47
Author: guliangliang 
Date:   2015-10-19T09:45:28Z

[SPARK-11182] HDFS Delegation Token will be expired when calling UserGroupInformation.getCurrentUser.addCredentials in HA mode

Change-Id: Ia1833198ef694dfbc5b560bddd1eef226012787b







[GitHub] spark pull request: [SPARK-8030] Spelling Mistake: 'fetchHcfsFile'...

2015-06-01 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/6575#issuecomment-107824219
  
@andrewor14 cool, I will close this PR.





[GitHub] spark pull request: [SPARK-8030] Spelling Mistake: 'fetchHcfsFile'...

2015-06-01 Thread marsishandsome
Github user marsishandsome closed the pull request at:

https://github.com/apache/spark/pull/6575





[GitHub] spark pull request: [Spark 8030] Spelling Mistake: 'fetchHcfsFile'...

2015-06-01 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/6575

[Spark 8030] Spelling Mistake: 'fetchHcfsFile' should be 'fetchHdfsFile'

Spelling mistake in org.apache.spark.util.Utils: 'fetchHcfsFile' should be 'fetchHdfsFile'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark8030

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6575


commit 4f7ea5752ee0d90801f99bfc747271e2a98e86e2
Author: guliangliang 
Date:   2015-06-02T05:52:18Z

[Spark 8030] Spelling Mistake: 'fetchHcfsFile' should be 'fetchHdfsFile'







[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-04-07 Thread marsishandsome
Github user marsishandsome closed the pull request at:

https://github.com/apache/spark/pull/5095





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-04-07 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-90516225
  
@srowen This change is not needed.

As SPARK-5331 said, Spark provides many services, both internal and external. Spark should provide a way for users to specify the hostname or IP.





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-04-06 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-90307943
  
@yongjiaw Thanks for the advice. As a workaround I set the hostname to the IP and it works. I like the solution of SPARK-5113.





[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...

2015-03-27 Thread marsishandsome
Github user marsishandsome closed the pull request at:

https://github.com/apache/spark/pull/5156





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-03-25 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-86236076
  
@andrewor14 Yes, it works in my network.
Currently Spark can only discover the machine's hostname and use it to communicate with others.
I think Spark should provide another option: use the IP instead of the hostname (discovered by Spark or specified by the user).





[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...

2015-03-24 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5156#issuecomment-85748759
  
@srowen Please review.





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-03-24 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-85533219
  
@tgravescs Yes, the DNS in our network is not properly configured.

The YARN node cannot connect to the client by the hostname of the client machine.

My idea is to let the YARN node use the client's IP address to connect to the client machine.





[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...

2015-03-24 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5156#issuecomment-85524442
  
@srowen I've pushed another commit that keeps the original order of the classpath.
A helper function, addClassPath(), is used to avoid code duplication.
I'd be happy to modify the other code to use addClassPath, if you think it's OK.





[GitHub] spark pull request: [SPARK-6491] Spark will put the current workin...

2015-03-23 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/5156

[SPARK-6491] Spark will put the current working dir to the CLASSPATH

When running "bin/compute-classpath.sh", the output is:

:/spark/conf:/spark/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.5.0-cdh5.2.0.jar:/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/spark/lib_managed/jars/datanucleus-core-3.2.10.jar

Java adds the current working dir to the CLASSPATH if there is a leading ":", which is not what Spark users expect.
For example, if I call spark-shell in the folder /root and there is a "core-site.xml" under /root/, Spark will use this file as the Hadoop configuration file, even if I have already set HADOOP_CONF_DIR=/etc/hadoop/conf.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark6491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5156


commit 35c25d483b7e2f8cd7568e8e089da70b4a667122
Author: guliangliang 
Date:   2015-03-24T06:48:16Z

[SPARK-6491] Spark will put the current working dir to the CLASSPATH







[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-03-23 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-85257353
  
@tgravescs You are right. Maybe we should provide two choices, IP and hostname, both figured out automatically by Spark.





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-03-23 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-85256831
  
@tgravescs In our local network, the YARN node does not know the hostname of the client, so I have to set spark.driver.host to the client's IP address so that the driver uses its IP address instead of its hostname. But the driver's BlockManager still uses the hostname.





[GitHub] spark pull request: Driver's Block Manager does not use "spark.dri...

2015-03-19 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/5095

Driver's Block Manager does not use "spark.driver.host" in Yarn-Client mode

In my cluster, the YARN node does not know the client's hostname, so I set "spark.driver.host" to the IP address of the client.
But in Yarn-Client mode the driver's Block Manager does not use "spark.driver.host"; it uses the hostname.

I got the following error:

TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2, hadoop-node1538098): java.io.IOException: Failed to connect to example-hostname
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:193)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:200)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1029)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:463)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:849)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marsishandsome/spark Spark6420

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5095


commit 2f9701d182eecc814df1730cb659fbe1622d1288
Author: guliangliang 
Date:   2015-03-19T23:11:17Z

[SPARK-6420] Driver's Block Manager does not use spark.driver.host in 
Yarn-Client mode




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-27 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-76505657
  
@andrewor14 please check





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-26 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-76342809
  
Hi @andrewor14, is there anything I can do for this PR?





[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-26 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4567#issuecomment-76342480
  
Thanks @jerryshao 





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-13 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-74231723
  
I've updated my implementation according to @vanzin's advice.
Thanks @vanzin 





[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-13 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4567#issuecomment-74220077
  

![default](https://cloud.githubusercontent.com/assets/5574887/6184443/a59134c2-b39c-11e4-8206-f546276f80c7.PNG)

For Running Applications, Cores Using and Cores Requested will be shown.
For Completed Applications, only Cores Requested will be shown.

What do you think about it?






[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-74197007
  
@vanzin In my opinion, it's a typical producer-consumer problem. 

I'm a little confused by your approach. Would you please explain it in 
more detail for me? Does your approach use a shared container? If not, how is 
data passed from the producer to the consumer? If so, what are the differences 
between your approach and mine?






[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4567#issuecomment-74191436
  
@andrewor14 What about showing the number of cores users requested 
(total-executor-cores) for now? 
Users or administrators (myself included) may want to see this information.





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-74078446
  
Hi @vanzin, the failed test passed in my local environment. I have no idea 
why it failed here. Would you please check it for me?





[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4567#issuecomment-74074999
  

![default](https://cloud.githubusercontent.com/assets/5574887/6168735/c8600e78-b302-11e4-8c3b-1e04854d0735.PNG)

I'm really confused by the core number shown above.






[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4567#issuecomment-74072604
  
The problem here is that the number of cores displayed is confusing, because 
it depends on whether sc.stop() is called or not.

So I chose to display the core max here.





[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-12 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/4567

[SPARK-5771] Number of Cores in Completed Applications of Standalone Master 
Web Page always be 0 if sc.stop() is called

In standalone mode, the number of cores in Completed Applications on the 
Master web page will always be zero if sc.stop() is called, but it will be 
correct if sc.stop() is not called.
The likely reason: 
after sc.stop() is called, the removeExecutor method of class 
ApplicationInfo is invoked for each executor, which reduces the variable 
coresGranted to zero. The variable coresGranted is what is used to display the 
number of cores on the web page.
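The reported behavior can be illustrated with a minimal sketch. The names below (AppInfo, add_executor, remove_executor) are hypothetical simplifications, not the actual Spark ApplicationInfo API:

```python
# Toy model of the SPARK-5771 bug: coresGranted is decremented when
# executors are removed, so a completed app reports 0 cores.
class AppInfo:
    def __init__(self):
        self.cores_granted = 0   # value shown as "Cores" on the master web page
        self.executors = {}

    def add_executor(self, exec_id, cores):
        self.executors[exec_id] = cores
        self.cores_granted += cores

    def remove_executor(self, exec_id):
        # Called for every executor when sc.stop() tears the app down.
        self.cores_granted -= self.executors.pop(exec_id)

app = AppInfo()
app.add_executor("exec-1", 4)
app.add_executor("exec-2", 4)
print(app.cores_granted)       # 8 while the app is running

app.remove_executor("exec-1")  # sc.stop() removes all executors...
app.remove_executor("exec-2")
print(app.cores_granted)       # ...so 0 is shown for the completed app
```

The fix would need to remember the peak (or requested) core count separately instead of reusing the live counter.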

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark5771

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4567


commit cfbd97d68e322ce25882a9aa08ae665c6ba24ad0
Author: guliangliang 
Date:   2015-02-12T13:21:38Z

[SPARK-5771] Number of Cores in Completed Applications of Standalone Master 
Web Page always be 0 if sc.stop() is called







[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-12 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-74046962
  
I've updated the implementation.

Two background threads are used to load the log files:
1. one thread checks the file list
2. another fetches and parses the log files

There may be some race condition problems if a thread pool is used to fetch 
and parse the log files.
The following issues must be taken care of:
1. The threads in the pool share a common unparsed file list, which is 
produced by another thread
2. The threads in the pool update a common parsed file list
3. The unparsed file list is sorted by file update time
4. The parsed file list is sorted by application finish time
5. The UI thread can read the contents of both the unparsed file 
list and the parsed file list at the same time

Other reasons why I chose the two-thread implementation are: 
1. If a thread pool is used, the network will be the next bottleneck.
2. It's OK for users, at least for me, if the missing metadata finishes 
loading in 3 hours. At least they can visit the job detail web page.
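The two-background-thread design can be sketched roughly as follows. The names (check_file_list, fetch_and_parse, unparsed, parsed) are hypothetical; the real HistoryServer code differs:

```python
# Two-thread producer-consumer sketch: one thread discovers log files,
# the other fetches/parses them while the UI can read `parsed` at any time.
import queue
import threading

unparsed = queue.Queue()        # file list produced by the checker thread
parsed = []                     # applications already visible in the UI
parsed_lock = threading.Lock()  # guards concurrent reads by the UI thread

def check_file_list(log_files):
    # Thread 1: discover log files and hand them to the parser thread.
    for path in log_files:
        unparsed.put(path)
    unparsed.put(None)  # sentinel: no more files

def fetch_and_parse():
    # Thread 2: fetch and parse each log, filling in the metadata.
    while True:
        path = unparsed.get()
        if path is None:
            break
        with parsed_lock:
            parsed.append({"file": path, "app_name": path.rsplit(".", 1)[0]})

t1 = threading.Thread(target=check_file_list, args=(["app-1.log", "app-2.log"],))
t2 = threading.Thread(target=fetch_and_parse)
t1.start(); t2.start()
t1.join(); t2.join()
print([p["app_name"] for p in parsed])  # ['app-1', 'app-2']
```

With a single consumer draining a FIFO queue, the ordering concerns listed above (points 3 and 4) are easier to maintain than with a pool of parser threads.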





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-11 Thread marsishandsome
Github user marsishandsome commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-74020260
  
@vanzin Thanks for your advice. I will improve the implementation.





[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-10 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/4525

[SPARK-5522] Accelerate the Histroty Server start

When starting the history server, all the log files are fetched and 
parsed in order to get the applications' metadata, e.g. app name, start time, 
duration, etc. In our production cluster, there are 2600 log files (160G) in 
HDFS and it takes 3 hours to restart the history server, which is a bit 
too long for us.

It would be better if the history server could show logs with missing 
information during start-up and fill in the missing information after fetching 
and parsing a log file.
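The "list immediately, fill in later" idea can be sketched as below. This is a hypothetical structure (register_log_file, fill_metadata, listing are made-up names), not the proposed patch itself:

```python
# Sketch: show every log file with placeholder metadata at startup,
# then update each entry once its (possibly large) log has been parsed.
listing = {}

def register_log_file(path):
    # At startup: list the file right away with placeholder metadata.
    listing[path] = {"app_name": "(loading...)", "duration": None}

def fill_metadata(path, app_name, duration):
    # Later, after the background thread has fetched and parsed the log.
    listing[path].update(app_name=app_name, duration=duration)

register_log_file("hdfs:///logs/app-1")
print(listing["hdfs:///logs/app-1"]["app_name"])  # (loading...) during startup
fill_metadata("hdfs:///logs/app-1", "MyJob", 1234)
print(listing["hdfs:///logs/app-1"]["app_name"])  # MyJob once parsed
```

The server becomes usable as soon as the file list is known, at the cost of temporarily incomplete rows in the UI.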

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark5522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4525


commit be5670c937d163b3c8238248c80bcf472333678f
Author: guliangliang 
Date:   2015-02-11T06:45:01Z

[SPARK-5522] Accelerate the Histroty Server start







[GitHub] spark pull request: [SPARK-5733] Error Link in Pagination of Histr...

2015-02-10 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/4523

[SPARK-5733] Error Link in Pagination of HistroyPage when showing 
Incomplete Applications

The links in the pagination of HistoryPage are wrong when showing Incomplete 
Applications.

If "2" is clicked on the following page 
"http://history-server:18080/?page=1&showIncomplete=true", it will go to 
"http://history-server:18080/?page=2" instead of 
"http://history-server:18080/?page=2&showIncomplete=true".
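The fix amounts to carrying the showIncomplete flag through the pagination links. A minimal sketch, with a hypothetical page_link helper rather than the actual HistoryPage code:

```python
# Sketch of the pagination fix: preserve showIncomplete in page links.
def page_link(page, show_incomplete):
    link = "/?page=%d" % page
    if show_incomplete:
        link += "&showIncomplete=true"  # keep the flag across pages
    return link

print(page_link(2, True))   # /?page=2&showIncomplete=true
print(page_link(2, False))  # /?page=2
```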

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark5733

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4523


commit 9d7b5931f6f802892de6d86c6728c4d31edfe105
Author: guliangliang 
Date:   2015-02-11T05:39:07Z

[SPARK-5733] Error Link in Pagination of HistroyPage when showing 
Incomplete Applications



