[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-10 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15297#discussion_r82730696
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -138,13 +138,16 @@ private[spark] abstract class MapOutputTracker(conf: 
SparkConf) extends Logging
* and the second item is a sequence of (shuffle block id, 
shuffle block size) tuples
* describing the shuffle blocks that are stored at that block 
manager.
*/
-  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, 
endPartition: Int)
+  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, 
endPartition: Int,
+  mapid: Int = -1)
--- End diff --

It's better to use a `Seq[Int]` so that many map outputs can be fetched in one call.
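A toy sketch of the suggestion in the comment above: accept a `Seq[Int]` of map ids so several map outputs are fetched at once. The names here (`BlockSize`, `sizesForMaps`) are hypothetical, not Spark's real API.

```scala
// Toy model of a map-output lookup keyed by map id.
case class BlockSize(mapId: Int, bytes: Long)

// An empty Seq means "all maps", mirroring the `mapid = -1` default in the diff.
def sizesForMaps(all: Seq[BlockSize], mapIds: Seq[Int] = Seq.empty): Seq[BlockSize] =
  if (mapIds.isEmpty) all else all.filter(b => mapIds.contains(b.mapId))

val statuses = Seq(BlockSize(0, 10L), BlockSize(1, 20L), BlockSize(2, 30L))
// Fetch two maps in one call instead of one call per map id.
val picked = sizesForMaps(statuses, Seq(0, 2))
```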


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-10 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15297#discussion_r82728585
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala 
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.Arrays
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+
+class SkewCoalescedPartitioner(
+val parent: Partitioner,
--- End diff --

Nit: code format





[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...

2016-08-30 Thread SaintBacchus
Github user SaintBacchus commented on the issue:

https://github.com/apache/spark/pull/14887
  
If there are bad disks in the local-dirs, `NodeManager` will not pass those bad disks to the Spark executor, so it's not necessary to check them.





[GitHub] spark issue #14530: [SPARK-16868][Web Ui] Fix executor be both dead and aliv...

2016-08-11 Thread SaintBacchus
Github user SaintBacchus commented on the issue:

https://github.com/apache/spark/pull/14530
  
I will re-run this case and dig into why the executor registers twice.





[GitHub] spark issue #14530: [SPARK-16868][Web Ui] Fix executor be both dead and aliv...

2016-08-11 Thread SaintBacchus
Github user SaintBacchus commented on the issue:

https://github.com/apache/spark/pull/14530
  
\cc  @srowen 





[GitHub] spark issue #14534: [SPARK-16941]Use concurrentHashMap instead of scala Map ...

2016-08-09 Thread SaintBacchus
Github user SaintBacchus commented on the issue:

https://github.com/apache/spark/pull/14534
  
Any other comments?





[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74028969
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
--- End diff --

`sessionToActivePool` and `sessionToContexts` are used in `SparkSQLSessionManager`, in the `openSession` and `closeSession` methods. Making these fields private would require adding new accessor functions here.
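A minimal sketch of the change under discussion: replacing a mutable Scala Map with a `ConcurrentHashMap` for thread safety. Keys and values are simplified to `String` here instead of `SessionHandle`/`SQLContext`.

```scala
import java.util.concurrent.ConcurrentHashMap

// ConcurrentHashMap gives thread-safe access without external locking,
// unlike a plain scala.collection.mutable.Map.
val sessionToActivePool = new ConcurrentHashMap[String, String]()
sessionToActivePool.put("session-1", "poolA")

// Pitfall when migrating from a Scala Map: `get` returns null, not an Option,
// when the key is absent, so wrap lookups in Option(...).
val pool = Option(sessionToActivePool.get("session-1"))
```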





[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74023262
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
--- End diff --

The whole class is private; is it still necessary to make the fields private?





[GitHub] spark issue #14534: [SPARK-16941]Use concurrentHashMap instead of scala Map ...

2016-08-08 Thread SaintBacchus
Github user SaintBacchus commented on the issue:

https://github.com/apache/spark/pull/14534
  
cc/ @srowen Is this OK?





[GitHub] spark pull request #14530: [SPARK-16868][Web Ui] Fix executor be both dead a...

2016-08-07 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14530#discussion_r73827019
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala ---
@@ -77,6 +77,18 @@ class StorageStatusListener(conf: SparkConf) extends 
SparkListener {
   val maxMem = blockManagerAdded.maxMem
   val storageStatus = new StorageStatus(blockManagerId, maxMem)
   executorIdToStorageStatus(executorId) = storageStatus
+
+  // Try to remove the dead storage status if same executor register 
the block manger twice.
+  removeDeadExecutorStorageStatus(executorId)
+}
+  }
+
+  private def removeDeadExecutorStorageStatus(executorId: String): Unit = {
+deadExecutorStorageStatus.zipWithIndex.foreach { case (status, index) 
=>
--- End diff --

`retain` seems to be a method in `MapLike`, but I can't find any similar 
method in `ListBuffer`.
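Illustrating the point above: mutable `Map` has `retain` (renamed `filterInPlace` in Scala 2.13), but `ListBuffer` has no direct equivalent, so removal by predicate needs another idiom, for example:

```scala
import scala.collection.mutable.ListBuffer

// One idiom available on ListBuffer: build the list of matches with `filter`
// first, then remove each of them with `--=` (one occurrence per element).
val dead = ListBuffer("exec-1", "exec-2", "exec-1")
dead --= dead.filter(_ == "exec-1")
```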





[GitHub] spark pull request #14534: [SPARK-16941]Add SynchronizedMap trait with Map i...

2016-08-07 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/14534

[SPARK-16941]Add SynchronizedMap trait with Map in SparkSQLOperationManager.

## What changes were proposed in this pull request?
ThriftServer has a thread-safety problem in 
**SparkSQLOperationManager**.
Add the SynchronizedMap trait to the maps in it to avoid this problem.

Details in [SPARK-16941](https://issues.apache.org/jira/browse/SPARK-16941)


## How was this patch tested?
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-16941

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14534


commit 4af58bc3c9e3ff436e6258aff96a663cf55aa8ba
Author: huangzhaowei 
Date:   2016-08-08T04:06:17Z

Add SynchronizedMap trait with Map in SparkSQLOperationManager to avoid 
concurrency problem.







[GitHub] spark pull request #14530: [SPARK-16868][Web Ui] Fix executor be both dead a...

2016-08-07 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/14530

[SPARK-16868][Web Ui] Fix executor be both dead and alive on executor ui.

## What changes were proposed in this pull request?
Under heavy load on the Spark application, the executor may register its block manager with the driver twice (because of heartbeats), so the executor shows up as in the picture below:

![image](https://cloud.githubusercontent.com/assets/7404824/17467245/c1359094-5d4e-11e6-843a-f6d6347e1bf6.png)

## How was this patch tested?
NA


Details in: [SPARK-16868](https://issues.apache.org/jira/browse/SPARK-16868)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-16868

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14530


commit 6fe4d13fb743f9f3ca5808ba3a7c7c6923e45d0a
Author: huangzhaowei 
Date:   2016-08-03T08:37:17Z

Try to remove dead storage status on BlockManagerAdded event to avoid 
duplicate executor in WebUI.

commit 85b385f47c0751549befc00a31bb554e24443932
Author: huangzhaowei 
Date:   2016-08-05T02:04:22Z

Merge branch 'master' into SPARK-16868







[GitHub] spark pull request: [SPARK-14679] [UI] Fix UI DAG visualization OO...

2016-04-17 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/12437#issuecomment-211227098
  
@rdblue Can this PR fix a case like this:
```java
2016-02-24 15:40:20,260 | ERROR | [qtp1927776715-4120] | Failed to make dot 
file of stage 619 | org.apache.spark.Logging$class.logError(Logging.scala:96)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
```





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2016-02-26 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/8942





[GitHub] spark pull request: [SPARK-12523][YARN]Support long-running of the...

2016-02-24 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/10645#issuecomment-188588644
  
@tgravescs we have run `spark on hbase` for more than 7 days. It works well, but I did not test with the Hive `metastore`, which is a similar case.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/11360

[Minor][SPARK-13482][Configuration]Make the configuration naming in TransportConf consistent.

`spark.storage.memoryMapThreshold` has two kinds of values: one is 2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it goes wrong when the code path reaches `TransportConf#memoryMapBytes`.

[Jira](https://issues.apache.org/jira/browse/SPARK-13482)
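A toy parser for the two value forms described above: a raw byte count ("2097152") or a size string ("2m"). `parseBytes` is illustrative only; Spark has its own byte-string parsing utilities.

```scala
// Accepts either a plain number of bytes or a number with a k/m/g suffix.
def parseBytes(v: String): Long = {
  val Pattern = "(?i)(\\d+)\\s*([kmg]?)b?".r
  v.trim match {
    case Pattern(num, unit) =>
      val mult = unit.toLowerCase match {
        case "k" => 1L << 10
        case "m" => 1L << 20
        case "g" => 1L << 30
        case _   => 1L
      }
      num.toLong * mult
  }
}
```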


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-13482

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11360


commit f8367ee7f9685503b8ef495b1cd34047e4926af4
Author: huangzhaowei 
Date:   2016-02-25T03:13:16Z

Make consistency of the configuration named in TransportConf.







[GitHub] spark pull request: [Streaming][UI][SPARK-12672]Use the uiRoot fun...

2016-01-06 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/10617

[Streaming][UI][SPARK-12672]Use the uiRoot function instead of the default root 
path to build the streaming batch URL.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-12672

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10617


commit 70a12b68f157d5f3175941cca8624fa32e702f65
Author: huangzhaowei 
Date:   2016-01-06T08:13:45Z

Use the uiRoot function instead of default root path to gain the streaming 
batch url.







[GitHub] spark pull request: [SPARK-12316] Wait a minutes to avoid cycle ca...

2015-12-30 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/10475#issuecomment-168105166
  
This only occurs in a cluster, but it is easy to reproduce there:
1. Start a yarn-client Spark application.
2. Remove the staging dir after the AM has finished writing the token to HDFS but before the driver has read it.





[GitHub] spark pull request: [SPARK-12316] Wait a minutes to avoid cycle ca...

2015-12-24 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/10475

[SPARK-12316] Wait a minutes to avoid cycle calling.

When the application ends, the AM will clean the staging dir.
But if the driver then triggers a delegation token update, it can't find the right token file, so it calls the method 'updateCredentialsIfRequired' in an endless cycle.
This leads to a driver StackOverflowError.
https://issues.apache.org/jira/browse/SPARK-12316
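A sketch of the fix idea in the description: when the token file is missing, wait between retries (up to a cap) instead of re-calling immediately, which is what overflowed the driver stack. `findTokenFile` is a hypothetical stand-in for reading the delegation token from the staging dir.

```scala
// Retry with a wait between attempts instead of recursing in a tight cycle.
def updateCredentialsIfRequired(
    findTokenFile: () => Option[String],
    waitMs: Long,
    maxTries: Int): Option[String] = {
  var tries = 0
  var found: Option[String] = None
  while (found.isEmpty && tries < maxTries) {
    found = findTokenFile()
    if (found.isEmpty) {
      tries += 1
      Thread.sleep(waitMs) // back off instead of spinning
    }
  }
  found
}

// Simulate a token file that only appears on the third read.
var calls = 0
val result = updateCredentialsIfRequired(
  () => { calls += 1; if (calls >= 3) Some("token-file") else None },
  waitMs = 1, maxTries = 10)
```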

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-12316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10475


commit b1ba56be4dba90933c5a17dfd875f6a9d9f74b6e
Author: huangzhaowei 
Date:   2015-12-25T07:18:12Z

Wait a minutes to avoid cycle calling.







[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...

2015-11-19 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8918#issuecomment-158258804
  
OK, I'll close it.





[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...

2015-11-19 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/8918





[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...

2015-11-02 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/9056#issuecomment-153281531
  
I have modified the code per your comments @chenghao-intel, and also added a simple test case for it @JoshRosen.





[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...

2015-10-20 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/9056#issuecomment-149757304
  
It is the same issue.





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8942#issuecomment-149442226
  
We had considered reopening the file. That approach would have required handling the synchronization between the event log producer and consumer with more code.
Later I found this approach, which is cleaner.





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8942#issuecomment-149439492
  
I'm not very clear on how to use `doAs` for the `EventLoggingListener`.
You can open a PR and I will help test it.





[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...

2015-10-16 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8918#issuecomment-148879845
  
@andrewor14 as I described in 
[JIRA](https://issues.apache.org/jira/browse/SPARK-10766), in yarn-cluster mode it's hard to set the classpath of the client process. But if I want to use HBase, I have to put the HBase jars on that process's classpath.
BTW, I think Spark users may want to do other things in this process, so I think it's better to enhance the client's configuration.





[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...

2015-10-15 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9026#discussion_r42205797
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -1272,11 +1272,24 @@ object Client extends Logging {
   val mirror = universe.runtimeMirror(getClass.getClassLoader)
 
   try {
-val hiveClass = 
mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
-val hive = hiveClass.getMethod("get").invoke(null)
-
-val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
 val hiveConfClass = 
mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
+val hiveConf = hiveConfClass.newInstance()
+
+// Set metastore to be a local temp directory to avoid conflict of 
the `metaStore client`
+// in `HiveContext` which will use the same derby dataBase by 
default.
+val hiveConfSet = (param: String, value: String) => hiveConfClass
+  .getMethod("set", classOf[Unit])
+  .invoke(hiveConf, param, value)
+val tempDir = Utils.createTempDir()
+val localMetastore = new File(tempDir, "metastore")
+hiveConfSet("hive.metastore.warehouse.dir", 
localMetastore.toURI.toString)
+hiveConfSet("javax.jdo.option.ConnectionURL",
+  
s"jdbc:derby:;databaseName=${localMetastore.getAbsolutePath};create=true")
+hiveConfSet("datanucleus.rdbms.datastoreAdapterClassName",
+  "org.datanucleus.store.rdbms.adapter.DerbyAdapter")
+
+val hiveClass = 
mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
+val hive = hiveClass.getMethod("get").invoke(null, 
hiveConf.asInstanceOf[Object])
--- End diff --

Good idea
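For reference, a minimal reflection sketch related to the diff above: to resolve a `set(String, String)` method, `getMethod` must be given parameter types matching that signature (the diff's `classOf[Unit]` would not resolve it). `Conf` here is a hypothetical stand-in for `HiveConf`.

```scala
// Hypothetical stand-in for HiveConf: any class with set/get on string keys.
class Conf {
  private val m = scala.collection.mutable.Map[String, String]()
  def set(k: String, v: String): Unit = m(k) = v
  def get(k: String): String = m.getOrElse(k, null)
}

val conf = new Conf
// Resolve set(String, String) by its exact parameter types, then invoke it.
val setter = conf.getClass.getMethod("set", classOf[String], classOf[String])
setter.invoke(conf, "javax.jdo.option.ConnectionURL", "jdbc:derby:memory:test")
```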





[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...

2015-10-13 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/9026#issuecomment-147660393
  
Actually there are two metastores. In hive-1.2.1, when we use `metastore.Hive`, it creates the metastore in a static code block. Since Spark has two class loaders (the main class loader and the Hive metastore class loader), there will be two metastores.





[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...

2015-10-12 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/9026#issuecomment-147614244
  
@srowen In this issue there is only one `HiveContext`, but there will be two `metastore.Hive` instances in two different class loaders. And the implementation of `metastore.Hive` creates a database instance while loading the class.
So we have to set the configuration `javax.jdo.option.ConnectionURL` to a temp dir to avoid the problem I mentioned.
This logic actually follows the implementation of 
[SparkSQLCLIDriver](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L84).





[GitHub] spark pull request: [SPARK-11043][SQL]BugFix:Set the operator log ...

2015-10-09 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/9056

[SPARK-11043][SQL]BugFix:Set the operator log in the thrift server.

`SessionManager` sets the `operationLog` if the configuration 
`hive.server2.logging.operation.enabled` is true in Hive 1.2.1.
But Spark has not adapted to this change, so no matter whether the 
configuration is enabled or not, the Spark Thrift Server always logs the 
warning message.
PS: if `hive.server2.logging.operation.enabled` is false, it should log the 
warning message (the same as the Hive Thrift Server).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-11043

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9056


commit 74b2a46d269ef91857f0d3aed203e171dad7eef1
Author: huangzhaowei 
Date:   2015-10-10T02:22:08Z

[SPARK-11043][SQL]BugFix:Set the operator log in the thrift server.

commit eeb04490198052c4e013bf4bdcf68e77eac5eea8
Author: huangzhaowei 
Date:   2015-10-10T02:31:04Z

Fix the code style.







[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...

2015-10-09 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/9026#issuecomment-147023274
  
/cc @marmbrus @liancheng 





[GitHub] spark pull request: [SPARK-11000][YARN]Bug fix: Derby have booted ...

2015-10-08 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/9026

[SPARK-11000][YARN]Bug fix: Derby have booted the database twice in yarn 
security mode.


[obtainTokenForHiveMetastore](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1267)
 in yarn.Client.scala initializes `Hive`.
This creates a connection to the database, and the metastore client in 
`HiveContext` also creates a connection to the database. When Derby is used 
by default, this goes wrong.
So I specialized the configuration `javax.jdo.option.ConnectionURL` in 
`obtainTokenForHiveMetastore` to avoid this issue.
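The workaround can be sketched in plain Scala. This is only an illustration of the idea, not Spark's exact code: the helper name `derbyUrlInTempDir` and the URL shape are assumptions, and the real change lives inside `Client.obtainTokenForHiveMetastore`.

```scala
import java.nio.file.Files

// Build a Derby JDBC URL that points at a throwaway directory, so the
// token-fetching code boots its own Derby instance instead of locking
// the metastore directory that HiveContext will use later.
def derbyUrlInTempDir(): String = {
  val tmp = Files.createTempDirectory("spark-hive-token").toFile
  tmp.deleteOnExit()
  s"jdbc:derby:;databaseName=${tmp.getAbsolutePath}/metastore_db;create=true"
}

val url = derbyUrlInTempDir()
// The overridden property would then be set before initializing Hive:
// hiveConf.set("javax.jdo.option.ConnectionURL", url)
println(url)
```

Because each call creates a fresh temp directory, the two class loaders never contend for the same Derby database files.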


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-11000

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9026


commit 0fab8c74977927be9a505025754b39fcbef9d614
Author: huangzhaowei 
Date:   2015-10-08T07:38:10Z

[SPARK-11000][YARN]Bug fix: Derby have booted the database twice in yarn 
security mode.







[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8942#issuecomment-146425804
  
retest this please





[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...

2015-10-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8918#issuecomment-146390642
  
retest this please





[GitHub] spark pull request: [SPARK-10786][SQL]Take the whole statement to ...

2015-10-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8895#issuecomment-146390491
  
@liancheng Can you take a look at this small change?





[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-10-07 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/8867





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8942#issuecomment-146390416
  
Yeah @tgravescs, I'm running in yarn-client mode. I'm sure that 
`HDFS_DELEGATION_TOKEN token 2339 for spark` is the original token obtained by 
the driver, but I don't know which is the valid token used by the event-log 
writer. I set `dfs.namenode.delegation.token.max-lifetime` to 5 minutes.
In our test, the event log works fine if we log in again.





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-09-29 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8942#issuecomment-144272777
  
@harishreedharan The event log will still be stopped by the `token expired` 
exception. 
The event log is a long-running output stream; #8867 can't update its 
inner token.
```
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:153)
at 
org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:153)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:153)
at 
org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:176)
at 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
at 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:32)
at 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:32)
at 
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
at 
org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:82)
at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1217)
at 
org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:66)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (HDFS_DELEGATION_TOKEN token 2339 for spark) can't be found in cache
at org.apache.hadoop.ipc.Client.call(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:416)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1652)
at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1453)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:579)
```





[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-29 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8867#issuecomment-144262717
  
@tgravescs @harishreedharan this fix will still lose the event log, so maybe 
it's not the best approach.
We have raised a new [approach](https://github.com/apache/spark/pull/8942) 
to resolve this issue.





[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-09-29 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/8942

[SPARK-10473][YARN]Login again in the driver to avoid losing events.

As discussed with @tgravescs and @harishreedharan at 
[8867](https://github.com/apache/spark/pull/8867#issuecomment-142970395), if 
the `SaslRpcClient`'s authentication is *TOKEN*, it will hit the `token 
expired` exception.
But if the authentication is *KERBEROS*, it will renew the token 
automatically.
This change switches the authentication from *TOKEN* to *KERBEROS*.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-10473

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8942


commit fd1f73531514865ecf0b632af628650b0b6f1983
Author: huangzhaowei 
Date:   2015-09-30T02:03:00Z

[SPARK-10473][YARN]Login again in the driver to avoid losing events.







[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-29 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8867#discussion_r40750990
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -544,6 +545,7 @@ private[spark] class Client(
   logInfo(s"Credentials file set to: $credentialsFile")
   val renewalInterval = getTokenRenewalInterval(stagingDirPath)
   sparkConf.set("spark.yarn.token.renewal.interval", 
renewalInterval.toString)
+  SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(sparkConf)
--- End diff --

This code changes the configuration `spark.yarn.credentials.file`, and 
this configuration is used in `DelegationTokenUpdate`, so the `start` had to 
be placed after it.





[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...

2015-09-29 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/7889





[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-25 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8867#issuecomment-143173483
  
@tgravescs I had noticed the code 
`UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)`.
After this login, `yarn.Client` switches from *KERBEROS* to *TOKEN* 
for the purpose of setting the token for the AM:
```scala
  /** Set up security tokens for launching our ApplicationMaster container. */
  private def setupSecurityToken(amContainer: ContainerLaunchContext): Unit = {
    val dob = new DataOutputBuffer
    credentials.writeTokenStorageToStream(dob)
    amContainer.setTokens(ByteBuffer.wrap(dob.getData))
  }
```
After this, the `Client` uses *TOKEN* in the RPC connection.

If I log in again with the keytab after this, the `SaslRpcClient` uses 
*KERBEROS* again, which avoids the token expired exception.
I tested recent Spark and it still throws this exception.
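The behavior described above can be modeled in a few lines of plain Scala. This is a deliberately simplified model of Hadoop's SASL negotiation, not its actual code, and `chooseAuth` is a hypothetical helper: the SASL client prefers a delegation token when one is in the credentials, and only a fresh keytab login flips the client back to Kerberos.

```scala
sealed trait Auth
case object TOKEN extends Auth
case object KERBEROS extends Auth

// Simplified model: the SASL client picks TOKEN whenever a delegation
// token for the service is present, unless we have just re-logged in
// from the keytab, which restores KERBEROS authentication.
def chooseAuth(hasDelegationToken: Boolean, reloggedFromKeytab: Boolean): Auth =
  if (reloggedFromKeytab) KERBEROS
  else if (hasDelegationToken) TOKEN
  else KERBEROS

// After setupSecurityToken the credentials hold a token -> TOKEN auth,
// which can expire; logging in again with the keytab restores KERBEROS.
val before = chooseAuth(hasDelegationToken = true, reloggedFromKeytab = false)
val after  = chooseAuth(hasDelegationToken = true, reloggedFromKeytab = true)
```

In this model the re-login is exactly the state change the PR relies on: TOKEN authentication before, KERBEROS after.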






[GitHub] spark pull request: [SPARK-10766][SPARK-SUBMIT]Add some configurat...

2015-09-25 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/8918

[SPARK-10766][SPARK-SUBMIT]Add some configuration for the client process in 
cluster mode.

Add this four configurations for the client only in cluster mode:
*  `spark.client.memory` of property and `--client-memory` of cli command
* `spark.client.extraClassPath` of property and `--client-class-path` of 
cli command
* `spark.client.extraJavaOptions` of property and `--client-java-options` 
of cli command
* `spark.client.extraLibraryPath` of property and `--client-library-path` 
of cli command

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-10766

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8918


commit 94a707ae2fcb4d41718e97160ec905876f716193
Author: huangzhaowei 
Date:   2015-09-25T09:21:21Z

[SPARK-10766][SPARK-SUBMIT]Add some configuration for the client process in 
cluster mode.







[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-24 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8867#issuecomment-142970395
  
@tgravescs I wrote a simple `DFSClient` application that continuously writes 
strings into HDFS, and it can run past the configured 
`dfs.namenode.delegation.token.max-lifetime`.
So I turned on DEBUG logging and found a pattern:
If **KERBEROS** is used to authenticate to the `NameNode`, the application 
can run past the token lifetime.
> 15/09/24 19:53:38 DEBUG SaslRpcClient: Use **KERBEROS** authentication 
for protocol ClientNamenodeProtocolPB

But if **TOKEN** is used, the application may exit with a *token expired 
exception*.
> 15/09/24 19:53:58 DEBUG SaslRpcClient: Use **TOKEN** authentication for 
protocol ClientNamenodeProtocolPB

Spark is using the *TOKEN*.
One way to resolve this issue is to log in with the keytab again; then the 
mode of the `SaslRpcClient` changes to *KERBEROS*.






[GitHub] spark pull request: [SPARK-10786][SQL]Take the whole statement to ...

2015-09-23 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/8895

[SPARK-10786][SQL]Take the whole statement to generate the CommandProcessor

In the current implementation of `SparkSQLCLIDriver.scala`: 
`val proc: CommandProcessor = CommandProcessorFactory.get(Array(tokens(0)), 
hconf)`
`CommandProcessorFactory` only takes the first token of the statement, which 
makes it hard to distinguish the statement `delete jar xxx` from `delete from 
xxx`.
So it may be better to pass the whole statement to 
`CommandProcessorFactory`.

And 
[HiveCommand](https://github.com/SaintBacchus/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java#L76)
 already special-cases these two statements:
```java
if(command.length > 1 && "from".equalsIgnoreCase(command[1])) {
  //special handling for SQL "delete from  where..."
  return null;
}
```
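The distinction the PR is after can be sketched in plain Scala. `isHiveDeleteCommand` is a hypothetical helper (the real decision happens inside Hive's `HiveCommand`), but it shows why the first token alone is not enough: only with the second token available can `delete jar` be told apart from a SQL `delete from`.

```scala
// Classify a statement the way HiveCommand does once it sees more than
// the first token: "delete from ..." is plain SQL, while
// "delete jar ..." / "delete file ..." are Hive commands.
def isHiveDeleteCommand(statement: String): Boolean = {
  val tokens = statement.trim.split("\\s+")
  tokens.headOption.exists(_.equalsIgnoreCase("delete")) &&
    tokens.length > 1 && !tokens(1).equalsIgnoreCase("from")
}

val a = isHiveDeleteCommand("delete jar /tmp/udf.jar")   // Hive command
val b = isHiveDeleteCommand("delete from t where x = 1") // plain SQL
```

Passing only `Array(tokens(0))` hands `CommandProcessorFactory` the word `delete` with no way to make this check.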

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-10786

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8895


commit d44672e3c8cf068c899392a870efa86e274bfde3
Author: huangzhaowei 
Date:   2015-09-24T02:16:00Z

[SPARK-10786][SQL]Take the whole statement to generate the CommandProcessor







[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-23 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8867#issuecomment-142516215
  
@harishreedharan I set `fs.hdfs.impl.disable.cache` to avoid the caching 
mechanism in Hadoop.
I tested in yarn-client mode: if I apply this PR the application is OK, and 
if I remove it the application goes down.





[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-22 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/8867#issuecomment-142468611
  
@harishreedharan @tgravescs Hadoop RPC actually re-logins with the keytab, 
but the token can only persist for 7 days by default. So it must be updated.
The test steps are below:
1. Shorten the configurations `dfs.namenode.delegation.token.max-lifetime` 
and `dfs.namenode.delegation.token.renew-interval`, to maybe 10 min.
2. Start a `spark-shell` or `spark-sql`.
3. After 15 min, execute a job.
Then the application will fail with a token expired exception.
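The timing in those repro steps can be checked with a little arithmetic. This is a toy model only; `maxLifetimeMin` stands in for `dfs.namenode.delegation.token.max-lifetime`, and it ignores renewals because a renewal cannot extend a token past its maximum lifetime.

```scala
// Toy model of delegation-token expiry: a token issued at minute 0
// with a 10-minute max lifetime is invalid by minute 15, no matter
// how often it was renewed in between.
def tokenValid(issuedAtMin: Int, maxLifetimeMin: Int, nowMin: Int): Boolean =
  nowMin - issuedAtMin < maxLifetimeMin

val validAtJobTime = tokenValid(issuedAtMin = 0, maxLifetimeMin = 10, nowMin = 15)
```

With the 10-minute lifetime from step 1, the job submitted at minute 15 in step 3 is guaranteed to hit the expired token.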





[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-22 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/8867

[SPARK-10755][YARN]Set driver also update the token for long-running 
application

In yarn-client mode, the driver writes the event logs into HDFS and gets 
the partition information from HDFS, so it's necessary to update the token 
from the `AMDelegationTokenRenewer`.
In yarn-cluster mode, the driver runs alongside the AM and the token is 
updated by the AM. But it's still better to update the token for the client 
process, since the client wants to delete the staging dir and would otherwise 
use an expired token.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-10755

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8867


commit 00fd0bc4cd2d6b31ba197629fbe1e9e07a2497bc
Author: huangzhaowei 
Date:   2015-09-22T11:00:47Z

[SPARK_10755][YARN]Set driver also update the token for long-running 
application.







[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...

2015-08-14 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7889#issuecomment-131079850
  
@zsxwing Hive has a configuration named 
`hive.server2.thrift.max.worker.threads` which already limits the 
concurrency.
But my problem was not caused by `trimSessionIfNecessary`. 
Under high concurrency, `onStatementStart` can be executed before 
`onSessionCreated`, which causes this problem.
As this patch has conflicts with master, I will test it again.





[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...

2015-08-02 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7889#issuecomment-127141862
  
@liancheng @tianyi had reviewed the patch before, can you take some time to 
review this again?





[GitHub] spark pull request: [SPARK-8839][SQL]High concurrence will also ca...

2015-08-02 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7889

[SPARK-8839][SQL]High concurrence will also cause the `key not found` error 
in HiveThriftServer2

This PR is related to [7239](https://github.com/apache/spark/pull/7239).
It shows up in a high-concurrency scenario.
When about 500 clients connect to the server at the same time, the method 
`onStatementStart` is executed before `onSessionCreated` with about 10% 
probability.
So it's better to add a wait for the session to be set up.
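The proposed wait can be sketched as a bounded poll in plain Scala. This is only a sketch: `waitForSession` is a hypothetical helper, and the actual patch works against the thrift server listener's internal session map rather than the standalone map used here.

```scala
import java.util.concurrent.ConcurrentHashMap

final case class SessionInfo(user: String)
val sessions = new ConcurrentHashMap[String, SessionInfo]()

// Poll until onSessionCreated has registered the session, instead of
// failing with "key not found" when onStatementStart arrives first.
def waitForSession(id: String, timeoutMs: Long = 1000,
                   stepMs: Long = 10): Option[SessionInfo] = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var found = Option(sessions.get(id))
  while (found.isEmpty && System.currentTimeMillis() < deadline) {
    Thread.sleep(stepMs)
    found = Option(sessions.get(id))
  }
  found
}

sessions.put("s1", SessionInfo("alice"))
val present = waitForSession("s1")
val missing = waitForSession("nope", timeoutMs = 50)
```

A bounded timeout keeps a genuinely missing session from blocking the statement handler forever.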

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark KeyNotFound

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7889


commit 25b3c1d568d7b99de956b9f310bb2fb846403fe1
Author: huangzhaowei 
Date:   2015-08-03T06:33:34Z

Resolved another reason for SPARK-8839







[GitHub] spark pull request: [SPARK-8592] [CORE] CoarseGrainedExecutorBacke...

2015-07-22 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7110#issuecomment-123997780
  
I meet this problem too. Do you have any updates, @xuchenCN 
@darkcrawler01?





[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...

2015-07-20 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/7442





[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...

2015-07-20 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/7442#discussion_r34973017
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -906,12 +908,16 @@ abstract class DStream[T: ClassTag] (
   /**
* Save each RDD in this DStream as at text file, using string 
representation
* of elements. The file name at each batch interval is generated based 
on
-   * `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
+   * `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix". If the 
`CompressionCodec`
+   * is defined, it will use specific `CompressionCodec` to compress the 
text.
*/
-  def saveAsTextFiles(prefix: String, suffix: String = ""): Unit = 
ssc.withScope {
--- End diff --

Do you mean there's no need to change this API, and we should leave this to 
users to handle themselves?





[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...

2015-07-20 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7442#issuecomment-122785045
  
retest this please





[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...

2015-07-16 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7442#issuecomment-121941467
  
@tdas @srowen Can you review this patch?





[GitHub] spark pull request: [SPARK-9091][STREAMING]Add the CompressionCode...

2015-07-16 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7442

[SPARK-9091][STREAMING]Add the CompressionCodec to the saveAsTextFiles 
interface in DStream.

Add the `CompressionCodec` parameter to the `saveAsTextFiles` interface. 
To stay compatible with the old interface, an `Option` is used to adapt the code.

[Jira Address](https://issues.apache.org/jira/browse/SPARK-9091)
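A standalone sketch of that compatibility trick (the `CompressionCodec` trait below is a stub so the example runs on its own; the real parameter type would be Hadoop's `CompressionCodec`, and the method body here only illustrates the dispatch):

```scala
// Stub codec type so the sketch is self-contained.
trait CompressionCodec { def shortName: String }

object DStreamSketch {
  // The new parameter defaults to None, so existing callers of
  // saveAsTextFiles(prefix) or saveAsTextFiles(prefix, suffix)
  // compile and behave unchanged.
  def saveAsTextFiles(prefix: String, suffix: String = "",
      codec: Option[CompressionCodec] = None): String = codec match {
    case Some(c) => s"$prefix-<time>.$suffix (${c.shortName} compressed)"
    case None    => s"$prefix-<time>.$suffix (uncompressed)"
  }
}
```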

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-9091

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7442


commit f60c25e37f114feda952daf54c37c2d4b7290795
Author: huangzhaowei 
Date:   2015-07-16T10:57:17Z

[SPARK-9091][STREAMING]Add the CompressionCodec to the saveAsTextFiles 
interface.







[GitHub] spark pull request: [SPARK-8974] The thread of spark-dynamic-execu...

2015-07-10 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/7352#discussion_r34409077
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -211,7 +212,16 @@ private[spark] class ExecutorAllocationManager(
 listenerBus.addListener(listener)
 
 val scheduleTask = new Runnable() {
-  override def run(): Unit = Utils.logUncaughtExceptions(schedule())
+  override def run(): Unit = {
--- End diff --

It's all the same code.





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-09 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-120199090
  
@liancheng Can you merge it into master if it's OK?





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-09 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/7158





[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...

2015-07-09 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7218#issuecomment-119857376
  
@harishreedharan Can you also review this PR, please?





[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...

2015-07-09 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7255#issuecomment-119855467
  
@harishreedharan I have tested your PR with my issue, and it actually works.
But I suspect a few users may start up `SparkContext` directly, bypassing 
`SparkSubmit`; can they still use this?





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-08 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119842176
  
@tianyi Thanks for the review and comments; I have removed it.





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-08 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119780393
  
@tianyi It reduces a little memory if no new client is coming soon.





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119425235
  
@tianyi In my solution all the unfinished sessions will be kept in memory. If 
we don't check after a session finishes, we have to wait for a new client to 
trigger this check.
> do the checking work when a session opened or an execution started





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119400914
  
@liancheng I have updated the description.
I still don't know why the session count will exceed the client count. Do you 
have any idea?
If we can't avoid this mechanism in the Spark code, my modification may be a 
temporary solution.





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119391653
  
@liancheng Maybe I misread this issue, but it actually exists.
The deeper reason I didn't mention is that even with 200 concurrent 
connections, the session count may be 300 or more. So as long as we keep 
`retainedStatements`, this issue will always exist.





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7158#issuecomment-119220142
  
@tgravescs the checkpoint does three things:
1. Read the checkpoint file
2. Deserialize the checkpoint file and get the properties
3. Initialize the `SparkContext`

The issue was in step one. The code in `Client.scala` runs in step three.
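A sketch of the fix's ordering (all names here are illustrative stand-ins; the real fix would call Hadoop's `UserGroupInformation.loginUserFromKeytab(principal, keytab)` before the HDFS read in step one):

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative driver: log in first, then read the checkpoint, then init.
object CheckpointRecovery {
  val steps = ArrayBuffer.empty[String]

  def loginFromKeytab(principal: String, keytab: String): Unit =
    steps += s"login($principal)"            // stands in for the UGI login
  def readCheckpoint(path: String): Unit = steps += s"read($path)"
  def initSparkContext(): Unit = steps += "init"

  def recover(principal: String, keytab: String, path: String): Unit = {
    loginFromKeytab(principal, keytab)       // the fix: login before the read
    readCheckpoint(path)
    initSparkContext()
  }
}
```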





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7158#issuecomment-119210684
  
@tgravescs I reported this issue. Can you also take a look?





[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7255#issuecomment-119175926
  
@harishreedharan I have tested it out; your patch did not fix my issue.
Run this command twice (both times with an expired principal):
```
bin/spark-submit --class xx.KafkaWordCount --master yarn-client --principal 
spark/hadoop.hadoop@hadoop.com   --keytab spark.keytab
```
The second time it will throw this exception:
```
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
```

WordCount Code:
```scala
val ssc = StreamingContext.getOrCreate("checkpoint", () => wordCountFunction(args))
ssc.start()
ssc.awaitTermination()

def wordCountFunction(args: Array[String]) = {
  val Array(zkQuorum, group, topics, numThreads) = args
  val sparkConf = new SparkConf().setAppName("KafkaWordCount")
  val ssc = new StreamingContext(sparkConf, Seconds(5))
  ssc.checkpoint("checkpoint")

  val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
  val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
  val words = lines.flatMap(_.split(" "))
  val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _).transform(_.sortByKey())
  wordCounts.print()
  ssc
}
```





[GitHub] spark pull request: [SPARK-8851][YARN] In Yarn client mode, Client...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7255#issuecomment-119130485
  
@harishreedharan I think it's not the same problem I reported.
My issue is that `Streaming` reads the checkpoint file before it starts up a 
`SparkContext`, so in `Yarn-Client` mode we have to log in before initializing 
the `SparkContext`.
Will you take a look at my issue again when you have time?





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-07 Thread SaintBacchus
GitHub user SaintBacchus reopened a pull request:

https://github.com/apache/spark/pull/7158

[SPARK-8755][Streaming]Login user before reading the checkpoint file in 
hdfs.

If the user sets `spark.yarn.principal` and `spark.yarn.keytab`, he does 
not need to `kinit` on the client machine.
But when the application is recovered from a checkpoint file, it has to 
`kinit`, because the checkpoint did not apply these configurations before 
using a DFSClient to fetch the checkpoint file.

But there is a small problem: `UserGroupInformation.loginUserFromKeytab` 
will be called twice in a checkpointed application. This is ignored in this PR.

[Jira Address](https://issues.apache.org/jira/browse/SPARK-8755)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8755

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7158


commit 9ddd5b4bf5ca8c0759d411ac44e3ea02a578d1ba
Author: huangzhaowei 
Date:   2015-07-01T12:18:50Z

[SPARK-8755][Streaming]Login user before reading hdfs file.

commit be0df01ef86af835a64f4c69af4a5607c3c6f5a9
Author: huangzhaowei 
Date:   2015-07-01T14:15:55Z

Modify some code style.







[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-07 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119120336
  
> add a `filter` before `take`

That's a better idea, @tianyi. 
I have modified the implementation:
1. If there are hundreds of connections, we keep them in memory.
2. When each session finishes, we trigger `trimSessionIfNecessary` to 
remove the finished sessions and keep the list size below `retainedStatements`.
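A self-contained sketch of that trim step (the `SessionInfo` class and field names are my own illustration; here `finishTimestamp == 0` marks a still-running session):

```scala
case class SessionInfo(id: Int, finishTimestamp: Long) // 0 = still running

object SessionTrim {
  // Drop only finished sessions, and only enough to get back under the
  // retained limit; unfinished sessions are never removed, so the list
  // can temporarily exceed the limit.
  def trimSessions(sessions: Seq[SessionInfo], retained: Int): Seq[SessionInfo] = {
    val excess = sessions.size - retained
    if (excess <= 0) sessions
    else {
      val drop = sessions.filter(_.finishTimestamp > 0).take(excess).map(_.id).toSet
      sessions.filterNot(s => drop.contains(s.id))
    }
  }
}
```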






[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-06 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7239#issuecomment-119067960
  
Hi, @liancheng will you take a look at this issue?





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-06 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7158#issuecomment-119067831
  
OK, I close this PR now.





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-06 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/7158





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-06 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7158#issuecomment-119043741
  
@harishreedharan this is an issue only in Client mode. 
Will your PR cover this issue? 





[GitHub] spark pull request: [SPARK-8839][SQL]ThriftServer2 will remove ses...

2015-07-06 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7239

[SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter 
it's finished or not.


[Code](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L220)
 in HiveThriftServer2 uses `take` to get the elements to be removed. 
In the Scala 
[doc](http://www.scala-lang.org/api/2.10.4/#scala.collection.IterableLike) 
`take` has a note:
> Note: might return different results for different runs, unless the 
underlying collection type is ordered.

`take` does not necessarily take the first elements of the list, so it may 
remove sessions and executions that are still in use.
So add a check before removing them, though this solution keeps all the 
unfinished executions in memory.

[Jira Address](https://issues.apache.org/jira/browse/SPARK-8839)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8839

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7239


commit 9d5ceb8f980830a4a6be6c09bacbae9f005f734d
Author: huangzhaowei 
Date:   2015-07-06T11:49:39Z

[SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter 
it's finished or not.







[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...

2015-07-04 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7218#issuecomment-118501476
  
> adding yet more config and API surface area unless there's a clear need

@srowen Do you mean it's not necessary to add this config?
I suppose users may not want to hard-code the checkpoint directory, and 
without this config they have to implement it themselves.





[GitHub] spark pull request: [SPARK-8820][Streaming] Add a configuration to...

2015-07-03 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7218

[SPARK-8820][Streaming] Add a configuration to set checkpoint dir.

Add a configuration to set the checkpoint directory, for users' convenience.
[Jira Address](https://issues.apache.org/jira/browse/SPARK-8820)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8820

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7218


commit dd0acc15a093970d1e035f621adaa95885efae99
Author: huangzhaowei 
Date:   2015-07-04T04:02:53Z

[SPARK-8820][Streaming] Add a configuration to set checkpoint dir.







[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-03 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7209#issuecomment-118321204
  
LGTM





[GitHub] spark pull request: [SPARK-8755][Streaming]Login user before readi...

2015-07-01 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7158

[SPARK-8755][Streaming]Login user before reading the checkpoint file in 
hdfs.

If the user sets `spark.yarn.principal` and `spark.yarn.keytab`, he does 
not need to `kinit` on the client machine.
But when the application is recovered from a checkpoint file, it has to 
`kinit`, because the checkpoint did not apply these configurations before 
using a DFSClient to fetch the checkpoint file.

But there is one problem: `UserGroupInformation.loginUserFromKeytab` 
will be called twice in a checkpointed application.
[Jira](https://issues.apache.org/jira/browse/SPARK-8755)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8755

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7158


commit 9ddd5b4bf5ca8c0759d411ac44e3ea02a578d1ba
Author: huangzhaowei 
Date:   2015-07-01T12:18:50Z

[SPARK-8755][Streaming]Login user before reading hdfs file.







[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...

2015-06-29 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7066#issuecomment-116934534
  
>We modify YarnClientSchedulerBackend#start to call super.start() after we 
have submitted the application

@andrewor14 This modification is much more suitable for this problem. But if 
users have to set this configuration in other deploy modes, they have to be 
cautious about this problem.





[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...

2015-06-29 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/7069#discussion_r33536245
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
@@ -334,6 +334,17 @@ class SparkHadoopUtil extends Logging {
* Stop the thread that does the delegation token updates.
*/
   private[spark] def stopExecutorDelegationTokenRenewer() {}
+
+  /**
+   * Disable the hadoop fs cache mechanism, otherwise DFSClient will use old token to connect nn.
+   */
+  private[spark]
+  def getConfBypassingFSCache(hadoopConf: Configuration, path: Path): Configuration = {
--- End diff --

I kept this function name since it is not a general method; it only 
refreshes the `cache` configuration.





[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...

2015-06-29 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/6662#issuecomment-116911098
  
OK





[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...

2015-06-29 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/6662





[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...

2015-06-29 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/7008#discussion_r33534263
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
@@ -44,11 +44,19 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time)
   val sparkConfPairs = ssc.conf.getAll
 
   def createSparkConf(): SparkConf = {
+    val reloadConfs = List(
+      "spark.master",
+      "spark.yarn.keytab",
+      "spark.yarn.principal")
+
     val newSparkConf = new SparkConf(loadDefaults = false).setAll(sparkConfPairs)
       .remove("spark.driver.host")
       .remove("spark.driver.port")
-    val newMasterOption = new SparkConf(loadDefaults = true).getOption("spark.master")
-    newMasterOption.foreach { newMaster => newSparkConf.setMaster(newMaster) }
+    val newReloadConf = new SparkConf(loadDefaults = true)
+    reloadConfs.foreach { conf =>
+      newReloadConf.getOption(conf)
+        .foreach(confValue => newSparkConf.set(conf, confValue))
--- End diff --

Modified the code style





[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...

2015-06-28 Thread SaintBacchus
Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/7069#discussion_r33421898
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -334,6 +334,16 @@ class SparkHadoopUtil extends Logging {
* Stop the thread that does the delegation token updates.
*/
   private[spark] def stopExecutorDelegationTokenRenewer() {}
+
+  /**
+   * Disable the hadoop fs cache mechanism, otherwise DFSClient will use old token to connect nn.
+   */
+  private[spark] def getDiscachedConf(hadoopConf: Configuration, path: Path): Configuration = {
+    val newConf = new Configuration(hadoopConf)
+    val confKey = s"fs.${path.toUri.getScheme}.impl.disable.cache"
--- End diff --

Ok, renamed it to `getConfBypassingFSCache`.





[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...

2015-06-28 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7069

[SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.

If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will use 
the cached `DFSClient`, which uses the old token.
So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid 
token expiry.

[Jira](https://issues.apache.org/jira/browse/SPARK-8688)
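A minimal sketch of the helper this PR adds (assuming Hadoop's `Configuration` and `Path` APIs with `hadoop-common` on the classpath; the method name follows the diff quoted earlier in this thread):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Return a copy of the Hadoop conf with the FileSystem cache disabled for
// the path's scheme, so a fresh DFSClient (carrying the current delegation
// token) is created instead of reusing a cached client holding an old token.
def getConfBypassingFSCache(hadoopConf: Configuration, path: Path): Configuration = {
  val newConf = new Configuration(hadoopConf)
  val scheme = path.toUri.getScheme  // e.g. "hdfs"
  newConf.setBoolean(s"fs.$scheme.impl.disable.cache", true)
  newConf
}
```

Callers would then pass the returned conf to `FileSystem.get`, trading the cost of a new connection for a client that always carries the current token.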

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8688

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7069


commit cf776a14725940e888ec187d210b74e1cc24c191
Author: huangzhaowei 
Date:   2015-06-28T08:19:17Z

[SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.







[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...

2015-06-28 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7066

[SPARK-8687][YARN]Fix bug: Executor can't fetch the new set configuration

Spark initializes the properties in `CoarseGrainedSchedulerBackend.start`:
```scala
// TODO (prashant) send conf instead of properties
driverEndpoint = rpcEnv.setupEndpoint(
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new 
DriverEndpoint(rpcEnv, properties))
```
The YARN logic then sets some configuration, but those values are not updated in this 
`properties` snapshot, so the `Executor` won't receive them.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8687

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7066.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7066


commit e4dd9a8660c642c08f32a92c57199f0e1ba64b82
Author: huangzhaowei 
Date:   2015-06-28T07:17:57Z

[SPARK-8687][YARN]Fix bug: Executor can't fetch the new set configuration.







[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...

2015-06-25 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/7008#issuecomment-115148516
  
@harishreedharan You are right, the principal is the same in all cases. 
I considered them a pair of configurations, so I added both to the 
list. :smile:





[GitHub] spark pull request: [SPARK-8619][Streaming]Don't recover keytab an...

2015-06-24 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/7008

[SPARK-8619][Streaming]Don't recover keytab and principal configuration 
within Streaming checkpoint


[Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786)
 will change these configurations, which causes the problem that the 
Streaming recovery logic can't find the local keytab file (since the configuration 
was changed):
```scala
  sparkConf.set("spark.yarn.keytab", keytabFileName)
  sparkConf.set("spark.yarn.principal", args.principal)
```

Problem described at 
[Jira](https://issues.apache.org/jira/browse/SPARK-8619)
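The recovery-side behavior (quoted in the `Checkpoint.scala` diff earlier in this thread) can be sketched without Spark itself; the `Map`-based names below are illustrative stand-ins for `SparkConf`:

```scala
// Illustrative sketch: when rebuilding a conf from a checkpoint, re-read a
// fixed list of keys from the live defaults instead of trusting the
// checkpointed values, which may point at a keytab file that no longer exists.
val reloadKeys = List("spark.master", "spark.yarn.keytab", "spark.yarn.principal")

def recoverConf(checkpointed: Map[String, String],
                defaults: Map[String, String]): Map[String, String] = {
  // Drop host/port, which are specific to the old driver.
  val base = checkpointed - "spark.driver.host" - "spark.driver.port"
  // Overwrite each reload key with the current default, when one is set.
  reloadKeys.foldLeft(base) { (conf, key) =>
    defaults.get(key).fold(conf)(v => conf.updated(key, v))
  }
}
```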



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8619

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7008


commit 0d8f800c742a78870f8ab76232ed8bb18684b84e
Author: huangzhaowei 
Date:   2015-06-25T02:27:55Z

Don't recover keytab and principal configuration within Streaming 
checkpoint.







[GitHub] spark pull request: [SPARK-8367][Streaming]Add a limit for 'spark....

2015-06-15 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/6818#issuecomment-111986910
  
@jerryshao I think that, given the data loss bug, we can call zero an illegal 
setting.





[GitHub] spark pull request: [SPARK-8367][Streaming]Add a limit for 'spark....

2015-06-14 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/6818

[SPARK-8367][Streaming]Add a limit for `spark.streaming.blockInterval` 
since a data loss bug.

The bug was reported in the JIRA 
[SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367).
The resolution is to limit the configuration `spark.streaming.blockInterval` 
to a positive number.
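The kind of guard proposed can be sketched as follows (illustrative only; the real check lives where Spark Streaming reads `spark.streaming.blockInterval`, and the helper name here is hypothetical):

```scala
// Illustrative: fail fast on a non-positive block interval instead of
// silently losing data. `blockIntervalMs` stands in for the parsed value of
// "spark.streaming.blockInterval".
def validateBlockInterval(blockIntervalMs: Long): Long = {
  require(blockIntervalMs > 0,
    s"'spark.streaming.blockInterval' should be a positive value, but was $blockIntervalMs")
  blockIntervalMs
}
```

With this in place, a zero or negative interval raises an `IllegalArgumentException` at startup rather than surfacing later as lost blocks.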

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8367

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6818


commit 3d17796d88c35294dde8d4f9ffad00dda98bd631
Author: huangzhaowei 
Date:   2015-06-15T02:41:36Z

[SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' 
since a data loss bug.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...

2015-06-10 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/6662#issuecomment-111004513
  
@andrewor14 did I describe the scenario clearly? Can you review it again?





[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...

2015-06-05 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/6662#issuecomment-109503080
  
@andrewor14 @vanzin I drew a simple call stack, as shown here:

![image](https://cloud.githubusercontent.com/assets/7404824/8017792/df6f4cf8-0c32-11e5-90ff-7192d30b8d3f.png)

When the `doRequestTotalExecutors` logic runs, it resets the total 
executors of the application.
But there is a problem: if another executor also goes down at that moment, 
Spark will never bring it up again.
This simple scenario reproduces the issue: 
there are 2 applications, each wanting 2 executors, so 4 CPU cores are wanted 
in total (each executor wants one core). But the RM has only 3 cores, so the 
first application (A) gets 2 cores while the second application (B) gets only one 
core and waits for A to release its cores.
Then kill one of A's executors: B brings up its missing executor and leaves A 
waiting for the resource.
After the `TimeOut` logic fires in A, B finishes its 
job and releases its resources.
One would expect A to bring up its other executor again, but 
that never actually happens.
A may be a Streaming application. 





[GitHub] spark pull request: [SPARK-8119][Scheduler]Do not let Spark set to...

2015-06-04 Thread SaintBacchus
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/6662

[SPARK-8119][Scheduler]Do not let Spark set total executors when executor 
fails

`DynamicAllocation` sets the total executor count to a small number when it 
wants to kill some executors.
But in the non-DynamicAllocation scenario, Spark also sets the total 
executor count.
This causes the following problem: when an executor fails, no 
replacement executor will be brought up by Spark.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SaintBacchus/spark SPARK-8119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6662


commit 610c390d243a39718ccf7c506c9e5a37784cc65f
Author: huangzhaowei 
Date:   2015-06-05T02:03:48Z

[SPARK-8119][Scheduler]Do not let Spark set total executors when executor 
fails







[GitHub] spark pull request: [SPARK-6464][Core]Add a function named 'proces...

2015-05-10 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/5152





[GitHub] spark pull request: [SPARK-6584][CORE]Provide ExecutorPrefixTaskLo...

2015-05-10 Thread SaintBacchus
Github user SaintBacchus closed the pull request at:

https://github.com/apache/spark/pull/5240




