[GitHub] spark pull request: [SPARK-2675]LiveListenerBus Queue Overflow

2014-08-30 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/1356#issuecomment-53977557
  
okay!


[GitHub] spark pull request: [SPARK-2773]use growth rate to predict if need...

2014-09-01 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/1696#issuecomment-54102947
  
@pwendell  OK!


[GitHub] spark pull request: [SPARK-2506]: In yarn-cluster mode, Applicatio...

2014-09-02 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/1429


[GitHub] spark pull request: [GraphX]: trim some useless informations of Ve...

2014-09-03 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2249

[GraphX]: trim some useless informations of VertexRDD in some cases



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_graphx-trim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2249


commit 09b059e61da550e03f2d497d370e7bd2c9e1832a
Author: uncleGen 
Date:   2014-09-03T13:29:07Z

[GraphX]: trim some useless informations of VertexRDD in some cases




[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

2014-09-03 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2249#issuecomment-54399143
  
@ankurdave Thanks for your comments; I will update it as soon as possible.


[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

2014-09-03 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2249#discussion_r17095835
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Graph.scala ---
@@ -262,13 +262,61 @@ abstract class Graph[VD: ClassTag, ED: ClassTag] protected () extends Serializab
 : Graph[VD, ED]
 
   /**
+   * Restricts the graph to only the vertices and edges satisfying the predicates. The resulting
+   * subgraph satisfies
+   *
+   * {{{
+   * V' = {v : for all v in V where vpred(v)}
+   * E' = {(u,v): for all (u,v) in E where epred((u,v)) && vpred(u) && vpred(v)}
+   * }}}
+   *
+   * @param epred the edge predicate, which takes a triplet and
+   * evaluates to true if the edge is to remain in the subgraph.  Note
+   * that only edges where both vertices satisfy the vertex
+   * predicate are considered.
+   *
+   * @param vpred the vertex predicate, which takes a vertex object and
+   * evaluates to true if the vertex is to be included in the subgraph
+   *
+   * @param updateRoutingTable whether to rebuild the routingTable in
+   * VertexRDD to reduce the shuffle when shipping VertexAttributes.
+   *
+   * @return the subgraph containing only the vertices and edges that
+   * satisfy the predicates
+   */
+  def subgraph(
+      epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
+      vpred: (VertexId, VD) => Boolean = ((v, d) => true),
+      updateRoutingTable: Boolean)
+    : Graph[VD, ED]
--- End diff --

something wrong here


[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...

2014-09-22 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2488

[SPARK-3636][CORE]:It is not friendly to interrupt a Job when user passe...

... different storageLevels to a RDD

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master-minorfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2488


commit 0e3831cc56f2532d5fdd8500b26dfe34a62069cd
Author: uncleGen 
Date:   2014-09-22T09:48:29Z

[SPARK-3636][CORE]:It is not friendly to interrupt a Job when user passe 
different storageLevels to a RDD




[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...

2014-09-23 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/2488


[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...

2014-09-23 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2488#issuecomment-56505578
  
@pwendell ah, make sense, I will close this PR. Thank you!


[GitHub] spark pull request: [SPARK-1774] Respect SparkSubmit --jars on YAR...

2014-09-23 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/710#discussion_r17951094
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -192,15 +236,17 @@ class SparkSubmitSuite extends FunSuite with ShouldMatchers {
   }
 
   test("launch simple application with spark-submit") {
-    runSparkSubmit(
-      Seq(
-        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
-        "--name", "testApp",
-        "--master", "local",
-        "unUsed.jar"))
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
--- End diff --

Here, do you mean a jar with empty content? I cannot pass this test, but if I set a class name (i.e. Seq.empty -> Seq("filename")), it is OK.


[GitHub] spark pull request: Bug Fix: LiveListenerBus Queue Overflow

2014-07-10 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/1356

Bug Fix: LiveListenerBus Queue Overflow

As we know, the size of the "eventQueue" is fixed. When events arrive faster than the listeners can consume them, the overflowing events are thrown away and a "logQueueFullErrorMessage" is logged. As a result, the Spark UI ends up in a chaotic state. In my strategy, those overflow events are first "cached" in the "blockManager" and then replayed by a "replayThread" once the "eventQueue" has free capacity. The only drawback is some display delay in the Spark UI, but that is better than losing events, and the delay disappears once the event pressure eases.
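
Below is a minimal sketch of the spill-and-replay shape of this strategy. All names here (SpillingEventBus, replayPending) are hypothetical stand-ins; the real patch works against LiveListenerBus and BlockManager internals rather than plain Java queues.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, LinkedBlockingQueue}

// Hypothetical sketch: a bounded queue that spills overflow events to a
// secondary store instead of dropping them, and replays them once the
// primary queue has capacity again. The overflow queue stands in for the
// BlockManager-backed cache described above.
class SpillingEventBus[E <: AnyRef](capacity: Int) {
  private val eventQueue = new LinkedBlockingQueue[E](capacity)
  private val overflow = new ConcurrentLinkedQueue[E]()

  def post(event: E): Unit = {
    // offer returns false instead of blocking when the queue is full
    if (!eventQueue.offer(event)) overflow.add(event)
  }

  // Called periodically by a single "replayThread": move spilled events
  // back into the primary queue while there is room.
  def replayPending(): Unit = {
    var next = overflow.peek()
    while (next != null && eventQueue.offer(next)) {
      overflow.poll()
      next = overflow.peek()
    }
  }

  def take(): E = eventQueue.take()
}
```

The only cost, as noted above, is that spilled events reach the consumer (and hence the UI) late rather than never.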

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1356.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1356

----
commit 52906a911b111f3848e21c3b449b76237f658051
Author: uncleGen 
Date:   2014-07-10T12:17:32Z

Bug Fix: LiveListenerBus Queue Overflow




[GitHub] spark pull request: [GraphX]: override the "setName" function to s...

2014-08-19 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2033

[GraphX]: override the "setName" function to set EdgeRDD's name manually 
just as VertexRDD does.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_origin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2033.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2033


commit 801994b9c5567c2188708e67fd86b41e681ce7a8
Author: uncleGen 
Date:   2014-08-19T15:08:20Z

Update EdgeRDD.scala

[GraphX]: override the "setName" function to set EdgeRDD's name manually 
just as VertexRDD does.




[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-20 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2076

[SPARK-3170][CORE]: Bug Fix in Storage UI

A completed stage only needs to remove its own partitions that are no longer cached. Currently, the "Storage" tab in the Spark UI may lose some rdds which are actually cached.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_ui_storage

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2076


commit 7633f4f62b01df78e649f55c79dfa8db48e21718
Author: uncleGen 
Date:   2014-08-21T03:40:36Z

Bug Fix in Storage UI




[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-21 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2076#issuecomment-52908740
  
@srowen yes! Not only the "StorageTab": the "ExecutorTab" may also lose some rdd-infos which have been overwritten by a following rdd in the same task.

"StorageTab": when multiple stages run simultaneously, a completed stage will remove rdd-info which belongs to other stages that are still running.

"ExecutorTab": in a dependency chain of rdds, a task can only update the latest rdd info. Take the following example:

  val r1 = sc.parallelize(..).cache()
  val r2 = r1.map(...).cache()
  val n = r2.count()

Currently, the "ExecutorTab" shows wrong info for "Memory Used": it only shows the usage of r2, not of both r1 and r2.


[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-22 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2076#issuecomment-53144395
  
@pwendell  Okay! I will add them as soon as possible and pay more attention.


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

2014-08-26 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2131

[SPARK-3170][CORE][BUG]:RDD info loss in "StorageTab" and "ExecutorTab"

A completed stage only needs to remove its own partitions that are no longer cached. However, the "StorageTab" may lose some rdds which are actually cached. And not only the "StorageTab": the "ExecutorTab" may also lose some rdd info which has been overwritten by the last rdd in the same task.
1. "StorageTab": when multiple stages run simultaneously, a completed stage will remove rdd info which belongs to other stages that are still running.
2. "ExecutorTab": the taskcontext may lose some "updatedBlocks" info of rdds in a dependency chain. Take the following example:
 val r1 = sc.parallelize(..).cache()
 val r2 = r1.map(...).cache()
 val n = r2.count()

When r2 is counted, both r1 and r2 will finally be cached. So in CacheManager.getOrCompute, the taskcontext should contain the "updatedBlocks" of both r1 and r2. Currently, the "updatedBlocks" only contains the info of r2.
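
The gist of the fix is to merge new block updates into whatever the task has already recorded, instead of overwriting them (the actual diff appears further down in this thread). A minimal sketch with simplified stand-in types:

```scala
// Simplified stand-in for Spark's (BlockId, BlockStatus) metric pairs.
case class BlockUpdate(blockId: String, status: String)

// Before: each nested getOrCompute call overwrote updatedBlocks, so only
// the outermost RDD (r2) survived. After: append to the existing updates.
def recordUpdatedBlocks(
    existing: Option[Seq[BlockUpdate]],
    fresh: Seq[BlockUpdate]): Option[Seq[BlockUpdate]] =
  Some(existing.getOrElse(Seq.empty) ++ fresh)
```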

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_ui_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit c82ba82ae90c92244e63811f30e1aeb05608c57a
Author: uncleGen 
Date:   2014-08-26T06:54:04Z

Bug Fix: RDD info loss in "StorageTab" and "ExecutorTab"




[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-26 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2076#issuecomment-53383270
  
@andrewor14 @pwendell @srowen
As my branch is not up to date, I decided to close this and submit a new PR.
Please review it: https://github.com/apache/spark/pull/2131


[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-26 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/2076


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

2014-08-26 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53521662
  
Hi @andrewor14, please test it again.


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16761636
  
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
--- End diff --

@andrewor14 IMHO, "getOrCompute" can be called more than once per task (indirectly, recursively). In this code snippet:

   val rdd1 = sc.parallelize(...).cache()
   val rdd2 = rdd1.map(...).cache()
   val count = rdd2.count()

This code snippet will submit one stage. Take task-1 as an example: task-1 first calls getOrCompute(rdd-2), and then calls getOrCompute(rdd-1) inside getOrCompute(rdd-2). Therefore, it generates and caches block rdd-1-1 and block rdd-2-1 one by one. At the end of getOrCompute(rdd-1), the taskMetrics.updatedBlocks of task-1 will be seq(rdd-1-1). Then, at the end of getOrCompute(rdd-2), the taskMetrics.updatedBlocks will be seq(rdd-1-1, rdd-2-1).


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2131#discussion_r16762096
  
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
   assert(value.toList === List(1, 2, 3, 4))
 }
   }
+
+  test("verify task metrics updated correctly") {
+    blockManager = sc.env.blockManager
+    cacheManager = new CacheManager(blockManager)
+    val context = new TaskContext(0, 0, 0)
+    cacheManager.getOrCompute(rdd3, split, context, StorageLevel.MEMORY_ONLY)
+    assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
--- End diff --

Sorry for my poor coding; I will review it again.


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

2014-08-27 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53549872
  
@andrewor14 Sorry for my poor coding. The unit test passed locally; please test it again.


[GitHub] spark issue #16863: Swamidass & Baldi Approximations

2017-02-08 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16863
  
Please review http://spark.apache.org/contributing.html before opening a 
pull request.


[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
cc @cloud-fan @hvanhovell 


[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...

2017-02-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16857
  
retest this please.


[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...

2017-02-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16857
  
I think it would be best to add a new unit test for this issue.


[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
cc @gatorsmile also


[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16827
  
@srowen Could you please take a second review? 
cc @rxin also


[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-10 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16827
  
@srowen Makes sense. What about logging a warning/error message if 'spark.master' is set with different values?
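
A hypothetical sketch of such a check; ConfSketch is illustrative only, not the actual SparkConf change:

```scala
import scala.collection.mutable

// Hypothetical sketch: warn when "spark.master" is set again with a
// conflicting value, instead of silently letting the last write win.
class ConfSketch {
  private val settings = mutable.Map[String, String]()

  def set(key: String, value: String): this.type = {
    settings.get(key) match {
      case Some(old) if key == "spark.master" && old != value =>
        System.err.println(s"Warning: $key is being overwritten: '$old' -> '$value'")
      case _ => // first write, or same value: nothing to report
    }
    settings(key) = value
    this
  }
}
```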


[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...

2017-02-13 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16827


[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-13 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16827
  
@srowen Makes sense; I will close it for now, pending a follow-up.


[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-13 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
cc @hvanhovell and @cloud-fan 


[GitHub] spark pull request #16920: [MINOR][DOCS] Add jira url in pull request descri...

2017-02-13 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16920

[MINOR][DOCS] Add jira url in pull request description

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.



-- (**the following is newly added**) --
```
## Discussion detail.

(Please fill in the related JIRA url; otherwise, remove this section. The format should be:)

For more detailed discussion, please refer to:
- [SPARK-XXX](https://issues.apache.org/jira/browse/SPARK-XXX)
```

## Discussion detail.

(Please fill in the related JIRA url; otherwise, remove this section. The format should be:)

For more detailed discussion, please refer to:
- [SPARK-XXX](https://issues.apache.org/jira/browse/SPARK-XXX)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark pr-template

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16920


commit 510a80a49bc40da7a312ea1906f57f16f10bfbc8
Author: uncleGen 
Date:   2017-02-14T03:22:15Z

add jira url in pull request description

commit e61125acb60944e48a5be4b8218ae925e1b543b6
Author: uncleGen 
Date:   2017-02-14T03:33:21Z

update




[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description

2017-02-13 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16920
  
cc @srowen 


[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-14 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101187326
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
 val boundaryStart = start match {
   case 0 => CurrentRow
-  case Long.MinValue => UnboundedPreceding
-  case x if x < 0 => ValuePreceding(-start.toInt)
-  case x if x > 0 => ValueFollowing(start.toInt)
+  case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

In fact, the type of `start` and `end` should not be `Long` here, but we cannot change it for compatibility reasons.
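
The overflow the new guard protects against is easy to demonstrate; the value below is an illustrative choice, not taken from the PR:

```scala
object OverflowDemo extends App {
  val start: Long = -3000000000L // below Int.MinValue (-2147483648)

  // Long-to-Int narrowing keeps only the low 32 bits, so a large negative
  // Long can come out positive: the old ValuePreceding(-start.toInt) would
  // build a frame boundary from a garbage value.
  println(start.toInt)           // 1294967296: the sign has flipped

  // The patched match clamps such values to UnboundedPreceding instead.
  println(start < Int.MinValue)  // true: takes the new branch
}
```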


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

2017-02-15 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16936

[SPARK-19605][DStream] Fail it if existing resource is not enough to run 
streaming job

## What changes were proposed in this pull request?

For more detailed discussion, please review:
- [SPARK-19605](https://issues.apache.org/jira/browse/SPARK-19605)

## How was this patch tested?

add new unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19605

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16936


commit 4093330ee0c85f2c374db3c3ff1d05832c77b89e
Author: uncleGen 
Date:   2017-02-15T07:58:34Z

SPARK-19605: Fail it if existing resource is not enough to run streaming job




[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16656
  
@zsxwing Could you please take a review?


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

2017-02-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16936#discussion_r101223957
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -938,7 +958,7 @@ object SlowTestReceiver {
 /** Streaming application for testing DStream and RDD creation sites */
 package object testPackage extends Assertions {
   def test() {
-val conf = new SparkConf().setMaster("local").setAppName("CreationSite test")
+val conf = new SparkConf().setMaster("local[2]").setAppName("CreationSite test")
 val ssc = new StreamingContext(conf, Milliseconds(100))

The unit test failed here.


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

2017-02-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16936#discussion_r101223862
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala ---
@@ -215,10 +215,6 @@ package object config {
 
   /* Executor configuration. */
 
-  private[spark] val EXECUTOR_CORES = ConfigBuilder("spark.executor.cores")
-    .intConf
-    .createWithDefault(1)
-
   private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
--- End diff --

This caused a duplicate-definition error in ApplicationMaster.scala; it is removed here because we now add it in core.


[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

2017-02-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16936#discussion_r101223265
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -437,6 +438,74 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false
   }
 
   /**
+   * Check if existing resource is enough to run job.
+   */
+  private def checkResourceValid(): Unit = {
+    val coresPerTask = ssc.conf.get(CPUS_PER_TASK)
+
+    def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
+
+    ssc.conf.get("spark.master") match {
+      case m if m.startsWith("yarn") =>
+        val numCoresPerExecutor = ssc.conf.get(EXECUTOR_CORES)
+        val numExecutors = getTargetExecutorNumber()
+        if (numExecutors * numCoresPerExecutor / coresPerTask < numReceivers) {
+          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+            s"existing resource can only be used to scheduler some of receivers." +
+            s"$numExecutors executors, $numCoresPerExecutor cores per executor, $coresPerTask " +
+            s"cores per task and $numReceivers receivers.")
+        }
+      case m if m.startsWith("spark") || m.startsWith("mesos") =>
+        val coresMax = ssc.conf.get(CORES_MAX).getOrElse(0)
+        if (coresMax / coresPerTask < numReceivers) {
+          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+            s"existing resource can only be used to scheduler some of receivers." +
+            s"$coresMax cores totally, $coresPerTask cores per task and $numReceivers receivers.")
+        }
+      case m if m.startsWith("local") =>
+        m match {
+          case "local" =>
+            throw new SparkException("There are no enough resource to run Spark Streaming job.")
+          case SparkMasterRegex.LOCAL_N_REGEX(threads) =>
+            val threadCount = if (threads == "*") localCpuCount else threads.toInt
+            if (threadCount / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+                s"existing resource can only be used to scheduler some of receivers." +
+                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
+                s"receivers.")
+            }
+          case SparkMasterRegex.LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
+            val threadCount = if (threads == "*") localCpuCount else threads.toInt
+            if (threadCount / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
                s"existing resource can only be used to scheduler some of receivers." +
+                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
+                s"receivers.")
+            }
+          case SparkMasterRegex.LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
+            val coresMax = numSlaves.toInt * coresPerSlave.toInt
+            if (coresMax / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+                s"existing resource can only be used to scheduler some of receivers." +
+                s"$numSlaves slaves, $coresPerSlave cores per slave, $coresPerTask " +
+                s"cores per task and $numReceivers receivers.")
+            }
+        }
+    }
+  }
+
+  private def getTargetExecutorNumber(): Int = {
+    if (Utils.isDynamicAllocationEnabled(ssc.conf)) {
+      ssc.conf.get(DYN_ALLOCATION_MAX_EXECUTORS)
+    } else {
+      val targetNumExecutors =
+        sys.env.get("SPARK_EXECUTOR_INSTANCES").map(_.toInt).getOrElse(2)
+      // System property can override environment variable.
--- End diff --

here "2" refers to "YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS"


[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16920
  
@rxin just a quick link to the JIRA :)


[GitHub] spark pull request #16920: [MINOR][DOCS] Add jira url in pull request descri...

2017-02-15 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16920


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16936
  
@srowen 
> Does it really not work if not enough receivers can schedule? 

That's not what I wanted to express. What I mean is that the stream output cannot operate.


[GitHub] spark pull request #16949: [SPARK-16122][CORE] Add rest api for job environm...

2017-02-15 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16949

[SPARK-16122][CORE] Add rest api for job environment

## What changes were proposed in this pull request?

Add a REST API for the job environment.

## How was this patch tested?

Existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-16122

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16949


commit a3d2abbc05948a425fbeadb2af00438087f7eb58
Author: uncleGen 
Date:   2017-02-16T01:45:43Z

add rest api for job environment




[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
cc @cloud-fan 


[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
Jenkins crashed. Retest this please.


[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @srowen 


[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
Terminated by signal 9. Retest this please.


[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16920
  
@rxin cool, `Jirafy` is enough.


[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
@srowen Good question! IMHO, we should add this API:

- It provides a complete API, the same as what users see in the web UI.
- If this is a security issue, we should address it in other ways.
- The existing API may also have the security issue you mentioned; maybe we need an authorization check or something similar.

Any suggestion is appreciated!


[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16936
  
Let us call on @zsxwing for some suggestions.


[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
@vanzin I opened a jira (https://issues.apache.org/jira/browse/SPARK-19642) 
to research and address the potential security flaws. Do you mind if I continue 
this pr?


[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-16 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16972

[SPARK-19556][CORE][WIP] Broadcast data is not encrypted when I/O 
encryption is on

## What changes were proposed in this pull request?

`TorrentBroadcast` uses a couple of "back doors" into the block manager to 
write and read data.

The thing these block manager methods have in common is that they bypass 
the encryption code; so broadcast data is stored unencrypted in the block 
manager, causing unencrypted data to be written to disk if those blocks need to 
be evicted from memory.

The correct fix here is actually not to change `TorrentBroadcast`, but to 
fix the block manager so that:

- data stored in memory is not encrypted
- data written to disk is encrypted

This would simplify the code paths that use BlockManager / 
SerializerManager APIs.
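
A minimal sketch of that split, assuming a raw AES key is at hand; DiskWriteSketch and its helpers are hypothetical stand-ins, not Spark's BlockManager/SerializerManager API:

```scala
import java.io.{FileOutputStream, OutputStream}
import javax.crypto.{Cipher, CipherOutputStream}
import javax.crypto.spec.SecretKeySpec

import scala.collection.mutable

// Hypothetical sketch: encryption is applied only on the disk write path;
// blocks kept in memory stay in plaintext.
class DiskWriteSketch(key: Array[Byte]) { // key must be 16/24/32 bytes for AES
  private def encrypting(out: OutputStream): OutputStream = {
    // The bare "AES" transformation (ECB) is used only for brevity; real
    // code needs a proper mode with an IV, as CryptoStreamUtils provides.
    val cipher = Cipher.getInstance("AES")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"))
    new CipherOutputStream(out, cipher)
  }

  // Memory path: store the raw bytes unchanged.
  def putInMemory(store: mutable.Map[String, Array[Byte]],
                  blockId: String, bytes: Array[Byte]): Unit =
    store(blockId) = bytes

  // Disk path: every byte passes through the cipher before hitting the file.
  def writeToDisk(path: String, bytes: Array[Byte]): Unit = {
    val out = encrypting(new FileOutputStream(path))
    try out.write(bytes) finally out.close()
  }
}
```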

## How was this patch tested?

update and add unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19556

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16972


commit 63d909b4a0cc108fd8756436b21c65614abb9466
Author: uncleGen 
Date:   2017-02-15T14:12:47Z

cp

commit f9a91d63af3191b853ef88bd48293bcc19f3ec4c
Author: uncleGen 
Date:   2017-02-17T03:54:44Z

refactor blockmanager: data stored in memory is not encrypted, data written 
to disk is encrypted




[GitHub] spark issue #16972: [SPARK-19556][CORE][WIP] Broadcast data is not encrypted...

2017-02-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16972
  
@vanzin I will add some unit tests in the future. But could you please review this first? I think I may be missing something.





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-16 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r101682663
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -21,17 +21,25 @@ import java.io.{FileOutputStream, IOException, 
RandomAccessFile}
 import java.nio.ByteBuffer
 import java.nio.channels.FileChannel.MapMode
 
-import com.google.common.io.Closeables
+import scala.reflect.ClassTag
+
+import com.google.common.io.{ByteStreams, Closeables}
+import org.apache.commons.io.IOUtils
 
-import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.Utils
+import org.apache.spark.SparkConf
+import org.apache.spark.security.CryptoStreamUtils
+import org.apache.spark.serializer.SerializerManager
+import org.apache.spark.util.{ByteBufferInputStream, 
ByteBufferOutputStream, Utils}
 import org.apache.spark.util.io.ChunkedByteBuffer
 
 /**
  * Stores BlockManager blocks on disk.
  */
-private[spark] class DiskStore(conf: SparkConf, diskManager: 
DiskBlockManager) extends Logging {
+private[spark] class DiskStore(
+conf: SparkConf,
+serializerManager: SerializerManager,
--- End diff --

Add `serializerManager` to do the decryption work in `DiskStore`.





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-16 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r101682730
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -344,7 +370,7 @@ private[spark] class MemoryStore(
 val serializationStream: SerializationStream = {
   val autoPick = !blockId.isInstanceOf[StreamBlockId]
   val ser = serializerManager.getSerializer(classTag, 
autoPick).newInstance()
-  ser.serializeStream(serializerManager.wrapStream(blockId, 
redirectableStream))
+  ser.serializeStream(serializerManager.wrapForCompression(blockId, 
redirectableStream))
 }
--- End diff --

`MemoryStore` will not do any encryption work.





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-16 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r101682505
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+val bytesToStore = if (serializerManager.encryptionEnabled) {
+  try {
+val data = bytes.toByteBuffer
+val in = new ByteBufferInputStream(data, true)
+val byteBufOut = new ByteBufferOutputStream(data.remaining())
+val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, 
conf,
+  serializerManager.encryptionKey.get)
+try {
+  ByteStreams.copy(in, out)
+} finally {
+  in.close()
+  out.close()
+}
+new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+  } finally {
+bytes.dispose()
+  }
+} else {
+  bytes
+}
+
 put(blockId) { fileOutputStream =>
   val channel = fileOutputStream.getChannel
   Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
+bytesToStore.writeFully(channel)
   } {
 channel.close()
   }
 }
   }
 
   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+val bytes = readBytes(blockId)
+
+val in = 
serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
+  }
+
+  def getBytesAsValues[T](blockId: BlockId, classTag: ClassTag[T]): 
Iterator[T] = {
+val bytes = readBytes(blockId)
+
+serializerManager
+  .dataDeserializeStream(blockId, bytes.toInputStream(dispose = 
true))(classTag)
+  }
+
+  private[storage] def readBytes(blockId: BlockId): ChunkedByteBuffer = {
--- End diff --

Abstracted this out for unit testing.





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-16 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r101682451
  
--- Diff: core/src/test/scala/org/apache/spark/storage/DiskStoreSuite.scala 
---
@@ -39,27 +40,27 @@ class DiskStoreSuite extends SparkFunSuite {
 val blockId = BlockId("rdd_1_2")
 val diskBlockManager = new DiskBlockManager(new SparkConf(), 
deleteFilesOnStop = true)
 
-val diskStoreMapped = new DiskStore(new SparkConf().set(confKey, "0"), 
diskBlockManager)
+val conf = new SparkConf()
+val serializer = new KryoSerializer(conf)
+val serializerManager = new SerializerManager(serializer, conf)
+
+conf.set(confKey, "0")
+val diskStoreMapped = new DiskStore(conf, serializerManager, 
diskBlockManager)
 diskStoreMapped.putBytes(blockId, byteBuffer)
-val mapped = diskStoreMapped.getBytes(blockId)
+val mapped = diskStoreMapped.readBytes(blockId)
 assert(diskStoreMapped.remove(blockId))
 
-val diskStoreNotMapped = new DiskStore(new SparkConf().set(confKey, 
"1m"), diskBlockManager)
+conf.set(confKey, "1m")
+val diskStoreNotMapped = new DiskStore(conf, serializerManager, 
diskBlockManager)
 diskStoreNotMapped.putBytes(blockId, byteBuffer)
-val notMapped = diskStoreNotMapped.getBytes(blockId)
+val notMapped = diskStoreNotMapped.readBytes(blockId)
 
 // Not possible to do isInstanceOf due to visibility of HeapByteBuffer
 
assert(notMapped.getChunks().forall(_.getClass.getName.endsWith("HeapByteBuffer")),
   "Expected HeapByteBuffer for un-mapped read")
 assert(mapped.getChunks().forall(_.isInstanceOf[MappedByteBuffer]),
   "Expected MappedByteBuffer for mapped read")
 
-def arrayFromByteBuffer(in: ByteBuffer): Array[Byte] = {
-  val array = new Array[Byte](in.remaining())
--- End diff --

Removed unused code.





[GitHub] spark issue #16972: [SPARK-19556][CORE][WIP] Broadcast data is not encrypted...

2017-02-17 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16972
  
Working on the UT failure.





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-17 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
@vanzin @ajbozarth Sure, I will check in the related code.





[GitHub] spark pull request #16857: [SPARK-19517][SS] KafkaSource fails to initialize...

2017-02-17 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16857#discussion_r101764130
  
--- Diff: 
external/kafka-0-10-sql/src/test/resources/kafka-source-initial-offset-version-2.1.0.bin
 ---
@@ -0,0 +1 @@
+2{"kafka-initial-offset-2-1-0":{"2":0,"1":0,"0":0}}
--- End diff --

Maybe this should end with a newline?
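
For context, a tolerant way to read such a versioned offset file whether or not it ends with a newline; a sketch, under the assumption that the leading digits are a version marker followed by the offsets JSON:

```
val content = "2{\"kafka-initial-offset-2-1-0\":{\"2\":0,\"1\":0,\"0\":0}}\n"
val trimmed = content.trim                 // tolerate a trailing newline
val version = trimmed.takeWhile(_.isDigit) // "2"
val json    = trimmed.drop(version.length) // the offsets JSON
assert(version == "2" && json.startsWith("{"))
```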





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-19 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101948606
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec 
= {
 val boundaryStart = start match {
   case 0 => CurrentRow
-  case Long.MinValue => UnboundedPreceding
-  case x if x < 0 => ValuePreceding(-start.toInt)
-  case x if x > 0 => ValueFollowing(start.toInt)
+  case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell 
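
For reference, the overflow this diff guards against, as a standalone sketch: narrowing a `Long` below `Int.MinValue` wraps around instead of mapping to an unbounded frame.

```
val start: Long = Int.MinValue.toLong - 1 // one past the Int range
val narrowed: Int = start.toInt           // wraps around to Int.MaxValue
assert(narrowed == Int.MaxValue)          // the sign silently flips
```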





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @vanzin Please take a second look!





[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-19 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16977#discussion_r101954581
  
--- Diff: 
core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
@@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: 
ClassTag](
   override def getPreferredLocations(s: Partition): Seq[String] = {
 locationPrefs.getOrElse(s.index, Nil)
   }
+
+  override def collect(): Array[T] = toArray(data)
+
+  override def take(num: Int): Array[T] = toArray(data.take(num))
+
+  private def toArray(data: Seq[T]): Array[T] = {
+// We serialize the data and deserialize it back, to simulate the 
behavior of sending it to
+// remote executors and collect it back.
+val ser = sc.env.closureSerializer.newInstance()
+ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
--- End diff --

Why should we simulate like this?
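
For reference, the round trip from the diff in standalone form, as a sketch using Spark's `JavaSerializer` (the diff itself uses `sc.env.closureSerializer`):

```
import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

// Serializing and deserializing locally approximates what the data would
// look like after travelling to executors and back.
val ser = new JavaSerializer(new SparkConf()).newInstance()
val data = Seq(1, 2, 3)
val roundTripped = ser.deserialize[Seq[Int]](ser.serialize(data))
assert(roundTripped == data)
```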





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-20 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r101969639
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+val bytesToStore = if (serializerManager.encryptionEnabled) {
+  try {
+val data = bytes.toByteBuffer
+val in = new ByteBufferInputStream(data, true)
+val byteBufOut = new ByteBufferOutputStream(data.remaining())
+val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, 
conf,
+  serializerManager.encryptionKey.get)
+try {
+  ByteStreams.copy(in, out)
+} finally {
+  in.close()
+  out.close()
+}
+new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+  } finally {
+bytes.dispose()
+  }
+} else {
+  bytes
+}
+
 put(blockId) { fileOutputStream =>
   val channel = fileOutputStream.getChannel
   Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
+bytesToStore.writeFully(channel)
   } {
 channel.close()
   }
 }
   }
 
   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+val bytes = readBytes(blockId)
+
+val in = 
serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
--- End diff --

I see, that makes sense. The decryption seems to be much more complex than I thought.





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-20 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r102115145
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec 
= {
 val boundaryStart = start match {
   case 0 => CurrentRow
-  case Long.MinValue => UnboundedPreceding
-  case x if x < 0 => ValuePreceding(-start.toInt)
-  case x if x > 0 => ValueFollowing(start.toInt)
+  case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell and @gatorsmile 





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-20 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @srowen also.





[GitHub] spark pull request #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProvider...

2017-02-21 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17011

[SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.SPARK-3697: ignore 
directories that cannot be read.

## What changes were proposed in this pull request?

Flaky test: FsHistoryProviderSuite.SPARK-3697: ignore directories that 
cannot be read.

## How was this patch tested?

Unit tests pass locally.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19676

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17011


commit 11e9b1c3749c1687fbcd3870104f7a535d84b722
Author: uncleGen 
Date:   2017-02-21T08:03:07Z

Flaky test: FsHistoryProviderSuite.SPARK-3697: ignore directories that 
cannot be read.







[GitHub] spark issue #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect should...

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16977
  
Builds successfully locally; retest this please.





[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17011
  
retest this please.





[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17011
  
@srowen @vanzin I will test in some other platforms.





[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17011
  
@srowen @vanzin I think the root cause is that I ran the test as the root user, so the directory is always readable regardless of its access permissions. IMHO, it is OK to add one extra access-permission check, since the enclosing code is already catching the AccessControlException. Maybe I need to make the comment clearer. What's your opinion?
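
One possible guard, sketched under the assumption that the suite uses ScalaTest: skip the permission-based assertion when the effective user is root, since root bypasses POSIX read bits.

```
import org.scalatest.Assertions.assume

assume(System.getProperty("user.name") != "root",
  "directory read permissions are not enforced for the root user")
```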





[GitHub] spark pull request #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProvider...

2017-02-21 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/17011





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @srowen and @vanzin also.





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-21 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r102390214
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+val bytesToStore = if (serializerManager.encryptionEnabled) {
+  try {
+val data = bytes.toByteBuffer
+val in = new ByteBufferInputStream(data, true)
+val byteBufOut = new ByteBufferOutputStream(data.remaining())
+val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, 
conf,
+  serializerManager.encryptionKey.get)
+try {
+  ByteStreams.copy(in, out)
+} finally {
+  in.close()
+  out.close()
+}
+new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+  } finally {
+bytes.dispose()
+  }
+} else {
+  bytes
+}
+
 put(blockId) { fileOutputStream =>
   val channel = fileOutputStream.getChannel
   Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
+bytesToStore.writeFully(channel)
   } {
 channel.close()
   }
 }
   }
 
   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+val bytes = readBytes(blockId)
+
+val in = 
serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
--- End diff --

@vanzin After taking some time to think about it, I find it may complicate the issue if we separate `MemoryStore` with unencrypted data from `DiskStore` with encrypted data. When getting data from a remote node, we will have to encrypt it if it is stored in memory in unencrypted form. Besides, when we `maybeCacheDiskBytesInMemory`, we will decrypt the data again. I've thought about caching disk data in memory in encrypted form and then decrypting it lazily when used, but that makes things much more complicated. Maybe it is better to keep the original style, i.e. keep data encrypted (where possible) both in memory and on disk, and narrow this problem. Any suggestions?





[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16980
  
retest this please.





[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16980
  
Shouldn't the JIRA ID be SPARK-19645?





[GitHub] spark pull request #16691: [SPARK-19349][DStreams]improve resource ready che...

2017-02-22 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16691





[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...

2017-02-22 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/16936





[GitHub] spark pull request #17025: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-22 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17025

[SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which 
has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not.

Test code:

```
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, 
s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", 
"localhost").option("port", "").load 
val words = lines.as[String].flatMap(_.split(" ")) 
val wordCounts = words.join(df, "value")
```

before pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#13]
   : +- MapPartitions , obj#12: java.lang.String
   :+- DeserializeToObject value#5.toString, obj#11: 
java.lang.String
   :   +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, 
true]))
  +- *HashAggregate(keys=[value#1], functions=[count(1)])
 +- StateStoreSave [value#1], OperatorStateId(,0,0), 
Append, 0
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
   +- StateStoreRestore [value#1], 
OperatorStateId(,0,0)
  +- *HashAggregate(keys=[value#1], 
functions=[merge_count(1)])
 +- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], 
functions=[partial_count(1)])
   +- *Project [value#1]
  +- *Filter isnotnull(value#1)
 +- LocalTableScan [key#0, value#1]
```

after pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#13]
   : +- MapPartitions , obj#12: java.lang.String
   :+- DeserializeToObject value#5.toString, obj#11: 
java.lang.String
   :   +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, 
true]))
  +- *HashAggregate(keys=[value#1], functions=[count(1)])
 +- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
   +- *Project [value#1]
  +- *Filter isnotnull(value#1)
 +- LocalTableScan [key#0, value#1]
```

## How was this patch tested?

Added a new unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17025


commit 0d0c19a2f99b6417f419f72fb28c155a70dda201
Author: uncleGen 
Date:   2017-02-22T10:18:31Z

Join a streaming DataFrame with a batch DataFrame which has an aggregation 
may not work







[GitHub] spark issue #17025: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17025
  
Working on the test failure.





[GitHub] spark pull request #17025: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-22 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/17025





[GitHub] spark issue #17025: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17025
  
wip





[GitHub] spark pull request #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17033

[DOCS] application environment rest api

## What changes were proposed in this pull request?

Add documentation for the application environment REST API.

## How was this patch tested?

jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark doc-restapi-environment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17033.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17033


commit 312981b22549971f4e58ad8e91bca984300efab7
Author: uncleGen 
Date:   2017-02-23T05:28:28Z

Docs for the environment REST API.







[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17033
  
cc @srowen 





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17033
  
cc @vanzin 





[GitHub] spark issue #17033: [SPARK-16122][DOCS] application environment rest api

2017-02-23 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17033
  
@vanzin done





[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-23 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17052

[SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which 
has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not.

Test code:

```
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, 
s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", 
"localhost").option("port", "").load 
val words = lines.as[String].flatMap(_.split(" ")) 
val result = words.join(df, "value")
```

before pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#13]
   : +- MapPartitions , obj#12: java.lang.String
   :+- DeserializeToObject value#5.toString, obj#11: 
java.lang.String
   :   +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, 
true]))
  +- *HashAggregate(keys=[value#1], functions=[count(1)])
 +- StateStoreSave [value#1], OperatorStateId(,0,0), 
Append, 0
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
   +- StateStoreRestore [value#1], 
OperatorStateId(,0,0)
  +- *HashAggregate(keys=[value#1], 
functions=[merge_count(1)])
 +- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], 
functions=[partial_count(1)])
   +- *Project [value#1]
  +- *Filter isnotnull(value#1)
 +- LocalTableScan [key#0, value#1]
```

after pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#13]
   : +- MapPartitions , obj#12: java.lang.String
   :+- DeserializeToObject value#5.toString, obj#11: 
java.lang.String
   :   +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, 
true]))
  +- *HashAggregate(keys=[value#1], functions=[count(1)])
 +- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
   +- *Project [value#1]
  +- *Filter isnotnull(value#1)
 +- LocalTableScan [key#0, value#1]
```

## How was this patch tested?

Added a new unit test.
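
The distinction the fix relies on can be observed directly; a minimal sketch, assuming a running `SparkSession` named `spark` (the host and port are arbitrary example values):

```
// A batch aggregation is not streaming, so it must not be planned with
// StateStoreSave/StateStoreRestore operators.
val batchCounts = spark.range(10).groupBy("id").count()
assert(!batchCounts.isStreaming)

// A socket-based stream, by contrast, is streaming.
val lines = spark.readStream.format("socket")
  .option("host", "localhost").option("port", "9999").load()
assert(lines.isStreaming)
```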


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17052


commit e45b06e2495e09c6d7e7a50ee509044b526bf8d0
Author: uncleGen 
Date:   2017-02-22T10:18:31Z

Join a streaming DataFrame with a batch DataFrame which has an aggregation 
may not work

commit 9eb57b7294f2636e370be86cf975509917fdd861
Author: uncleGen 
Date:   2017-02-24T06:38:41Z

code clean







[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-24 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17052
  
cc @zsxwing 





[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-24 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17052#discussion_r102922659
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -535,7 +535,8 @@ case class Range(
 case class Aggregate(
 groupingExpressions: Seq[Expression],
 aggregateExpressions: Seq[NamedExpression],
-child: LogicalPlan)
+child: LogicalPlan,
+stateful: Boolean = false)
   extends UnaryNode {
--- End diff --

`stateful` indicates whether the aggregate is based on a streaming or a batch plan; it is resolved by the `ResolveStatefulAggregate` rule.





[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-24 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17052#discussion_r102922775
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala 
---
@@ -393,6 +393,17 @@ case class PreprocessTableInsertion(conf: SQLConf) 
extends Rule[LogicalPlan] {
 }
 
 /**
+ * A rule to check whether the aggregate is stateful.
+ */
+class ResolveStatefulAggregate() extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case agg @ Aggregate(groupingExpressions, aggregateExpressions, child, 
_)
+  if agg.isStreaming =>
+Aggregate(groupingExpressions, aggregateExpressions, child, 
stateful = true)
+  }
--- End diff --

Resolve each aggregate and determine whether it is stateful or not.





[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-24 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17052
  
@zsxwing got it





[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-26 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17052
  
retest this please.





[GitHub] spark pull request #17080: [SPARK-19739][CORE] propagate S3 session token to...

2017-02-27 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17080

[SPARK-19739][CORE] propagate S3 session token to cluster

## What changes were proposed in this pull request?

Propagate the S3 session token to the cluster.

## How was this patch tested?

Existing unit tests.
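
A sketch of the idea from the user side, assuming S3A and temporary STS credentials in the standard environment variables (`fs.s3a.session.token` is the Hadoop S3A property for a session token):

```
// All three credentials, including the session token, must reach the
// cluster-side Hadoop configuration, not just the access key and secret.
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
sc.hadoopConfiguration.set("fs.s3a.session.token", sys.env("AWS_SESSION_TOKEN"))
```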


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19739

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17080


commit 0ae5aa73c70ae2f46a2d16087b5c55652d1e0282
Author: uncleGen 
Date:   2017-02-27T08:12:10Z

propagate S3 session token to cluster







[GitHub] spark pull request #17082: [SPARK-19749][SS] Name socket source with a meani...

2017-02-27 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17082

[SPARK-19749][SS] Name socket source with a meaningful name

## What changes were proposed in this pull request?

Name socket source with a meaningful name

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19749

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17082


commit 68349facee3b33fd5975e90c74c882f3d922
Author: uncleGen 
Date:   2017-02-27T09:35:56Z

Name socket source with a meaningful name







[GitHub] spark issue #17082: [SPARK-19749][SS] Name socket source with a meaningful n...

2017-02-27 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17082
  
@srowen I think this is the only source that was forgotten to be named.
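
The spirit of the fix, as a sketch (the actual class and message format in the PR may differ):

```
// Give the source a human-readable name instead of the default
// Object.toString identity hash.
class TextSocketSource(host: String, port: Int) {
  override def toString: String = s"TextSocketSource[host: $host, port: $port]"
}
```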





[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14731#discussion_r103187577
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
 ---
@@ -140,7 +137,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
* a union RDD out of them. Note that this maintains the list of files 
that were processed
* in the latest modification time in the previous call to this method. 
This is because the
* modification time returned by the FileStatus API seems to return 
times only at the
-   * granularity of seconds. And new files may have the same modification 
time as the
+   * granularity of seconds in HDFS. And new files may have the same 
modification time as the
* latest modification time in the previous call to this method yet was 
not reported in
--- End diff --

got it
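
The granularity problem in miniature, as a sketch: two files created within the same second share a modification time, so a strictly-greater filter would drop the second one.

```
val lastSeenModTime = 1488182400000L        // hypothetical batch boundary, in ms
val newFileModTime  = 1488182400000L        // same second, so an equal timestamp
assert(!(newFileModTime > lastSeenModTime)) // a `>` filter misses the new file
```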





[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...

2016-12-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16370
  
@zsxwing Thanks for the reminder!! In some ways we really can avoid this issue, for example by not using `-cp`. But this is user-side behaviour; we cannot ensure that every user knows about it and uses the correct way to move data. It may confuse users who do not know about this issue. Besides, the current change is so tiny that it does no harm to the code. So, IMHO, it is OK to add the check for protection. What do you think?





[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...

2016-12-25 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16370
  
@zsxwing Is there any further feedback?





[GitHub] spark pull request #16414: [SPARK-19009][DOC] Add streaming rest api doc

2016-12-26 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16414

[SPARK-19009][DOC] Add streaming rest api doc

## What changes were proposed in this pull request?

Add documentation for the streaming REST API.

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19009

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16414


commit 14086492b289e8e2eab06899f38d22d885a0e41f
Author: uncleGen 
Date:   2016-12-27T06:45:22Z

add streaming rest api doc

commit c4a5941e19ac9f3e6dc5d8cd09211f6c8b18
Author: uncleGen 
Date:   2016-12-27T06:46:15Z

Merge branch 'master' into dev-doc-stream






