[GitHub] spark pull request: [SPARK-2675]LiveListenerBus Queue Overflow
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/1356#issuecomment-53977557 okay! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2773]use growth rate to predict if need...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/1696#issuecomment-54102947

@pwendell OK!
[GitHub] spark pull request: [SPARK-2506]: In yarn-cluster mode, Applicatio...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/1429
[GitHub] spark pull request: [GraphX]: trim some useless informations of Ve...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2249

[GraphX]: trim some useless informations of VertexRDD in some cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_graphx-trim

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2249.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2249

commit 09b059e61da550e03f2d497d370e7bd2c9e1832a
Author: uncleGen
Date: 2014-09-03T13:29:07Z
[GraphX]: trim some useless informations of VertexRDD in some cases
[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2249#issuecomment-54399143

@ankurdave Thanks for your comments, I will update it as soon as possible.
[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/2249#discussion_r17095835

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Graph.scala ---
@@ -262,13 +262,61 @@ abstract class Graph[VD: ClassTag, ED: ClassTag] protected () extends Serializab
     : Graph[VD, ED]

   /**
+   * Restricts the graph to only the vertices and edges satisfying the predicates. The resulting
+   * subgraph satisfies
+   *
+   * {{{
+   * V' = {v : for all v in V where vpred(v)}
+   * E' = {(u,v): for all (u,v) in E where epred((u,v)) && vpred(u) && vpred(v)}
+   * }}}
+   *
+   * @param epred the edge predicate, which takes a triplet and
+   * evaluates to true if the edge is to remain in the subgraph. Note
+   * that only edges where both vertices satisfy the vertex
+   * predicate are considered.
+   *
+   * @param vpred the vertex predicate, which takes a vertex object and
+   * evaluates to true if the vertex is to be included in the subgraph
+   *
+   * @param updateRoutingTable whether to rebuild the routingTable in
+   * VertexRDD to reduce the shuffle when shipping VertexAttributes.
+   *
+   * @return the subgraph containing only the vertices and edges that
+   * satisfy the predicates
+   */
+  def subgraph(
+      epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
+      vpred: (VertexId, VD) => Boolean = ((v, d) => true),
+      updateRoutingTable: Boolean)
+    : Graph[VD, ED]
--- End diff --

something wrong here
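The predicate semantics documented in the diff above can be sketched with plain Scala collections. This is an illustrative stand-in, not GraphX's RDD-based implementation; the Edge case class and subgraph helper here are hypothetical names introduced only for this example:

```scala
// Simplified stand-in for an edge with an attribute.
case class Edge(src: Long, dst: Long, attr: String)

// V' = {v in V : vpred(v)}
// E' = {(u,v) in E : epred((u,v)) && vpred(u) && vpred(v)}
def subgraph(
    vertices: Map[Long, String],
    edges: Seq[Edge],
    vpred: (Long, String) => Boolean,
    epred: Edge => Boolean): (Map[Long, String], Seq[Edge]) = {
  // Keep only the vertices satisfying the vertex predicate.
  val vs = vertices.filter { case (id, attr) => vpred(id, attr) }
  // An edge survives only if its predicate holds AND both endpoints survived,
  // which mirrors the "only edges where both vertices satisfy vpred" note.
  val es = edges.filter(e => epred(e) && vs.contains(e.src) && vs.contains(e.dst))
  (vs, es)
}
```

Dropping a vertex therefore implicitly drops every edge touching it, even when the edge predicate alone would have kept that edge.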
[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2488

[SPARK-3636][CORE]: It is not friendly to interrupt a Job when user passes different storageLevels to a RDD

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master-minorfix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2488.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2488

commit 0e3831cc56f2532d5fdd8500b26dfe34a62069cd
Author: uncleGen
Date: 2014-09-22T09:48:29Z
[SPARK-3636][CORE]: It is not friendly to interrupt a Job when user passes different storageLevels to a RDD
[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/2488
[GitHub] spark pull request: [SPARK-3636][CORE]:It is not friendly to inter...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2488#issuecomment-56505578

@pwendell ah, makes sense, I will close this PR. Thank you!
[GitHub] spark pull request: [SPARK-1774] Respect SparkSubmit --jars on YAR...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/710#discussion_r17951094

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -192,15 +236,17 @@ class SparkSubmitSuite extends FunSuite with ShouldMatchers {
   }
   test("launch simple application with spark-submit") {
-    runSparkSubmit(
-      Seq(
-        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
-        "--name", "testApp",
-        "--master", "local",
-        "unUsed.jar"))
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
--- End diff --

Here, do you mean a jar with empty content? I cannot pass this test as written, but if I set a class name (i.e. Seq.empty -> Seq("filename")), it is OK.
[GitHub] spark pull request: Bug Fix: LiveListenerBus Queue Overflow
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/1356

Bug Fix: LiveListenerBus Queue Overflow

As we know, the size of "eventQueue" is fixed. When events arrive faster than the listeners can consume them, the overflowing events are thrown away with a "logQueueFullErrorMessage". As a result, the Spark UI ends up in a chaotic state. In my strategy, those overflow events are first cached in the "blockManager" and then replayed by a "replayThread" once the "eventQueue" has free space. The only drawback is some display delay in the Spark UI, but that is better than losing events, and the delay disappears once the event pressure eases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1356.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1356

commit 52906a911b111f3848e21c3b449b76237f658051
Author: uncleGen
Date: 2014-07-10T12:17:32Z
Bug Fix: LiveListenerBus Queue Overflow
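The overflow strategy described above can be sketched as follows. This is a hypothetical simplification, not Spark's actual LiveListenerBus: a plain unbounded buffer stands in for the BlockManager "cache", and replay happens inline on the consumer path rather than in a dedicated replayThread.

```scala
import java.util.concurrent.{ArrayBlockingQueue, ConcurrentLinkedQueue}

// Sketch: a fixed-size event queue that parks overflow events in a side
// buffer instead of dropping them, and replays them as the queue drains.
class SpillingEventBus[E <: AnyRef](capacity: Int) {
  private val eventQueue = new ArrayBlockingQueue[E](capacity)
  private val overflow   = new ConcurrentLinkedQueue[E]()

  def post(event: E): Unit =
    // Try the bounded queue first; on overflow, cache instead of dropping.
    if (!eventQueue.offer(event)) overflow.add(event)

  def take(): Option[E] = {
    val next = Option(eventQueue.poll())
    // Taking freed (or found) a slot: replay one cached event back in.
    Option(overflow.poll()).foreach(eventQueue.offer)
    next.orElse(Option(eventQueue.poll()))
  }
}
```

Events past the queue's capacity are delivered late rather than lost, which matches the trade-off the PR describes: a temporary display delay instead of a permanently inconsistent UI.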
[GitHub] spark pull request: [GraphX]: override the "setName" function to s...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2033

[GraphX]: override the "setName" function to set EdgeRDD's name manually just as VertexRDD does.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_origin

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2033.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2033

commit 801994b9c5567c2188708e67fd86b41e681ce7a8
Author: uncleGen
Date: 2014-08-19T15:08:20Z
Update EdgeRDD.scala [GraphX]: override the "setName" function to set EdgeRDD's name manually just as VertexRDD does.
[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2076

[SPARK-3170][CORE]: Bug Fix in Storage UI

A completed stage only needs to remove its own partitions that are no longer cached. Currently, the "Storage" tab in the Spark UI may lose some RDDs which are actually cached.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_ui_storage

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2076.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2076

commit 7633f4f62b01df78e649f55c79dfa8db48e21718
Author: uncleGen
Date: 2014-08-21T03:40:36Z
Bug Fix in Storage UI
[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2076#issuecomment-52908740

@srowen yes! Not only "StorageTab": "ExecutorTab" may also lose some RDD info which has been overwritten by a following RDD in the same task.

"StorageTab": when multiple stages run simultaneously, a completed stage will remove RDD info belonging to other stages that are still running.

"ExecutorTab": in a dependency chain of RDDs, a task can only update the latest RDD info. For example:

    val r1 = sc.parallelize(...).cache()
    val r2 = r1.map(...).cache()
    val n = r2.count()

Currently, "ExecutorTab" shows wrong info for "Memory Used": it only shows the usage of r2, not r1 as well.
[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2076#issuecomment-53144395

@pwendell Okay! I will add them as soon as possible and pay more attention.
[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2131

[SPARK-3170][CORE][BUG]: RDD info loss in "StorageTab" and "ExecutorTab"

A completed stage only needs to remove its own partitions that are no longer cached. However, "StorageTab" may lose some RDDs which are actually cached. Not only "StorageTab": "ExecutorTab" may also lose some RDD info which has been overwritten by a later RDD in the same task.

1. "StorageTab": when multiple stages run simultaneously, a completed stage will remove RDD info belonging to other stages that are still running.

2. "ExecutorTab": the TaskContext may lose some "updatedBlocks" info of RDDs in a dependency chain. For example:

    val r1 = sc.parallelize(...).cache()
    val r2 = r1.map(...).cache()
    val n = r2.count()

When r2 is counted, both r1 and r2 will eventually be cached. So in CacheManager.getOrCompute, the TaskContext should contain the "updatedBlocks" of both r1 and r2. Currently, "updatedBlocks" only contains the info of r2.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_ui_fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2131

commit c82ba82ae90c92244e63811f30e1aeb05608c57a
Author: uncleGen
Date: 2014-08-26T06:54:04Z
Bug Fix: RDD info loss in "StorageTab" and "ExecutorTab"
[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2076#issuecomment-53383270

@andrewor14 @pwendell @srowen As my branch is not up to date, I have decided to close this and submit a new PR. Please review it at: https://github.com/apache/spark/pull/2131
[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/2076
[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2131#issuecomment-53521662

Hi @andrewor14, please test it again.
[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/2131#discussion_r16761636

--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
   // Otherwise, cache the values and keep track of any updates in block statuses
   val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
   val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
-  context.taskMetrics.updatedBlocks = Some(updatedBlocks)
+  val metrics = context.taskMetrics
+  val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
+  metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
--- End diff --

@andrewor14 IMHO, "getOrCompute" can be called more than once per task (indirectly, via recursion). In this code snippet:

    val rdd1 = sc.parallelize(...).cache()
    val rdd2 = rdd1.map(...).cache()
    val count = rdd2.count()

This code snippet submits one stage. Take task-1 as an example. Task-1 first calls getOrCompute(rdd-2), which in turn calls getOrCompute(rdd-1). Therefore it generates and caches block rdd-1-1 and block rdd-2-1 one after another. At the end of getOrCompute(rdd-1), task-1's taskMetrics.updatedBlocks will be seq(rdd-1-1). Then at the end of getOrCompute(rdd-2), taskMetrics.updatedBlocks will be seq(rdd-1-1, rdd-2-1).
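The fix in the diff above boils down to accumulating into the Option instead of overwriting it on each call. A minimal sketch, using simplified stand-ins for Spark's BlockId, BlockStatus, and TaskMetrics types (the real classes carry more fields):

```scala
// Simplified stand-ins; the real Spark types are richer.
case class BlockId(name: String)
case class BlockStatus(memSize: Long)

class TaskMetrics {
  var updatedBlocks: Option[Seq[(BlockId, BlockStatus)]] = None
}

// When getOrCompute runs more than once per task (recursively along an RDD
// dependency chain), merge the new block updates into any already recorded,
// rather than replacing them as the pre-patch code did.
def recordUpdatedBlocks(
    metrics: TaskMetrics,
    newBlocks: Seq[(BlockId, BlockStatus)]): Unit = {
  // getOrElse keeps the blocks recorded by the inner (earlier) call alive.
  val previous = metrics.updatedBlocks.getOrElse(Seq.empty)
  metrics.updatedBlocks = Some(previous ++ newBlocks)
}
```

With this pattern, the inner call for rdd-1 records rdd-1-1, and the outer call for rdd-2 appends rdd-2-1 instead of discarding it.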
[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/2131#discussion_r16762096

--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
     assert(value.toList === List(1, 2, 3, 4))
   }
 }
+
+ test("verity task metrics updated correctly") {
+   blockManager = sc.env.blockManager
+   cacheManager = new CacheManager(blockManager)
+   val context = new TaskContext(0, 0, 0)
+   cacheManager.getOrCompute(rdd3, split, context, StorageLevel.MEMORY_ONLY)
+   assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
--- End diff --

sorry for my poor coding, I will review it again
[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2131#issuecomment-53549872

@andrewor14 sorry for my poor coding. The unit test passed locally; please test it again.
[GitHub] spark issue #16863: Swamidass & Baldi Approximations
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16863

Please review http://spark.apache.org/contributing.html before opening a pull request.
[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818

cc @cloud-fan @hvanhovell
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16857

retest this please.
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16857

I think it would be best to add a new unit test for this issue.
[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818

cc @gatorsmile also
[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16827

@srowen Could you please take a second review? cc @rxin also
[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16827

@srowen makes sense. What about logging a warning/error message if 'spark.master' is set with different values?
[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/16827
[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16827

@srowen makes sense; I will close it for now until there is a follow-up.
[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818

cc @hvanhovell and @cloud-fan
[GitHub] spark pull request #16920: [MINOR][DOCS] Add jira url in pull request descri...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16920

[MINOR][DOCS] Add jira url in pull request description

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

-- (**the following is newly added**) --

```
## Discussion detail.

(Please fill in the related JIRA url; otherwise, remove this. The format should be:)

For more detailed discussion, please refer to:
- [SPARK-XXX](https://issues.apache.org/jira/browse/SPARK-XXX)
```

## Discussion detail.

(Please fill in the related JIRA url; otherwise, remove this. The format should be:)

For more detailed discussion, please refer to:
- [SPARK-XXX](https://issues.apache.org/jira/browse/SPARK-XXX)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark pr-template

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16920.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16920

commit 510a80a49bc40da7a312ea1906f57f16f10bfbc8
Author: uncleGen
Date: 2017-02-14T03:22:15Z
add jira url in pull request description

commit e61125acb60944e48a5be4b8218ae925e1b543b6
Author: uncleGen
Date: 2017-02-14T03:33:21Z
update
[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16920

cc @srowen
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16818#discussion_r101187326

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

In fact, the type of `start` and `end` should not be `Long` here, but we cannot change it for compatibility.
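The overflow under discussion is easy to reproduce in isolation. A minimal, hypothetical sketch (the names mirror the window-frame boundaries but this is not the actual WindowSpec code): the old mapping narrowed any non-zero `Long` with `toInt`, which silently wraps for values outside the `Int` range, while the fix treats out-of-range values as unbounded frames instead.

```scala
// Hypothetical stand-ins for the window frame boundary types.
sealed trait FrameBoundary
case object CurrentRow extends FrameBoundary
case object UnboundedPreceding extends FrameBoundary
case object UnboundedFollowing extends FrameBoundary
case class ValuePreceding(n: Int) extends FrameBoundary
case class ValueFollowing(n: Int) extends FrameBoundary

// Sketch of the fixed mapping: out-of-Int-range bounds become unbounded
// frames instead of going through a lossy Long-to-Int narrowing.
def boundary(start: Long): FrameBoundary = start match {
  case 0 => CurrentRow
  case x if x <= Int.MinValue => UnboundedPreceding
  case x if x >= Int.MaxValue => UnboundedFollowing
  case x if x < 0 => ValuePreceding((-x).toInt)
  case x => ValueFollowing(x.toInt)
}

// Long-to-Int narrowing keeps only the low 32 bits, so an out-of-range
// preceding bound could even flip sign under the old mapping:
assert((Int.MinValue - 1L).toInt == Int.MaxValue)
assert(boundary(Long.MinValue) == UnboundedPreceding)
assert(boundary(-5L) == ValuePreceding(5))
assert(boundary(10L) == ValueFollowing(10))
```

The sketch clamps with `<=`/`>=` on the `Int` range; exactly where the real patch draws the boundary is what the review thread is debating.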
[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16936 [SPARK-19605][DStream] Fail it if existing resource is not enough to run streaming job

## What changes were proposed in this pull request?

For more detailed discussion, please review: [SPARK-19605](https://issues.apache.org/jira/browse/SPARK-19605)

## How was this patch tested?

Added a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19605

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16936.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16936

commit 4093330ee0c85f2c374db3c3ff1d05832c77b89e
Author: uncleGen
Date: 2017-02-15T07:58:34Z

    SPARK-19605: Fail it if existing resource is not enough to run streaming job
[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16656

@zsxwing Could you please take a look?
[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16936#discussion_r101223957

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -938,7 +958,7 @@ object SlowTestReceiver {
 /** Streaming application for testing DStream and RDD creation sites */
 package object testPackage extends Assertions {
   def test() {
-    val conf = new SparkConf().setMaster("local").setAppName("CreationSite test")
+    val conf = new SparkConf().setMaster("local[2]").setAppName("CreationSite test")
     val ssc = new StreamingContext(conf, Milliseconds(100))
--- End diff --

The unit test fails here.
[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16936#discussion_r101223862

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala ---
@@ -215,10 +215,6 @@ package object config {

   /* Executor configuration. */

-  private[spark] val EXECUTOR_CORES = ConfigBuilder("spark.executor.cores")
-    .intConf
-    .createWithDefault(1)
-
   private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
--- End diff --

This caused a duplicate-definition error in ApplicationMaster.scala; remove it here since we add it in core.
[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16936#discussion_r101223265

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -437,6 +438,74 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false
   }

+  /**
+   * Check if existing resource is enough to run job.
+   */
+  private def checkResourceValid(): Unit = {
+    val coresPerTask = ssc.conf.get(CPUS_PER_TASK)
+
+    def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
+
+    ssc.conf.get("spark.master") match {
+      case m if m.startsWith("yarn") =>
+        val numCoresPerExecutor = ssc.conf.get(EXECUTOR_CORES)
+        val numExecutors = getTargetExecutorNumber()
+        if (numExecutors * numCoresPerExecutor / coresPerTask < numReceivers) {
+          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+            s"existing resource can only be used to scheduler some of receivers." +
+            s"$numExecutors executors, $numCoresPerExecutor cores per executor, $coresPerTask " +
+            s"cores per task and $numReceivers receivers.")
+        }
+      case m if m.startsWith("spark") || m.startsWith("mesos") =>
+        val coresMax = ssc.conf.get(CORES_MAX).getOrElse(0)
+        if (coresMax / coresPerTask < numReceivers) {
+          throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+            s"existing resource can only be used to scheduler some of receivers." +
+            s"$coresMax cores totally, $coresPerTask cores per task and $numReceivers receivers.")
+        }
+      case m if m.startsWith("local") =>
+        m match {
+          case "local" =>
+            throw new SparkException("There are no enough resource to run Spark Streaming job.")
+          case SparkMasterRegex.LOCAL_N_REGEX(threads) =>
+            val threadCount = if (threads == "*") localCpuCount else threads.toInt
+            if (threadCount / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+                s"existing resource can only be used to scheduler some of receivers." +
+                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
+                s"receivers.")
+            }
+          case SparkMasterRegex.LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
+            val threadCount = if (threads == "*") localCpuCount else threads.toInt
+            if (threadCount / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+                s"existing resource can only be used to scheduler some of receivers." +
+                s"$threadCount threads, $coresPerTask threads per task and $numReceivers " +
+                s"receivers.")
+            }
+          case SparkMasterRegex.LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
+            val coresMax = numSlaves.toInt * coresPerSlave.toInt
+            if (coresMax / coresPerTask < numReceivers) {
+              throw new SparkException("There are no enough resource to run Spark Streaming job: " +
+                s"existing resource can only be used to scheduler some of receivers." +
+                s"$numSlaves slaves, $coresPerSlave cores per slave, $coresPerTask " +
+                s"cores per task and $numReceivers receivers.")
+            }
+        }
+    }
+  }
+
+  private def getTargetExecutorNumber(): Int = {
+    if (Utils.isDynamicAllocationEnabled(ssc.conf)) {
+      ssc.conf.get(DYN_ALLOCATION_MAX_EXECUTORS)
+    } else {
+      val targetNumExecutors =
+        sys.env.get("SPARK_EXECUTOR_INSTANCES").map(_.toInt).getOrElse(2)
+      // System property can override environment variable.
--- End diff --

Here "2" refers to `YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS`.
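The arithmetic behind the check in the diff can be sketched in a few lines (the helper name is hypothetical, not from the patch): every receiver permanently pins one task slot, so the job can only be scheduled when the available slots cover at least the number of receivers. Plain `"local"` is rejected outright because its single thread would be pinned by the first receiver, leaving no capacity for batch processing, which is also why the streaming tests use `local[2]`.

```scala
// Sketch of the slot arithmetic: total cores divided by cores-per-task gives
// the number of schedulable task slots, which must cover the receivers.
def canScheduleReceivers(totalCores: Int, coresPerTask: Int, numReceivers: Int): Boolean =
  totalCores / coresPerTask >= numReceivers

// 4 cores, 1 core per task, 2 receivers: 4 slots cover 2 receivers.
assert(canScheduleReceivers(totalCores = 4, coresPerTask = 1, numReceivers = 2))

// 2 cores, 2 cores per task: only 1 slot, cannot even place 2 receivers.
assert(!canScheduleReceivers(totalCores = 2, coresPerTask = 2, numReceivers = 2))
```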
[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16920

@rxin just a fast link to jira :)
[GitHub] spark pull request #16920: [MINOR][DOCS] Add jira url in pull request descri...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/16920
[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16936

@srowen

> Does it really not work if not enough receivers can schedule?

That's not what I meant to express. What I mean is that the stream output cannot operate.
[GitHub] spark pull request #16949: [SPARK-16122][CORE] Add rest api for job environm...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16949 [SPARK-16122][CORE] Add rest api for job environment

## What changes were proposed in this pull request?

Add a REST API for the job environment.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-16122

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16949.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16949

commit a3d2abbc05948a425fbeadb2af00438087f7eb58
Author: uncleGen
Date: 2017-02-16T01:45:43Z

    add rest api for job environment
[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818

cc @cloud-fan
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

Jenkins crashed. Retest this please.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

cc @srowen
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

Terminated by signal 9. Retest this please.
[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16920

@rxin cool, `Jirafy` is enough.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

@srowen good question! IMHO, we should add this API:
- provide a complete API, the same as what users see in the web UI
- if this is a security issue, we should address it in other ways
- maybe the existing API also has the security issue you mentioned

Maybe we need some authorization check or something else for the security issue you raised. Any suggestion is appreciated!
[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16936

Let's ask @zsxwing for some suggestions.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

@vanzin I opened a jira (https://issues.apache.org/jira/browse/SPARK-19642) to research and address the potential security flaws. Do you mind if I continue this pr?
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16972 [SPARK-19556][CORE][WIP] Broadcast data is not encrypted when I/O encryption is on

## What changes were proposed in this pull request?

`TorrentBroadcast` uses a couple of "back doors" into the block manager to write and read data. What these block manager methods have in common is that they bypass the encryption code, so broadcast data is stored unencrypted in the block manager, causing unencrypted data to be written to disk if those blocks need to be evicted from memory.

The correct fix here is actually not to change `TorrentBroadcast`, but to fix the block manager so that:
- data stored in memory is not encrypted
- data written to disk is encrypted

This would simplify the code paths that use the BlockManager / SerializerManager APIs.

## How was this patch tested?

Updated and added unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19556

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16972.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16972

commit 63d909b4a0cc108fd8756436b21c65614abb9466
Author: uncleGen
Date: 2017-02-15T14:12:47Z

    cp

commit f9a91d63af3191b853ef88bd48293bcc19f3ec4c
Author: uncleGen
Date: 2017-02-17T03:54:44Z

    refactor blockmanager: data stored in memory is not encrypted, data written to disk is encrypted
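The split the PR proposes (plaintext in memory, ciphertext on disk) can be illustrated with plain JCE cipher streams. This is only a sketch under stated assumptions: it uses `javax.crypto` directly with a fixed dummy key and IV, not Spark's `CryptoStreamUtils` or its key management.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

// Fixed dummy key/IV purely for the demo; real code derives these securely.
val key = new SecretKeySpec(Array.fill[Byte](16)(7), "AES")
val iv  = new IvParameterSpec(new Array[Byte](16))

// Wrap the "disk" write path in an encrypting stream.
def encrypting(out: OutputStream): CipherOutputStream = {
  val c = Cipher.getInstance("AES/CTR/NoPadding")
  c.init(Cipher.ENCRYPT_MODE, key, iv)
  new CipherOutputStream(out, c)
}

// Wrap the "disk" read path in a decrypting stream.
def decrypting(in: InputStream): CipherInputStream = {
  val c = Cipher.getInstance("AES/CTR/NoPadding")
  c.init(Cipher.DECRYPT_MODE, key, iv)
  new CipherInputStream(in, c)
}

val plain = "broadcast block".getBytes("UTF-8")

// Simulated disk write: the bytes on "disk" differ from the plaintext.
val disk = new ByteArrayOutputStream()
val enc = encrypting(disk)
enc.write(plain)
enc.close()
assert(!java.util.Arrays.equals(disk.toByteArray, plain))

// Simulated disk read: decrypting on the way back recovers the block.
val dec = decrypting(new ByteArrayInputStream(disk.toByteArray))
val recovered = new ByteArrayOutputStream()
val buf = new Array[Byte](1024)
var n = dec.read(buf)
while (n != -1) { recovered.write(buf, 0, n); n = dec.read(buf) }
assert(java.util.Arrays.equals(recovered.toByteArray, plain))
```

Keeping the cipher wrapping only on the disk path is what lets the in-memory store hold plaintext while evicted blocks still land encrypted on disk.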
[GitHub] spark issue #16972: [SPARK-19556][CORE][WIP] Broadcast data is not encrypted...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16972

@vanzin I will add some unit tests later, but could you please review this first? I think I may be missing something.
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r101682663

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -21,17 +21,25 @@
 import java.io.{FileOutputStream, IOException, RandomAccessFile}
 import java.nio.ByteBuffer
 import java.nio.channels.FileChannel.MapMode

-import com.google.common.io.Closeables
+import scala.reflect.ClassTag
+
+import com.google.common.io.{ByteStreams, Closeables}
+import org.apache.commons.io.IOUtils

-import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.Utils
+import org.apache.spark.SparkConf
+import org.apache.spark.security.CryptoStreamUtils
+import org.apache.spark.serializer.SerializerManager
+import org.apache.spark.util.{ByteBufferInputStream, ByteBufferOutputStream, Utils}
 import org.apache.spark.util.io.ChunkedByteBuffer

 /**
  * Stores BlockManager blocks on disk.
  */
-private[spark] class DiskStore(conf: SparkConf, diskManager: DiskBlockManager) extends Logging {
+private[spark] class DiskStore(
+    conf: SparkConf,
+    serializerManager: SerializerManager,
--- End diff --

Added `serializerManager` to do the decryption work in `DiskStore`.
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r101682730

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -344,7 +370,7 @@ private[spark] class MemoryStore(
     val serializationStream: SerializationStream = {
       val autoPick = !blockId.isInstanceOf[StreamBlockId]
       val ser = serializerManager.getSerializer(classTag, autoPick).newInstance()
-      ser.serializeStream(serializerManager.wrapStream(blockId, redirectableStream))
+      ser.serializeStream(serializerManager.wrapForCompression(blockId, redirectableStream))
     }
--- End diff --

`MemoryStore` will no longer do encryption work.
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r101682505

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@
   }

   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+    val bytesToStore = if (serializerManager.encryptionEnabled) {
+      try {
+        val data = bytes.toByteBuffer
+        val in = new ByteBufferInputStream(data, true)
+        val byteBufOut = new ByteBufferOutputStream(data.remaining())
+        val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, conf,
+          serializerManager.encryptionKey.get)
+        try {
+          ByteStreams.copy(in, out)
+        } finally {
+          in.close()
+          out.close()
+        }
+        new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+      } finally {
+        bytes.dispose()
+      }
+    } else {
+      bytes
+    }
+
     put(blockId) { fileOutputStream =>
       val channel = fileOutputStream.getChannel
       Utils.tryWithSafeFinally {
-        bytes.writeFully(channel)
+        bytesToStore.writeFully(channel)
       } {
         channel.close()
       }
     }
   }

   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+    val bytes = readBytes(blockId)
+    val in = serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+    new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
+  }
+
+  def getBytesAsValues[T](blockId: BlockId, classTag: ClassTag[T]): Iterator[T] = {
+    val bytes = readBytes(blockId)
+    serializerManager
+      .dataDeserializeStream(blockId, bytes.toInputStream(dispose = true))(classTag)
+  }
+
+  private[storage] def readBytes(blockId: BlockId): ChunkedByteBuffer = {
--- End diff --

Abstracted this out for unit testing.
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r101682451

--- Diff: core/src/test/scala/org/apache/spark/storage/DiskStoreSuite.scala ---
@@ -39,27 +40,27 @@ class DiskStoreSuite extends SparkFunSuite {
     val blockId = BlockId("rdd_1_2")
     val diskBlockManager = new DiskBlockManager(new SparkConf(), deleteFilesOnStop = true)

-    val diskStoreMapped = new DiskStore(new SparkConf().set(confKey, "0"), diskBlockManager)
+    val conf = new SparkConf()
+    val serializer = new KryoSerializer(conf)
+    val serializerManager = new SerializerManager(serializer, conf)
+
+    conf.set(confKey, "0")
+    val diskStoreMapped = new DiskStore(conf, serializerManager, diskBlockManager)
     diskStoreMapped.putBytes(blockId, byteBuffer)
-    val mapped = diskStoreMapped.getBytes(blockId)
+    val mapped = diskStoreMapped.readBytes(blockId)
     assert(diskStoreMapped.remove(blockId))

-    val diskStoreNotMapped = new DiskStore(new SparkConf().set(confKey, "1m"), diskBlockManager)
+    conf.set(confKey, "1m")
+    val diskStoreNotMapped = new DiskStore(conf, serializerManager, diskBlockManager)
     diskStoreNotMapped.putBytes(blockId, byteBuffer)
-    val notMapped = diskStoreNotMapped.getBytes(blockId)
+    val notMapped = diskStoreNotMapped.readBytes(blockId)

     // Not possible to do isInstanceOf due to visibility of HeapByteBuffer
     assert(notMapped.getChunks().forall(_.getClass.getName.endsWith("HeapByteBuffer")),
       "Expected HeapByteBuffer for un-mapped read")
     assert(mapped.getChunks().forall(_.isInstanceOf[MappedByteBuffer]),
       "Expected MappedByteBuffer for mapped read")

-    def arrayFromByteBuffer(in: ByteBuffer): Array[Byte] = {
-      val array = new Array[Byte](in.remaining())
--- End diff --

Removed unused code.
[GitHub] spark issue #16972: [SPARK-19556][CORE][WIP] Broadcast data is not encrypted...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16972

Working on the unit test failure.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949

@vanzin @ajbozarth sure, I will check in the related code.
[GitHub] spark pull request #16857: [SPARK-19517][SS] KafkaSource fails to initialize...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16857#discussion_r101764130

--- Diff: external/kafka-0-10-sql/src/test/resources/kafka-source-initial-offset-version-2.1.0.bin ---
@@ -0,0 +1 @@
+2{"kafka-initial-offset-2-1-0":{"2":0,"1":0,"0":0}}
--- End diff --

Maybe this should end with a newline?
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16818#discussion_r101948606

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell
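The guards in the diff above exist because a `Long` frame offset can exceed the `Int` range that `ValuePreceding`/`ValueFollowing` can hold, so a bare `.toInt` silently wraps. A minimal, self-contained sketch of an overflow-safe mapping (the `FrameBoundary` ADT and `boundary` helper below are illustrative stand-ins, not Spark's actual `WindowSpec` internals):

```scala
// Map a Long frame offset to a window boundary, treating offsets whose
// magnitude cannot fit in an Int as unbounded instead of letting .toInt wrap.
sealed trait FrameBoundary
case object CurrentRow extends FrameBoundary
case object UnboundedPreceding extends FrameBoundary
case object UnboundedFollowing extends FrameBoundary
final case class ValuePreceding(n: Int) extends FrameBoundary
final case class ValueFollowing(n: Int) extends FrameBoundary

def boundary(offset: Long): FrameBoundary = offset match {
  case 0 => CurrentRow
  case x if x < -Int.MaxValue.toLong => UnboundedPreceding // |x| would not fit in an Int
  case x if x > Int.MaxValue.toLong  => UnboundedFollowing // x would not fit in an Int
  case x if x < 0 => ValuePreceding((-x).toInt)
  case x => ValueFollowing(x.toInt)
}
```

Note that comparing against `-Int.MaxValue` (rather than `Int.MinValue`) also keeps `(-x).toInt` safe at the edge, since `-Int.MinValue` itself overflows an `Int`.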
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949 cc @vanzin Take a second review please!
[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16977#discussion_r101954581

--- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
@@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: ClassTag](
   override def getPreferredLocations(s: Partition): Seq[String] = {
     locationPrefs.getOrElse(s.index, Nil)
   }
+
+  override def collect(): Array[T] = toArray(data)
+
+  override def take(num: Int): Array[T] = toArray(data.take(num))
+
+  private def toArray(data: Seq[T]): Array[T] = {
+    // We serialize the data and deserialize it back, to simulate the behavior of sending it to
+    // remote executors and collect it back.
+    val ser = sc.env.closureSerializer.newInstance()
+    ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
--- End diff --

Why should we simulate like this?
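The serialize-then-deserialize trick quoted above can be sketched without Spark. This is a hedged, self-contained approximation using plain Java serialization (Spark's `closureSerializer` is not used here; this only mirrors the round-trip idea of proving the data survives the driver-executor transport):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Round-trip a Seq through serialization, mimicking in spirit how collect()
// results would travel between JVMs. Objects that cannot be serialized fail
// here too, which is exactly the behavior the simulation preserves.
def roundTrip[T](data: Seq[T]): Seq[T] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try out.writeObject(data) finally out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  try in.readObject().asInstanceOf[Seq[T]] finally in.close()
}
```

Without such a round-trip, a local-only `collect()` could succeed on data that would fail to serialize in a real cluster run.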
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r101969639

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, diskManager: DiskBlockManager) e
   }

   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+    val bytesToStore = if (serializerManager.encryptionEnabled) {
+      try {
+        val data = bytes.toByteBuffer
+        val in = new ByteBufferInputStream(data, true)
+        val byteBufOut = new ByteBufferOutputStream(data.remaining())
+        val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, conf,
+          serializerManager.encryptionKey.get)
+        try {
+          ByteStreams.copy(in, out)
+        } finally {
+          in.close()
+          out.close()
+        }
+        new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+      } finally {
+        bytes.dispose()
+      }
+    } else {
+      bytes
+    }
+
     put(blockId) { fileOutputStream =>
       val channel = fileOutputStream.getChannel
       Utils.tryWithSafeFinally {
-        bytes.writeFully(channel)
+        bytesToStore.writeFully(channel)
       } {
         channel.close()
       }
     }
   }

   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+    val bytes = readBytes(blockId)
+
+    val in = serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+    new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
--- End diff --

I see, that makes sense. Decryption seems to be much more complex than I thought.
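The `putBytes` change above copies plaintext through a crypto output stream before writing to disk, and `getBytes` reverses it by wrapping the input stream. That copy-through-a-wrapped-stream pattern can be sketched with JDK ciphers in place of Spark's `CryptoStreamUtils` (an assumption for illustration only; Spark actually uses commons-crypto with an IV-bearing stream format, and ECB mode here is chosen purely for brevity, not for real use):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream, KeyGenerator}

// One shared AES key, standing in for serializerManager.encryptionKey.
val key = { val kg = KeyGenerator.getInstance("AES"); kg.init(128); kg.generateKey() }

// Write path: copy plain bytes through an encrypting stream into a buffer,
// analogous to ByteStreams.copy(in, cryptoOut) in the diff above.
def encrypt(plain: Array[Byte]): Array[Byte] = {
  val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding") // ECB only for brevity
  cipher.init(Cipher.ENCRYPT_MODE, key)
  val buf = new ByteArrayOutputStream()
  val out = new CipherOutputStream(buf, cipher)
  try out.write(plain) finally out.close() // close() flushes the final cipher block
  buf.toByteArray
}

// Read path: wrap the stored bytes in a decrypting stream, analogous to
// serializerManager.wrapForEncryption(...) in getBytes.
def decrypt(enc: Array[Byte]): Array[Byte] = {
  val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
  cipher.init(Cipher.DECRYPT_MODE, key)
  val in = new CipherInputStream(new ByteArrayInputStream(enc), cipher)
  try Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
  finally in.close()
}
```

The design point under discussion is exactly this symmetry: whatever transformation the write path applies must be undone on every read path, including the cache-back-into-memory path.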
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16818#discussion_r102115145

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell and @gatorsmile
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949 cc @srowen also.
[GitHub] spark pull request #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProvider...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17011 [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.SPARK-3697: ignore directories that cannot be read.

## What changes were proposed in this pull request?

Fix the flaky test "FsHistoryProviderSuite.SPARK-3697: ignore directories that cannot be read".

## How was this patch tested?

Unit tests pass locally.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19676

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17011.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17011

commit 11e9b1c3749c1687fbcd3870104f7a535d84b722
Author: uncleGen
Date: 2017-02-21T08:03:07Z
Flaky test: FsHistoryProviderSuite.SPARK-3697: ignore directories that cannot be read.
[GitHub] spark issue #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect should...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16977 Builds successfully locally. Retest this please.
[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17011 retest this please.
[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17011 @srowen @vanzin I will test in some other platforms.
[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17011 @srowen @vanzin I think the root cause is that I ran the test as the root user, so the directory is always readable no matter what its access permissions are. IMHO, it is OK to add one extra access-permission check, since the enclosing code already catches the AccessControlException. Maybe I need to make the comment clearer. What's your opinion?
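The root-user observation can be demonstrated directly: POSIX permission bits on a directory can be cleared, yet an effective-access check such as `Files.isReadable` still succeeds when running as root, so a test that relies on an "unreadable" directory must inspect the permission bits (or the current user) instead of the effective access. A small illustrative sketch (the temp-directory name is arbitrary; this assumes a POSIX file system):

```scala
import java.nio.file.Files
import java.nio.file.attribute.PosixFilePermissions

// Create a directory and strip all permission bits from it.
val dir = Files.createTempDirectory("history-test")
Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("---------"))

// The bits are cleared regardless of who runs this...
val bits = Files.getPosixFilePermissions(dir) // empty set

// ...but the *effective* readability depends on the user: for a normal user
// this is false, while for root it is typically still true.
val effectivelyReadable = Files.isReadable(dir)
```

This is why asserting on the permission bits is stable across users, while asserting on an `AccessControlException` (or `isReadable`) is flaky under root.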
[GitHub] spark pull request #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProvider...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/17011
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949 cc @srowen and @vanzin also.
[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16972#discussion_r102390214

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, diskManager: DiskBlockManager) e
   }

   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+    val bytesToStore = if (serializerManager.encryptionEnabled) {
+      try {
+        val data = bytes.toByteBuffer
+        val in = new ByteBufferInputStream(data, true)
+        val byteBufOut = new ByteBufferOutputStream(data.remaining())
+        val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, conf,
+          serializerManager.encryptionKey.get)
+        try {
+          ByteStreams.copy(in, out)
+        } finally {
+          in.close()
+          out.close()
+        }
+        new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+      } finally {
+        bytes.dispose()
+      }
+    } else {
+      bytes
+    }
+
     put(blockId) { fileOutputStream =>
       val channel = fileOutputStream.getChannel
       Utils.tryWithSafeFinally {
-        bytes.writeFully(channel)
+        bytesToStore.writeFully(channel)
      } {
         channel.close()
       }
     }
   }

   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+    val bytes = readBytes(blockId)
+
+    val in = serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+    new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
--- End diff --

@vanzin After taking some time to think about it, I find it may complicate the issue if we separate a `MemoryStore` holding unencrypted data from a `DiskStore` holding encrypted data. When we fetch data from a remote node, we have to encrypt it if it is stored unencrypted in memory. Besides, in `maybeCacheDiskBytesInMemory` we would decrypt it again. I have also thought about caching disk data in memory in encrypted form and decrypting it lazily on use, but that makes things much more complicated. Maybe it is better to keep the original approach, i.e. keep data encrypted (when possible) both in memory and on disk. We should narrow this problem down. Any suggestions?
[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16980 retest this please.
[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16980 Shouldn't the JIRA ID be SPARK-19645?
[GitHub] spark pull request #16691: [SPARK-19349][DStreams]improve resource ready che...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/16691
[GitHub] spark pull request #16936: [SPARK-19605][DStream] Fail it if existing resour...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/16936
[GitHub] spark pull request #17025: [SPARK-19690][SS] Join a streaming DataFrame with...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17025 [SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not.

Test code:

```
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", "").load
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.join(df, "value")
```

Before this PR:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
:- *Filter isnotnull(value#13)
: +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
: +- MapPartitions , obj#12: java.lang.String
:+- DeserializeToObject value#5.toString, obj#11: java.lang.String
: +- StreamingRelation textSocket, [value#5]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *HashAggregate(keys=[value#1], functions=[count(1)])
+- StateStoreSave [value#1], OperatorStateId(,0,0), Append, 0
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
+- StateStoreRestore [value#1], OperatorStateId(,0,0)
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
+- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
+- *Project [value#1]
+- *Filter isnotnull(value#1)
+- LocalTableScan [key#0, value#1]
```

After this PR:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
:- *Filter isnotnull(value#13)
: +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
: +- MapPartitions , obj#12: java.lang.String
:+- DeserializeToObject value#5.toString, obj#11: java.lang.String
: +- StreamingRelation textSocket, [value#5]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *HashAggregate(keys=[value#1], functions=[count(1)])
+- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
+- *Project [value#1]
+- *Filter isnotnull(value#1)
+- LocalTableScan [key#0, value#1]
```

## How was this patch tested?

Added a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17025.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17025

commit 0d0c19a2f99b6417f419f72fb28c155a70dda201
Author: uncleGen
Date: 2017-02-22T10:18:31Z
Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work
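The core idea of the fix — only subtrees that are actually streaming should be planned with stateful (StateStore-backed) aggregation — can be modeled with a toy plan tree. The `Plan` ADT below is a hypothetical stand-in for Catalyst's `LogicalPlan`, not Spark's API:

```scala
// A tiny plan tree where streaming-ness propagates up from the sources,
// mirroring LogicalPlan.isStreaming.
sealed trait Plan {
  def children: Seq[Plan]
  def isStreaming: Boolean = children.exists(_.isStreaming)
}
case class StreamingSource() extends Plan {
  override def children: Seq[Plan] = Nil
  override def isStreaming: Boolean = true
}
case class BatchSource() extends Plan { override def children: Seq[Plan] = Nil }
case class Aggregate(child: Plan) extends Plan { override def children: Seq[Plan] = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan {
  override def children: Seq[Plan] = Seq(left, right)
}

// The planner's decision: an aggregate needs a StateStore only if its own
// subtree is streaming -- a batch-side aggregate under a streaming join does not.
def needsStateStore(agg: Aggregate): Boolean = agg.isStreaming
```

In the PR's "before" plan, the batch aggregate under the broadcast join was wrongly planned with `StateStoreSave`/`StateStoreRestore`; in this model, that corresponds to checking the whole query's `isStreaming` instead of the aggregate's own subtree.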
[GitHub] spark issue #17025: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17025 Working on the test failure.
[GitHub] spark pull request #17025: [SPARK-19690][SS] Join a streaming DataFrame with...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/17025
[GitHub] spark issue #17025: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17025 wip
[GitHub] spark pull request #17033: [DOCS] application environment rest api
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17033 [DOCS] application environment rest api

## What changes were proposed in this pull request?

Add documentation for the application environment REST API.

## How was this patch tested?

Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark doc-restapi-environment

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17033.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17033

commit 312981b22549971f4e58ad8e91bca984300efab7
Author: uncleGen
Date: 2017-02-23T05:28:28Z
Docs for rest api of environment.
[GitHub] spark issue #17033: [DOCS] application environment rest api
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17033 cc @srowen
[GitHub] spark issue #17033: [DOCS] application environment rest api
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17033 cc @vanzin
[GitHub] spark issue #17033: [SPARK-16122][DOCS] application environment rest api
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17033 @vanzin done
[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17052 [SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not.

Test code:

```
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", "").load
val words = lines.as[String].flatMap(_.split(" "))
val result = words.join(df, "value")
```

Before this PR:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
:- *Filter isnotnull(value#13)
: +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
: +- MapPartitions , obj#12: java.lang.String
:+- DeserializeToObject value#5.toString, obj#11: java.lang.String
: +- StreamingRelation textSocket, [value#5]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *HashAggregate(keys=[value#1], functions=[count(1)])
+- StateStoreSave [value#1], OperatorStateId(,0,0), Append, 0
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
+- StateStoreRestore [value#1], OperatorStateId(,0,0)
+- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
+- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
+- *Project [value#1]
+- *Filter isnotnull(value#1)
+- LocalTableScan [key#0, value#1]
```

After this PR:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
:- *Filter isnotnull(value#13)
: +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
: +- MapPartitions , obj#12: java.lang.String
:+- DeserializeToObject value#5.toString, obj#11: java.lang.String
: +- StreamingRelation textSocket, [value#5]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *HashAggregate(keys=[value#1], functions=[count(1)])
+- Exchange hashpartitioning(value#1, 200)
+- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
+- *Project [value#1]
+- *Filter isnotnull(value#1)
+- LocalTableScan [key#0, value#1]
```

## How was this patch tested?

Added a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17052.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17052

commit e45b06e2495e09c6d7e7a50ee509044b526bf8d0
Author: uncleGen
Date: 2017-02-22T10:18:31Z
Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work

commit 9eb57b7294f2636e370be86cf975509917fdd861
Author: uncleGen
Date: 2017-02-24T06:38:41Z
code clean
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052 cc @zsxwing
[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17052#discussion_r102922659

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -535,7 +535,8 @@ case class Range(
 case class Aggregate(
     groupingExpressions: Seq[Expression],
     aggregateExpressions: Seq[NamedExpression],
-    child: LogicalPlan)
+    child: LogicalPlan,
+    stateful: Boolean = false)
   extends UnaryNode {
--- End diff --

`stateful` indicates whether the aggregate is based on a streaming or a batch plan; it is resolved by the `ResolveStatefulAggregate` rule.
[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17052#discussion_r102922775

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -393,6 +393,17 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] {
 }

 /**
+ * A rule to check whether the aggregate is stateful.
+ */
+class ResolveStatefulAggregate() extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case agg @ Aggregate(groupingExpressions, aggregateExpressions, child, _)
+        if agg.isStreaming =>
+      Aggregate(groupingExpressions, aggregateExpressions, child, stateful = true)
+  }
--- End diff --

This resolves each aggregate individually and determines whether it is stateful or not.
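The rule quoted above rewrites each streaming `Aggregate` to carry `stateful = true` while leaving batch aggregates untouched. A self-contained sketch of that bottom-up marking on a toy plan type (hypothetical; this is not Catalyst's actual `transform` machinery):

```scala
// Minimal plan model: sources know whether they are streaming, and an
// Aggregate is streaming iff its child subtree is.
sealed trait Plan { def isStreaming: Boolean }
case class Source(isStreaming: Boolean) extends Plan
case class Aggregate(child: Plan, stateful: Boolean = false) extends Plan {
  def isStreaming: Boolean = child.isStreaming
}

// Recursively mark each Aggregate as stateful exactly when its own subtree
// is streaming, in the spirit of the ResolveStatefulAggregate rule.
def resolveStateful(plan: Plan): Plan = plan match {
  case Aggregate(child, _) =>
    val resolved = resolveStateful(child)
    Aggregate(resolved, stateful = resolved.isStreaming)
  case other => other
}
```

The per-subtree decision is the point: a batch-side aggregate nested under a streaming query keeps `stateful = false`, so it is not planned against a StateStore.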
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052

@zsxwing got it
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052

retest this please.
[GitHub] spark pull request #17080: [SPARK-19739][CORE] propagate S3 session token to...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17080

[SPARK-19739][CORE] propagate S3 session token to cluster

## What changes were proposed in this pull request?

Propagate the S3 session token to the cluster.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19739

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17080.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17080

commit 0ae5aa73c70ae2f46a2d16087b5c55652d1e0282
Author: uncleGen
Date: 2017-02-27T08:12:10Z

    propagate S3 session token to cluster
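The PR description is terse, so here is a hedged sketch of the general idea: when a job runs with temporary AWS credentials, the session token has to reach the cluster together with the access/secret key pair, or executors cannot authenticate against S3. The `fs.s3a.*` keys below are the standard Hadoop S3A property names, but the helper and its wiring are illustrative assumptions, not Spark's actual code:

```scala
object S3CredentialPropagation {
  // Copy AWS credentials from an environment map into Hadoop-style
  // configuration entries through the supplied setter. Only fires when
  // both the access key and the secret key are present.
  def propagate(env: Map[String, String], set: (String, String) => Unit): Unit = {
    for {
      accessKey <- env.get("AWS_ACCESS_KEY_ID")
      secretKey <- env.get("AWS_SECRET_ACCESS_KEY")
    } {
      set("fs.s3a.access.key", accessKey)
      set("fs.s3a.secret.key", secretKey)
      // A session token only exists for temporary (STS) credentials; it
      // must travel with the key pair or the pair is useless on executors.
      env.get("AWS_SESSION_TOKEN").foreach(token => set("fs.s3a.session.token", token))
    }
  }
}
```

Usage would be along the lines of `S3CredentialPropagation.propagate(sys.env, hadoopConf.set)`, where `hadoopConf` is a Hadoop `Configuration` built on the driver before jobs are submitted.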
[GitHub] spark pull request #17082: [SPARK-19749][SS] Name socket source with a meani...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17082

[SPARK-19749][SS] Name socket source with a meaningful name

## What changes were proposed in this pull request?

Name socket source with a meaningful name.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19749

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17082.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17082

commit 68349facee3b33fd5975e90c74c882f3d922
Author: uncleGen
Date: 2017-02-27T09:35:56Z

    Name socket source with a meaningful name
[GitHub] spark issue #17082: [SPARK-19749][SS] Name socket source with a meaningful n...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17082

@srowen I think this is the only source that was left unnamed.
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r103187577

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---

@@ -140,7 +137,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
  * a union RDD out of them. Note that this maintains the list of files that were processed
  * in the latest modification time in the previous call to this method. This is because the
  * modification time returned by the FileStatus API seems to return times only at the
- * granularity of seconds. And new files may have the same modification time as the
+ * granularity of seconds in HDFS. And new files may have the same modification time as the
  * latest modification time in the previous call to this method yet was not reported in

--- End diff --

got it
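The comment being amended explains why `FileInputDStream` remembers recently processed files: with one-second modification-time granularity, a timestamp alone cannot distinguish a new file from one already processed in the same second. Here is a simplified, hypothetical sketch of that selection logic — the names are illustrative, not the class's real fields:

```scala
object NewFileSelector {
  // Pick the files that are genuinely new: modified at or after the
  // threshold AND not already in the remembered set. Files stamped exactly
  // at the threshold second are exactly the ambiguous case the seen-set
  // exists to resolve.
  def selectNew(
      candidates: Seq[(String, Long)], // (path, modification time in millis)
      threshold: Long,                 // oldest modification time still eligible
      seen: Set[String]): Seq[String] = {
    candidates.collect {
      case (path, modTime) if modTime >= threshold && !seen.contains(path) => path
    }
  }
}
```

With a strict `modTime > threshold` filter instead, a file created in the same second as the previous batch's newest file would be silently skipped, which is the bug the remembered-files design avoids.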
[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16370

@zsxwing Thanks for your reminder! In some ways we really can avoid this issue, for example by not using `-cp`. But that is user-side behaviour: we cannot ensure that every user knows about it and moves data the correct way, and it may confuse users who are unaware of this issue. Besides, the current change is so small that it does no harm to the code. So, IMHO, it is OK to add the check as a protection. What do you think?
[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16370

@zsxwing Is there any further feedback?
[GitHub] spark pull request #16414: [SPARK-19009][DOC] Add streaming rest api doc
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16414

[SPARK-19009][DOC] Add streaming rest api doc

## What changes were proposed in this pull request?

Add streaming REST API doc.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19009

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16414.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16414

commit 14086492b289e8e2eab06899f38d22d885a0e41f
Author: uncleGen
Date: 2016-12-27T06:45:22Z

    add streaming rest api doc

commit c4a5941e19ac9f3e6dc5d8cd09211f6c8b18
Author: uncleGen
Date: 2016-12-27T06:46:15Z

    Merge branch 'master' into dev-doc-stream