[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15840 **[Test build #68607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68607/consoleFull)** for PR 15840 at commit [`0fc36a8`](https://github.com/apache/spark/commit/0fc36a80f546a3ccfe97c70f0a94241208b9ed39). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/15840 I replaced the nullability checks with `CodegenContext.nullSafeExec()` where possible.
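As a rough illustration of the idea behind `CodegenContext.nullSafeExec()`, here is a hypothetical sketch. It is not Spark's actual code: the object name `NullSafeExecSketch` and the exact layout of the emitted string are assumptions. The helper wraps generated code in a null guard only when the value can actually be null, so non-nullable values get no dead branch.

```scala
object NullSafeExecSketch {
  // Hypothetical helper: guard `execute` with a null check only when the
  // value may be null. For non-nullable values the guard would be dead
  // code, so the generated snippet is returned unchanged.
  def nullSafeExec(nullable: Boolean, isNull: String)(execute: String): String =
    if (nullable) s"if (!$isNull) {\n  $execute\n}"
    else execute
}
```

For example, `NullSafeExecSketch.nullSafeExec(nullable = true, isNull = "isNull_0")("f();")` yields the three-line guarded block, while passing `nullable = false` returns `f();` as is.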
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15780 I see. I will look at this carefully. While looking at it, I did not understand why only a tuple causes a failure. Do you have any insight into how a tuple is handled?
[GitHub] spark issue #15862: [SPARK-18382][WEBUI] "run at null:-1" in UI when no file...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/15862 Merging into `master`/`branch-2.0`/`branch-2.1`. Thanks.
[GitHub] spark pull request #15862: [SPARK-18382][WEBUI] "run at null:-1" in UI when ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15862
[GitHub] spark issue #15862: [SPARK-18382][WEBUI] "run at null:-1" in UI when no file...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/15862 LGTM
[GitHub] spark issue #15878: [SPARK-18430] [SQL] Fixed Exception Messages when Hittin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15878 Does this apply to all functions? If so, I'd put a test in `SQLQueryTestSuite`.
[GitHub] spark issue #15863: [SPARK-18419][SQL] Fix JDBCOptions and DataSource to be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15863 **[Test build #68606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68606/consoleFull)** for PR 15863 at commit [`2e249ea`](https://github.com/apache/spark/commit/2e249ea06f00f814efc80524d08d17d2146cc553).
[GitHub] spark issue #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/13557 Ping @zjffdu: could you address @jkbradley's comment? Then I can help get this in. It would be better if we could merge this into Spark 2.1. Thanks.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15861 Merged build finished. Test PASSed.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68600/ Test PASSed.
[GitHub] spark issue #15863: [SPARK-18419][SQL] Fix JDBCOptions and DataSource to be ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15863 Thank you for the review, @cloud-fan. Actually, this PR includes two different changes. The first commit fixes a bug, and the others (from the second commit onward) are potential improvements. First, I'll separate them to make each PR clearer.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15861 **[Test build #68600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68600/consoleFull)** for PR 15861 at commit [`2a73827`](https://github.com/apache/spark/commit/2a73827b740fb30afddd7f320a9c530c050e7e3c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15857#discussion_r87746263 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -449,7 +452,7 @@ object FoldablePropagation extends Rule[LogicalPlan] { stop = true j -// These 3 operators take attributes as constructor parameters, and these attributes +// These 4 operators take attributes as constructor parameters, and these attributes --- End diff -- If you think so, I have no objection to that.
[GitHub] spark issue #15878: [SPARK-18430] [SQL] Fixed Exception Messages when Hittin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15878 **[Test build #68605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68605/consoleFull)** for PR 15878 at commit [`a43acce`](https://github.com/apache/spark/commit/a43acced66fa15228b6c32176bb9f28dacb6f024).
[GitHub] spark pull request #15878: [SPARK-18430] [SQL] Fixed Exception Messages when...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15878

[SPARK-18430] [SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup

### What changes were proposed in this pull request?

When an invocation exception is thrown during function lookup, we return a useless and confusing error message. For example:

```Scala
df.selectExpr("concat_ws()")
```

Below is the error message we get:

```
null; line 1 pos 0
org.apache.spark.sql.AnalysisException: null; line 1 pos 0
```

To get a meaningful error message, we need to use the exception's cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message is the exception thrown in the constructor of the function implementation:

```
requirement failed: concat_ws requires at least one argument.; line 1 pos 0
org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0
```

### How was this patch tested?

Added test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark functionNotFound

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15878.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15878

commit a43acced66fa15228b6c32176bb9f28dacb6f024 Author: gatorsmile Date: 2016-11-14T06:28:11Z

fix.
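The underlying JVM behavior can be reproduced without Spark. When a constructor invoked through reflection throws, Java wraps the error in `InvocationTargetException`, whose own message is `null`; if a function registry builds expressions reflectively, that would explain the `null` in the message above. The sketch below shows the general unwrap-the-cause technique under stated assumptions: `ReflectiveBuilder` is a hypothetical helper, not Spark's code, and `java.util.ArrayList(int)` stands in for a function implementation whose constructor validates its arguments.

```scala
import java.lang.reflect.InvocationTargetException

// Hypothetical helper, not Spark's FunctionRegistry.
object ReflectiveBuilder {
  // Construct `cls` reflectively; on failure, return the message of the
  // exception thrown inside the constructor instead of the wrapper's.
  def construct[T](cls: Class[T], paramTypes: Seq[Class[_]], args: Seq[AnyRef]): Either[String, T] = {
    val ctor = cls.getConstructor(paramTypes: _*)
    try Right(ctor.newInstance(args: _*))
    catch {
      case e: InvocationTargetException =>
        // e.getMessage is null here; the constructor's error is e.getCause.
        val cause = Option(e.getCause).getOrElse(e)
        Left(String.valueOf(cause.getMessage))
    }
  }
}
```

On OpenJDK, `ReflectiveBuilder.construct(classOf[java.util.ArrayList[String]], Seq(classOf[Int]), Seq(Int.box(-1)))` returns `Left("Illegal Capacity: -1")`, the message from the `IllegalArgumentException` thrown inside the constructor, rather than `null`.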
[GitHub] spark issue #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checking for r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15876 Merged build finished. Test PASSed.
[GitHub] spark issue #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checking for r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68603/ Test PASSed.
[GitHub] spark issue #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checking for r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15876 **[Test build #68603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68603/consoleFull)** for PR 15876 at commit [`85a25ad`](https://github.com/apache/spark/commit/85a25ad81312bc6d935eb89f5e886cfcd2edee72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15877: [SPARK-18429] [SQL] implement a new Aggregate for CountM...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15877 cc @liancheng
[GitHub] spark issue #15877: [SPARK-18429] [SQL] implement a new Aggregate for CountM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15877 **[Test build #68604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68604/consoleFull)** for PR 15877 at commit [`a4753e4`](https://github.com/apache/spark/commit/a4753e4fa2c8272576fbd44c4b4670947b332289).
[GitHub] spark issue #15877: [SPARK-18429] [SQL] implement a new Aggregate for CountM...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15877 cc @rxin
[GitHub] spark pull request #15877: [SPARK-18429] [SQL] implement a new Aggregate for...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/15877

[SPARK-18429] [SQL] implement a new Aggregate for CountMinSketch

## What changes were proposed in this pull request?

This PR implements a new aggregate function that generates a count-min sketch; it is a wrapper around `CountMinSketch`.

## How was this patch tested?

Added test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark cms

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15877.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15877

commit a4753e4fa2c8272576fbd44c4b4670947b332289 Author: wangzhenhua Date: 2016-11-14T05:59:48Z

Agg for CountMinSketch
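For readers unfamiliar with the data structure: a count-min sketch keeps a `depth x width` table of counters. Each item increments one counter per row, chosen by a per-row hash, and a point query takes the minimum across rows, so for non-negative counts it may overestimate but never underestimate. An aggregate also has to merge partial sketches from different partitions, which for sketches of the same shape is an element-wise sum. The class below is a hypothetical, minimal illustration of those two operations, not Spark's `CountMinSketch`; the row hash (`MurmurHash3.stringHash` seeded with the row index) is an assumption.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal count-min sketch over strings.
final class CountMin(val depth: Int, val width: Int) {
  require(depth > 0 && width > 0, "depth and width must be positive")
  val table: Array[Array[Long]] = Array.fill(depth, width)(0L)

  // Per-row bucket: hash seeded by the row index, mapped to [0, width).
  private def bucket(item: String, row: Int): Int =
    (MurmurHash3.stringHash(item, row) & Int.MaxValue) % width

  // Update step of the aggregate: one counter per row is incremented.
  def add(item: String, count: Long = 1L): this.type = {
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count
    this
  }

  // Point query: the smallest counter is the tightest upper bound.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min

  // Merge step of the aggregate: element-wise sum of same-shaped tables.
  def merge(other: CountMin): this.type = {
    require(depth == other.depth && width == other.width, "incompatible sketches")
    for (row <- 0 until depth; col <- 0 until width) table(row)(col) += other.table(row)(col)
    this
  }
}
```

Because every `add` touches exactly one counter per row, each row's counters always sum to the total count inserted, which is a handy invariant to check after merging.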
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15875 @rxin This is already noted in `Summary List of Operators`: `Note that some function signatures have been simplified (e.g., default arguments and type constraints removed) and some more advanced functionality has been removed so please consult the API docs for the official list of operations.` But it's not mentioned in `VertexRDD`. I will revert this change and add a note in `VertexRDD`.
[GitHub] spark issue #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checking for r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15876 **[Test build #68603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68603/consoleFull)** for PR 15876 at commit [`85a25ad`](https://github.com/apache/spark/commit/85a25ad81312bc6d935eb89f5e886cfcd2edee72).
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68602/ Test PASSed.
[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15857#discussion_r87743162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -449,7 +452,7 @@ object FoldablePropagation extends Rule[LogicalPlan] { stop = true j -// These 3 operators take attributes as constructor parameters, and these attributes +// These 4 operators take attributes as constructor parameters, and these attributes --- End diff -- Shall we switch to a whitelist? As I remember, more than 3 bugs have been reported against this optimizer rule.
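The blacklist-versus-whitelist trade-off in this exchange can be shown with a toy model. The operator names below are hypothetical placeholders, not Catalyst plan nodes, and the sketch does not implement `FoldablePropagation`; it only contrasts the two policies. With a blacklist, an operator nobody has audited is propagated through by default; with a whitelist, it is skipped until someone explicitly adds it.

```scala
object PropagationPolicy {
  // Hypothetical operator kinds, used only to contrast the two policies.
  sealed trait Operator
  case object SafeOp extends Operator       // audited and known to be safe
  case object KnownBadOp extends Operator   // known to break (a reported bug)
  case object NewlyAddedOp extends Operator // added later and never audited

  // Blacklist: allowed unless explicitly excluded.
  private val blacklist: Set[Operator] = Set(KnownBadOp)
  def allowedByBlacklist(op: Operator): Boolean = !blacklist.contains(op)

  // Whitelist: disallowed unless explicitly included.
  private val whitelist: Set[Operator] = Set(SafeOp)
  def allowedByWhitelist(op: Operator): Boolean = whitelist.contains(op)
}
```

The two policies agree on every audited operator and differ only on `NewlyAddedOp`, which is exactly where a new bug would come from; a whitelist trades some missed optimizations for failing safe.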
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15875 Merged build finished. Test PASSed.
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15875 **[Test build #68602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68602/consoleFull)** for PR 15875 at commit [`485a4e0`](https://github.com/apache/spark/commit/485a4e0efa1286063fe86f16fe2a3c63ed9fe96b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checkin...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15876

[SPARK-11496][GraphX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank

## What changes were proposed in this pull request?

Add parameter checking to keep it in line with the other algorithms.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark param_check_runParallelPersonalizedPageRank

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15876.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15876

commit 85a25ad81312bc6d935eb89f5e886cfcd2edee72 Author: Zheng RuiFeng Date: 2016-11-14T06:03:20Z

create pr
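Parameter checks of this kind are usually written with Scala's `require`, which throws an `IllegalArgumentException` whose message is prefixed with `requirement failed: `. The object below is a hypothetical sketch: the parameter names (`numIter`, `resetProb`, `sources`), the exact conditions, and the messages are assumptions, not the checks this PR actually adds.

```scala
// Hypothetical validation; the real checks in the PR may differ.
object PageRankParams {
  def validate(numIter: Int, resetProb: Double, sources: Seq[Long]): Unit = {
    require(numIter > 0, s"Number of iterations must be greater than 0, but got $numIter")
    require(resetProb >= 0 && resetProb <= 1,
      s"Random reset probability must belong to [0, 1], but got $resetProb")
    require(sources.nonEmpty, "The list of sources must be non-empty")
  }
}
```

Failing fast at the API boundary gives the caller a descriptive message instead of a confusing error (or silently wrong result) deep inside the iterative computation.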
[GitHub] spark pull request #15863: [SPARK-18419][SQL] Fix JDBCOptions and DataSource...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15863#discussion_r87743041 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala --- @@ -303,4 +303,13 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter { assert(e.contains("If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," + " and 'numPartitions' are required.")) } + + test("SPARK-18419 JDBCOption keys should be case-insensitive") { --- End diff -- Is it consistent with other options like `JSONOptions`?
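One common way to make option keys case-insensitive is to normalize every key once at construction and normalize each lookup key the same way. The class below is a hypothetical sketch, not `JDBCOptions` or any Spark helper; it lower-cases with `Locale.ROOT` so the result does not depend on the JVM's default locale (for example, the Turkish dotless-i rules).

```scala
import java.util.Locale

// Hypothetical options wrapper whose lookups ignore key case.
final class CaseInsensitiveOptions(original: Map[String, String]) {
  private def norm(key: String): String = key.toLowerCase(Locale.ROOT)

  // Keys are normalized once; if two keys differ only in case, the later one wins.
  private val normalized: Map[String, String] =
    original.map { case (k, v) => norm(k) -> v }

  def get(key: String): Option[String] = normalized.get(norm(key))
  def contains(key: String): Boolean = normalized.contains(norm(key))
}
```

Normalizing up front keeps lookups cheap, at the cost of losing the user's original spelling; an implementation that reports unknown options back to the user might keep the original map as well.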
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15875 So I think what we should really say is that this is just an incomplete list, and then link to the API docs. It is not great to duplicate API docs in two places.
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15780

> this optimization leads to failure of the program `Seq((0, 1.0, 3.0), (2, 2.0, 5.0)).toDF("id", "v").printSchema`

This is not an optimization; we should use `serializer.schema` as the encoder schema. Can you look into it?
[GitHub] spark issue #15875: [SPARK-18428][DOC] Update docs for Graph.op
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15875 **[Test build #68602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68602/consoleFull)** for PR 15875 at commit [`485a4e0`](https://github.com/apache/spark/commit/485a4e0efa1286063fe86f16fe2a3c63ed9fe96b).
[GitHub] spark issue #15874: [Spark-18408] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.
[GitHub] spark issue #15874: [Spark-18408] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68601/ Test PASSed.
[GitHub] spark pull request #15875: [SPARK-18428][DOC] Update docs for Graph.op
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15875

[SPARK-18428][DOC] Update docs for Graph.op

## What changes were proposed in this pull request?

Update `Summary List of Operators` and `VertexRDDs` to include some missing APIs.

## How was this patch tested?

No tests; only docs are modified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark update_graphop_doc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15875.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15875

commit 6b04f956e603c34e1d4d30e33d835350fc283918 Author: Zheng RuiFeng Date: 2016-11-14T04:59:49Z

create pr

commit 655b161e9bd1b51f419ea3fd53a924ab1efbd45c Author: Zheng RuiFeng Date: 2016-11-14T05:10:43Z

add pickRandomVertex

commit 485a4e0efa1286063fe86f16fe2a3c63ed9fe96b Author: Zheng RuiFeng Date: 2016-11-14T05:44:34Z

add methods in VertexRDD
[GitHub] spark issue #15874: [Spark-18408] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68601/consoleFull)** for PR 15874 at commit [`adbbefe`](https://github.com/apache/spark/commit/adbbefe1777db8fb85a0af59c11e5840d3bc91ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15865: [SPARK-18420][SPARK][BUILD]Fix the compile errors caused...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15865 @ConeyLiu Could we please make the title `[SPARK-18420][BUILD] Fix the errors caused by lint check in Java` for this PR and the JIRA you opened? For the one below: ``` [ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method. ``` I think we should disable the lint check for those lines, because the finalizer seems to be required there. I guess you should either disable the check for those lines or work around it. Could we then remove `hasher` from `HiveHasherSuite.java` if it is not used? FWIW, I just checked, and this private instance is definitely not referenced in `HiveHasherSuite.java`.
[GitHub] spark issue #15868: [SPARK-18413][SQL] Control the number of JDBC connection...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15868 Thank you for the review, @lichenglin. For 1, Jenkins tests that, too. For 2, sure! That sounds reasonable and better.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68594/ Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #68594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68594/consoleFull)** for PR 13909 at commit [`3ef6fd4`](https://github.com/apache/spark/commit/3ef6fd451a801e2c8bb7183913f3efe6405341c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15874: [Spark-18408] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68601/consoleFull)** for PR 15874 at commit [`adbbefe`](https://github.com/apache/spark/commit/adbbefe1777db8fb85a0af59c11e5840d3bc91ee).
[GitHub] spark pull request #15800: [SPARK-18334] MinHash should use binary hash dist...
Github user Yunni closed the pull request at: https://github.com/apache/spark/pull/15800
[GitHub] spark issue #15800: [SPARK-18334] MinHash should use binary hash distance
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15800 OK. Abandoning this PR since we are making MultiProbe NN Search and `hashDistance` private. Related changes are included in #15874.
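The title of #15800 says MinHash should use a binary hash distance. Assuming that means two MinHash signatures are at distance 0 when they collide in at least one position and at distance 1 otherwise, a minimal Python sketch looks like this. The function name `binary_hash_distance` and the list-of-ints signature representation are assumptions for illustration, not code from the PR.

```python
from typing import Sequence


def binary_hash_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Return 0 if two MinHash signatures agree in at least one position
    (a hash collision), otherwise 1.

    Assumption: x and y are equal-length signatures produced by the same
    family of hash functions.
    """
    if len(x) != len(y):
        raise ValueError("signatures must have the same length")
    return 0 if any(a == b for a, b in zip(x, y)) else 1


print(binary_hash_distance([3, 7, 9], [4, 7, 1]))  # 0: position 1 collides
print(binary_hash_distance([3, 7, 9], [4, 8, 1]))  # 1: no position collides
```

A count-based alternative would instead return the number of mismatching positions; the 0/1 form only records whether any collision happened.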
[GitHub] spark issue #15868: [SPARK-18413][SQL] Control the number of JDBC connection...
Github user lichenglin commented on the issue: https://github.com/apache/spark/pull/15868 I'm sorry, my network is too slow to download the dependencies from the Maven repository to build Spark. I have looked at your PR, and here are some suggestions: 1. I notice the PR uses CaseInsensitiveMap. Because "numPartitions" is still used by DataFrameReader, we had better check whether JDBC reading still works. 2. "Testing the Total Connections" may be difficult, because the connections are closed when the tasks finish. A better way to check whether numPartitions works is to watch the last stage's tasks in the Spark UI: the last stage should have "numPartitions" tasks.
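Point 1 above relies on option keys being looked up case-insensitively, so that a caller asking for "numPartitions" still finds the value. The sketch below illustrates that behavior in plain Python. `CaseInsensitiveOptions` is a hypothetical class written for this illustration, not Spark's Scala `CaseInsensitiveMap`, and the JDBC URL is a placeholder.

```python
from collections.abc import Mapping
from typing import Dict, Iterator


class CaseInsensitiveOptions(Mapping):
    """Read-only options map whose key lookups ignore case (hypothetical)."""

    def __init__(self, options: Dict[str, str]) -> None:
        # Lower-case every key once, so lookups only need to lower-case the query.
        self._data: Dict[str, str] = {k.lower(): v for k, v in options.items()}

    def __getitem__(self, key: str) -> str:
        return self._data[key.lower()]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


opts = CaseInsensitiveOptions({"url": "jdbc:h2:mem:test", "numPartitions": "4"})
# Callers using the original camel-case key and callers using any other
# casing all see the same value.
print(opts["numPartitions"], opts["NUMPARTITIONS"], opts.get("numpartitions"))  # 4 4 4
```

One consequence of normalizing keys on insert is that keys differing only in case collapse into a single entry, with the last one winning.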
[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15219 @sameeragarwal could you please take a closer look?
[GitHub] spark pull request #15874: Spark 18408 yunn api improvements
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/15874 Spark 18408 yunn api improvements ## What changes were proposed in this pull request? (1) Change output schema to `Array of Vector` instead of `Vectors` (2) Use `numHashTables` as the dimension of Array and `numHashFunctions` as the dimension of Vector (3) Rename `RandomProjection` to `BucketedRandomProjectionLSH`, `MinHash` to `MinHashLSH` (4) Make `randUnitVectors/randCoefficients` private (5) Make Multi-Probe NN Search and `hashDistance` private for future discussion ## How was this patch tested? Related unit tests are modified to ensure the performance of LSH and that the outputs of the APIs meet expectations. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Yunni/spark SPARK-18408-yunn-api-improvements Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15874.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15874 commit 559c09904538012b70bcb3493b8bc287dd855b2d Author: Yun Ni Date: 2016-11-07T21:30:32Z [SPARK-18334] MinHash should use binary hash distance commit 517a97bd16f3771d9abbcdf54957a011f5f87adc Author: Yunni Date: 2016-11-08T06:15:24Z Remove misleading documentation as requested commit b546dbd207a04e73bde097f25cae8c927322c2ae Author: Yun Ni Date: 2016-11-08T18:54:09Z Add warning for multi-probe in MinHash commit a3cd9281d1fb8d969cb8bdd32ae8c5b9c373ad3b Author: Yun Ni Date: 2016-11-08T18:55:49Z Merge branch 'SPARK-18334-yunn-minhash-bug' of https://github.com/Yunni/spark into SPARK-18334-yunn-minhash-bug commit c8243c7def8c270072edd5889cea7fd02677b44f Author: Yun Ni Date: 2016-11-09T23:11:20Z (1) Fix documentation as CR suggested (2) Fix typo in unit test commit 6aac8b343c5ea3a91b8517a2d3f47ed055ece9ad Author: Yun Ni Date: 2016-11-09T23:22:27Z Fix typo in unit test 
commit 98707436ea8a90599fd8615a47afff3bf29a3ae6 Author: Yun Ni Date: 2016-11-14T04:25:17Z [SPARK-18408] API Improvements for LSH commit 0e9250be0142691e9e085ed1260f83f8ed40f5e4 Author: Yun Ni Date: 2016-11-14T04:38:44Z (1) Fix description for numHashFunctions (2) Make numEntries in MinHash private commit adbbefe1777db8fb85a0af59c11e5840d3bc91ee Author: Yun Ni Date: 2016-11-14T04:43:30Z Add assertion for hashFunction in BucketedRandomProjectionLSHSuite
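Items (1) and (2) of the PR description describe the new output layout: an array with `numHashTables` entries, each a vector of `numHashFunctions` hash values. The pure-Python sketch below illustrates only that shape for a MinHash-style family. The hash family `((1 + x) * a + b) mod p`, the prime, and the helper names `make_coefficients` and `min_hash` are assumptions for illustration, not Spark's implementation.

```python
import random
from typing import List, Set, Tuple

# Assumption: 2^31 - 1 is used as the prime modulus purely for illustration.
PRIME = 2_147_483_647

Coefficients = List[List[Tuple[int, int]]]


def make_coefficients(num_hash_tables: int, num_hash_functions: int,
                      seed: int = 42) -> Coefficients:
    """Draw (a, b) pairs: one list per hash table, one pair per hash function."""
    rng = random.Random(seed)
    return [[(rng.randint(1, PRIME - 1), rng.randint(0, PRIME - 1))
             for _ in range(num_hash_functions)]
            for _ in range(num_hash_tables)]


def min_hash(indices: Set[int], coefficients: Coefficients) -> List[List[int]]:
    """Return an "array of vectors": one inner list per hash table, each
    holding one min-hash value per hash function."""
    if not indices:
        raise ValueError("MinHash needs at least one non-zero index")
    return [[min(((1 + i) * a + b) % PRIME for i in indices) for (a, b) in table]
            for table in coefficients]


coeffs = make_coefficients(num_hash_tables=3, num_hash_functions=2)
signature = min_hash({0, 5, 9}, coeffs)
print(len(signature), [len(v) for v in signature])  # 3 [2, 2, 2]
```

In standard LSH terms, the values inside one vector are usually combined (AND-amplification) and the separate tables queried independently (OR-amplification); the PR description does not spell out how Spark combines them.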
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15780 ping @cloud-fan
[GitHub] spark pull request #15851: [SPARK-18412][SPARKR][ML] Fix exception for some ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15851
[GitHub] spark issue #15851: [SPARK-18412][SPARKR][ML] Fix exception for some SparkR ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15851 Merged into master and branch-2.1. Thanks for the review.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15861 **[Test build #3423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3423/consoleFull)** for PR 15861 at commit [`ff4ce8c`](https://github.com/apache/spark/commit/ff4ce8c67912e4ce8df1d78defbf22f0f23b875d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15861 It looks like the failed test suite `SparkListenerWithClusterSuite` does not write anything, so I wonder whether the failure is related to what we have changed here.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15861 **[Test build #68600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68600/consoleFull)** for PR 15861 at commit [`2a73827`](https://github.com/apache/spark/commit/2a73827b740fb30afddd7f320a9c530c050e7e3c).
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test PASSed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68599/ Test PASSed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68599/consoleFull)** for PR 15849 at commit [`e6a5367`](https://github.com/apache/spark/commit/e6a5367da19c341d69a60adda902596a1bf3d91e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15829 Merged build finished. Test PASSed.
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68591/ Test PASSed.
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15873 **[Test build #68597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68597/consoleFull)** for PR 15873 at commit [`c8d1cc9`](https://github.com/apache/spark/commit/c8d1cc983b678a5b79a0af7e1fb8089eff0e4183). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15873 Merged build finished. Test PASSed.
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68597/ Test PASSed.
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15829 **[Test build #68591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68591/consoleFull)** for PR 15829 at commit [`f6cf77f`](https://github.com/apache/spark/commit/f6cf77f284a73a8aed5ae1349a0697a44d09972b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68599/consoleFull)** for PR 15849 at commit [`e6a5367`](https://github.com/apache/spark/commit/e6a5367da19c341d69a60adda902596a1bf3d91e).
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15873 Merged build finished. Test PASSed.
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68596/ Test PASSed.
[GitHub] spark issue #15873: [SPARK-18427][DOC] Update docs of mllib.KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15873 **[Test build #68596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68596/consoleFull)** for PR 15873 at commit [`4c0bb2e`](https://github.com/apache/spark/commit/4c0bb2ed3dbef4086c788377f4d294f1034974e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68598/consoleFull)** for PR 15849 at commit [`d1bb4a7`](https://github.com/apache/spark/commit/d1bb4a7aba3ec459f74d8da3e62ae3e84df7fa83). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68598/ Test FAILed.
[GitHub] spark issue #15871: [SPARK-17116][Pyspark] Allow parameters to be {string,va...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15871 cc @holdenk
[GitHub] spark issue #15871: [SPARK-17116][Pyspark] Allow parameters to be {string,va...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15871 I think we may need a test here, and we should update the argument description if this change is legitimate.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68598/consoleFull)** for PR 15849 at commit [`d1bb4a7`](https://github.com/apache/spark/commit/d1bb4a7aba3ec459f74d8da3e62ae3e84df7fa83).
[GitHub] spark issue #15873: [SPARK-18427][DOC] Remove 'runs' from docs of mllib.KMea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15873 **[Test build #68597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68597/consoleFull)** for PR 15873 at commit [`c8d1cc9`](https://github.com/apache/spark/commit/c8d1cc983b678a5b79a0af7e1fb8089eff0e4183).
[GitHub] spark pull request #15871: [SPARK-17116][Pyspark] Allow parameters to be {st...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15871#discussion_r87734522 --- Diff: python/pyspark/ml/base.py --- @@ -59,6 +59,12 @@ def fit(self, dataset, params=None): return [self.fit(dataset, paramMap) for paramMap in params] elif isinstance(params, dict): if params: +if type(params.keys()[0]) is str: --- End diff -- Shouldn't we just use `isinstance` instead?
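Here is a minimal sketch of the `isinstance` approach. It assumes the goal is to accept param maps keyed either by name strings or by `Param` objects. One detail matters here: on Python 3, `params.keys()[0]` raises `TypeError` because dict views are not subscriptable. For that reason the sketch checks every key rather than only the first. The names `normalize_param_map` and the `resolve` callback are hypothetical and are not PySpark API.

```python
from typing import Any, Callable, Dict


def normalize_param_map(
    params: Dict[Any, Any], resolve: Callable[[str], Any]
) -> Dict[Any, Any]:
    """Return a copy of `params` with string keys replaced by resolve(key).

    `resolve` stands in for a hypothetical name -> Param lookup; keys that
    are not strings (e.g. Param objects) are passed through unchanged.
    """
    # isinstance() also accepts str subclasses, unlike `type(k) is str`,
    # and testing each key handles maps that mix names and Param objects.
    return {(resolve(k) if isinstance(k, str) else k): v for k, v in params.items()}
```

Because each key is resolved independently, a map such as `{"maxIter": 10, someParam: 0.1}` works without first checking which style the caller used.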
[GitHub] spark issue #15873: [SPARK-18427][DOC] Remove 'runs' from docs of mllib.KMea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15873 **[Test build #68596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68596/consoleFull)** for PR 15873 at commit [`4c0bb2e`](https://github.com/apache/spark/commit/4c0bb2ed3dbef4086c788377f4d294f1034974e7).
[GitHub] spark pull request #15873: [SPARK-18427][DOC] Remove 'runs' from docs of mll...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15873 [SPARK-18427][DOC] Remove 'runs' from docs of mllib.KMeans ## What changes were proposed in this pull request? Remove 'runs' from docs of mllib.KMeans ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark update_doc_mllib_kmeans Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15873.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15873 commit 4c0bb2ed3dbef4086c788377f4d294f1034974e7 Author: Zheng RuiFeng Date: 2016-11-14T03:11:49Z create pr
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test FAILed.
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13837#discussion_r87733959 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -322,6 +323,9 @@ case class DataSource( val equality = sparkSession.sessionState.conf.resolver StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name }.orElse { + if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) { --- End diff -- Hi @gatorsmile, would it be better to explain here that the text data source is excluded because it always uses a schema consisting of a single string field when no schema is explicitly given? BTW, should we maybe change `text.TextFileFormat` to `TextFileFormat` at https://github.com/apache/spark/pull/13837/files#diff-7a6cb188d2ae31eb3347b5629a679cecR139 ?
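The rule the diff enforces can be sketched in Python, assuming inference only runs when no user schema is supplied. Under that assumption, a user schema always wins. The text source falls back to its fixed single string column (`value`). Any other format with no input paths fails fast instead of trying to infer from nothing. `resolve_schema`, `infer_from_files`, and the error message are illustrative and are not Spark's code.

```python
from typing import List, Optional, Sequence, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name) pairs


def infer_from_files(fmt: str, paths: Sequence[str]) -> Schema:
    # Hypothetical placeholder for real, format-specific inference.
    return [("inferred", "string")]


def resolve_schema(
    fmt: str, paths: Sequence[str], user_schema: Optional[Schema] = None
) -> Schema:
    # An explicitly supplied schema always wins.
    if user_schema is not None:
        return user_schema
    # The text source never needs inference: without a user schema it
    # always yields one string column, so empty input is not an error.
    if fmt == "text":
        return [("value", "string")]
    # Other formats infer from the input files; with no files there is
    # nothing to infer from, so report that clearly.
    if not paths:
        raise ValueError(f"Unable to infer schema for {fmt}. It must be specified manually.")
    return infer_from_files(fmt, paths)
```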
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68595/ Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68595/consoleFull)** for PR 15849 at commit [`0abc93a`](https://github.com/apache/spark/commit/0abc93aa38d036e403f843fd6e41320da1ce5471). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68595/consoleFull)** for PR 15849 at commit [`0abc93a`](https://github.com/apache/spark/commit/0abc93aa38d036e403f843fd6e41320da1ce5471).
[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87732022 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriterConfig.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.internal.io + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapred.JobConf +import org.apache.hadoop.mapreduce._ + +import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils} + +/** + * Interface for create output format/committer/writer used during saving an RDD using a Hadoop + * OutputFormat (both from the old mapred API and the new mapreduce API) + * + * Notes: + * 1. Implementations should throw [[IllegalArgumentException]] when wrong hadoop API is + *referenced; + * 2. Implementations must be serializable, as the instance instantiated on the driver + *will be used for tasks on executors; + * 3. Implementations should have a constructor with exactly one argument: + *(conf: SerializableConfiguration) or (conf: SerializableJobConf). 
+ */ +abstract class SparkHadoopWriterConfig[K, V: ClassTag] extends Serializable { --- End diff -- Yes, it's an abstraction that conceals the differences between the `mapred` and `mapreduce` APIs, but I couldn't come up with a concise name for it. Any suggestions?
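The design under discussion can be illustrated with a short Python analogue: one abstract interface, one implementation per Hadoop API, and a shared write loop that sees only the interface. Every class and function name below is hypothetical. The "writers" only record strings, standing in for real `OutputFormat`/`RecordWriter` calls, so treat this as a sketch of the pattern rather than Spark's code.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


class WriterAdapter(ABC):
    """Hypothetical interface hiding which Hadoop API does the writing."""

    @abstractmethod
    def init_writer(self, split_id: int) -> None:
        """Open a writer for one partition (split)."""

    @abstractmethod
    def write(self, pair: Pair) -> None:
        """Write one key/value pair."""

    @abstractmethod
    def close_writer(self) -> List[str]:
        """Close the writer; returns the recorded output for illustration."""


class OldApiAdapter(WriterAdapter):
    """Stand-in for an implementation backed by the old `mapred` API."""

    def init_writer(self, split_id: int) -> None:
        self.out = [f"mapred split {split_id}"]

    def write(self, pair: Pair) -> None:
        self.out.append("\t".join(pair))

    def close_writer(self) -> List[str]:
        return self.out


class NewApiAdapter(WriterAdapter):
    """Stand-in for an implementation backed by the new `mapreduce` API."""

    def init_writer(self, split_id: int) -> None:
        self.out = [f"mapreduce split {split_id}"]

    def write(self, pair: Pair) -> None:
        self.out.append("\t".join(pair))

    def close_writer(self) -> List[str]:
        return self.out


def write_partition(adapter: WriterAdapter, split_id: int, records: Iterable[Pair]) -> List[str]:
    # The shared loop talks only to the abstract interface, so it does
    # not need to know which API the adapter wraps.
    adapter.init_writer(split_id)
    for pair in records:
        adapter.write(pair)
    return adapter.close_writer()
```

In this reading, the class behaves more like an adapter or strategy than a configuration object. That may be relevant to the naming question.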
[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87731801 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriterConfig.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.internal.io + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapred.JobConf +import org.apache.hadoop.mapreduce._ + +import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils} + +/** + * Interface for create output format/committer/writer used during saving an RDD using a Hadoop + * OutputFormat (both from the old mapred API and the new mapreduce API) + * + * Notes: + * 1. Implementations should throw [[IllegalArgumentException]] when wrong hadoop API is + *referenced; + * 2. Implementations must be serializable, as the instance instantiated on the driver + *will be used for tasks on executors; + * 3. Implementations should have a constructor with exactly one argument: + *(conf: SerializableConfiguration) or (conf: SerializableJobConf). 
+ */ +abstract class SparkHadoopWriterConfig[K, V: ClassTag] extends Serializable { + + // -- + // Create JobContext/TaskAttemptContext + // -- + + def createJobContext(jobTrackerId: String, jobId: Int): JobContext + + def createTaskAttemptContext( + jobTrackerId: String, + jobId: Int, + splitId: Int, + taskAttemptId: Int): TaskAttemptContext + + // -- + // Create committer + // -- + + def createCommitter(jobId: Int): HadoopMapReduceCommitProtocol + + // -- + // Create writer + // -- + + def initWriter(taskContext: TaskAttemptContext, splitId: Int): Unit + + def write(pair: (K, V)): Unit + + def closeWriter(taskContext: TaskAttemptContext): Unit + + // -- + // Create OutputFormat + // -- + + def initOutputFormat(jobContext: JobContext): Unit + + // -- + // Verify hadoop config + // -- + + def assertConf(): Unit + + def checkOutputSpecs(jobContext: JobContext): Unit + +} + +object SparkHadoopWriterConfig { + + /** + * Instantiates a SparkHadoopWriterConfig using the given configuration. + */ + def instantiate[K, V](className: String, conf: Configuration)( --- End diff -- Reasonable, I'll address this.
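The scaladoc's note 3 requires exactly one constructor argument. That requirement is what lets an `instantiate(className, conf)` helper build an implementation from a class name alone. The sketch below is a rough Python analogue, not Spark's Scala implementation. It resolves a fully qualified name with `importlib` and then calls the class with a single argument. The function name mirrors the diff; everything else is an assumption.

```python
import importlib
from typing import Any


def instantiate(class_name: str, conf: Any) -> Any:
    """Construct `class_name` (e.g. "pkg.module.Cls") with a single argument."""
    module_name, _, attr = class_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully qualified class name, got {class_name!r}")
    # Load the module, look the class up by name, then call the
    # one-argument constructor that every implementation must provide.
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(conf)
```

For example, `instantiate("collections.Counter", "aab")` builds a `Counter` from its single argument. A class whose constructor takes a different number of arguments would fail at the `cls(conf)` call, so the one-argument convention is what makes the lookup safe.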
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68593/ Test PASSed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test PASSed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68593/consoleFull)** for PR 15849 at commit [`f8d7e16`](https://github.com/apache/spark/commit/f8d7e16ee6fe7714eb7416dac6f8dbdd284c2d92). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15829: [SPARK-18379][SQL] Make the parallelism of parallelParti...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15829 cc @yhuai @srowen
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/15852 @uncleGen @tcondie thanks for working on this. My main concern is that this approach might disallow changing `compactInterval` once there are at least two compact files. Should we disallow it? Or, as an alternative, what do you think of the approach taken in https://github.com/apache/spark/pull/15828?
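A small Python sketch makes the concern concrete. It assumes that compaction batches are identified purely by `(batchId + 1) % compactInterval == 0`, with 0-based batch ids. Suppose the interval changes from 10 to 4 after batch 19. The compact file written at batch 9 then no longer counts as a compaction batch under the new rule, so the new interval alone cannot locate it. The function names are hypothetical.

```python
from typing import List


def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule: every compact_interval-th batch (0-based ids) is compacted.
    return (batch_id + 1) % compact_interval == 0


def compaction_batches(up_to: int, compact_interval: int) -> List[int]:
    # All batch ids in [0, up_to] that the rule treats as compaction batches.
    return [b for b in range(up_to + 1) if is_compaction_batch(b, compact_interval)]
```

With interval 10, batches 9 and 19 are compacted. With interval 4, the rule expects batches 3, 7, 11, 15 and 19. Only batch 19 appears in both lists.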
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #68594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68594/consoleFull)** for PR 13909 at commit [`3ef6fd4`](https://github.com/apache/spark/commit/3ef6fd451a801e2c8bb7183913f3efe6405341c6).
[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87731086 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriterConfig.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.internal.io + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapred.JobConf +import org.apache.hadoop.mapreduce._ + +import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils} + +/** + * Interface for create output format/committer/writer used during saving an RDD using a Hadoop + * OutputFormat (both from the old mapred API and the new mapreduce API) + * + * Notes: + * 1. Implementations should throw [[IllegalArgumentException]] when wrong hadoop API is + *referenced; + * 2. Implementations must be serializable, as the instance instantiated on the driver + *will be used for tasks on executors; + * 3. Implementations should have a constructor with exactly one argument: + *(conf: SerializableConfiguration) or (conf: SerializableJobConf). 
+ */ +abstract class SparkHadoopWriterConfig[K, V: ClassTag] extends Serializable { --- End diff -- If I understand this correctly, this is basically an abstraction that makes both the old mapred API and the new mapreduce API work, isn't it?
[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87731058 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriterConfig.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.internal.io + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapred.JobConf +import org.apache.hadoop.mapreduce._ + +import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils} + +/** + * Interface for create output format/committer/writer used during saving an RDD using a Hadoop + * OutputFormat (both from the old mapred API and the new mapreduce API) + * + * Notes: + * 1. Implementations should throw [[IllegalArgumentException]] when wrong hadoop API is + *referenced; + * 2. Implementations must be serializable, as the instance instantiated on the driver + *will be used for tasks on executors; + * 3. Implementations should have a constructor with exactly one argument: + *(conf: SerializableConfiguration) or (conf: SerializableJobConf). 
+ */ +abstract class SparkHadoopWriterConfig[K, V: ClassTag] extends Serializable { --- End diff -- One thing that confuses me is why this is named "Config"?
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15849 @koeninger want to review this?
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15861 **[Test build #3423 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3423/consoleFull)** for PR 15861 at commit [`ff4ce8c`](https://github.com/apache/spark/commit/ff4ce8c67912e4ce8df1d78defbf22f0f23b875d).
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15861 cc @mridulm too
[GitHub] spark pull request #15872: [SPARK-18426][Structured Streaming] Python Docume...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15872
[GitHub] spark issue #15872: [SPARK-18426][Structured Streaming] Python Documentation...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15872 Thanks - merging in master/branch-2.1/branch-2.0.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68593/consoleFull)** for PR 15849 at commit [`f8d7e16`](https://github.com/apache/spark/commit/f8d7e16ee6fe7714eb7416dac6f8dbdd284c2d92).
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68592/ Test FAILed.