[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74629902 [Test build #27621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27621/consoleFull) for PR 4642 at commit [`d291c34`](https://github.com/apache/spark/commit/d291c347687da1576ba8fafc855d05f9da3419b1). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4602#issuecomment-74629893 [Test build #27620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27620/consoleFull) for PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4584#issuecomment-74629906 [Test build #27619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27619/consoleFull) for PR 4584 at commit [`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4584#issuecomment-74629772 retest this please.
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4602#issuecomment-74629760 retest this please.
[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4584#issuecomment-74629480 [Test build #27618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27618/consoleFull) for PR 4584 at commit [`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74628875 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27613/
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74628869 [Test build #27613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27613/consoleFull) for PR 4642 at commit [`9be66e3`](https://github.com/apache/spark/commit/9be66e326f2fc50bb81b9f2cff82ab77714230d6). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4602#issuecomment-74628619 Thank you @yhuai , I've updated the description and rebased the code.
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4602#issuecomment-74628623 [Test build #27617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27617/consoleFull) for PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74628437 [Test build #611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/611/consoleFull) for PR 4629 at commit [`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4643#issuecomment-74628174 [Test build #27616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27616/consoleFull) for PR 4643 at commit [`717cfb0`](https://github.com/apache/spark/commit/717cfb055dcdbdf682a1a891e2413ab0d66de211). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4643#issuecomment-74627979 /cc @brennonyork if you want to take a quick look. I'll probably merge this soon since it's needed for some release packaging.
[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/4643 SPARK-5856: In Maven build script, launch Zinc with more memory I've seen out of memory exceptions when trying to run many parallel builds against the same Zinc server during packaging. We should use the same increased memory settings we use for Maven itself. I tested this and confirmed that the Nailgun JVM launched with higher memory. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark zinc-memory Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4643.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4643 commit 717cfb055dcdbdf682a1a891e2413ab0d66de211 Author: Patrick Wendell Date: 2015-02-17T07:29:39Z SPARK-5856: Launch Zinc with larger memory options. I've seen out of memory exceptions when trying to run many parallel builds against the same Zinc server during packaging. We should use the same increased memory settings we use for Maven itself. I tested this and confirmed that the Nailgun JVM launched with higher memory.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4562#discussion_r24797498 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala --- @@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase { override def beforeAll(): Unit = { super.beforeAll() + +sql(s""" + create table test_parquet + ( +intField INT, +stringField STRING + ) + ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' + STORED AS + INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' + OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' +""") + +val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}""")) +jsonRDD(rdd).registerTempTable("jt") +sql(""" + create table test ROW FORMAT + | SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' + | STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' + | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' + | AS select * from jt""".stripMargin) + --- End diff -- Oh, I thought `STORED AS PARQUET AS ..` was just syntactic sugar. Unfortunately, all of the test suites are implemented in the subproject `sql`, but `HiveShim` is in the subproject `hive`, with visibility restricted to the `hive` package. Let's put this test in another PR?
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4562#issuecomment-74627418 [Test build #27615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27615/consoleFull) for PR 4562 at commit [`36978d1`](https://github.com/apache/spark/commit/36978d1835ab6e0266ad3787b33056b573fd59e8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74627399 Are there ever situations where `combineByKey` should be used instead of `aggregateByKey`? I tend to think of `combineByKey` as an internal API that's exposed for historical reasons.
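For readers comparing the two calls mentioned above, here is a minimal sketch in plain Scala, without Spark, of how an `aggregateByKey`-style call can be written in terms of a `combineByKey`-style one. The object `CombineSketch`, the `partitions` argument (a stand-in for an RDD's partitions) and the sample data are assumptions made for illustration; they are not Spark's API, and the real implementation may differ in detail.

```scala
object CombineSketch {
  // Toy, single-machine model of combineByKey: `partitions` stands in for an RDD's partitions.
  def combineByKey[K, V, C](partitions: Seq[Seq[(K, V)]])(
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] = {
    // Map side: fold each partition's values into one combiner per key.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.updated(k, acc.get(k).map(c => mergeValue(c, v)).getOrElse(createCombiner(v)))
      }
    }
    // Reduce side: merge the per-partition combiners for each key.
    perPartition.foldLeft(Map.empty[K, C]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.updated(k, a.get(k).map(prev => mergeCombiners(prev, c)).getOrElse(c))
      }
    }
  }

  // aggregateByKey expressed through combineByKey: the zero value seeds the first merge.
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]])(zero: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] =
    combineByKey[K, V, U](partitions)(v => seqOp(zero, v), seqOp, combOp)

  // Hypothetical sample data: two "partitions" of (key, value) pairs.
  val data: Seq[Seq[(String, Int)]] =
    Seq(Seq("a" -> 1, "b" -> 2, "a" -> 3), Seq("a" -> 4, "b" -> 5))

  // Per-key (sum, count), written with the zero-value form.
  val viaAggregate: Map[String, (Int, Int)] =
    aggregateByKey[String, Int, (Int, Int)](data)((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (x, y) => (x._1 + y._1, x._2 + y._2))

  // The same aggregation, written with an explicit createCombiner.
  val viaCombine: Map[String, (Int, Int)] =
    combineByKey[String, Int, (Int, Int)](data)(
      v => (v, 1),
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (x, y) => (x._1 + y._1, x._2 + y._2))
}
```

In this model the only thing `combineByKey` adds is the freedom to build the first combiner directly from a value rather than from a zero element, which may matter when no natural zero exists.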
[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-74623712 Btw, for the feature type, besides continuous and categorical, do we want to make binary special? It could be treated as both continuous and categorical.
[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-74623603 There are two types of `Attribute(s)`: describing a feature group (a vector column) or describing a single feature (a scalar column). For a feature group, the column name becomes the group name and individual features inside this group may have their own names. For example, we have a vector column called `user` and inside this feature group we can have features named `age` and `gender`. When we merge multiple groups into a single feature vector, e.g., in a feature vector assembler, the names are flattened like `user:age` and `user:gender`. This answers @sryza 's question about one-hot-encoding. Assume that the input column is a scalar column called "country" with categories stored in the attribute. Then OneHotEncoder will output a vector column and generate feature attributes with names like `country:US`, `country:CA`, etc. +1 on @jkbradley 's suggestion about not calling it `FeatureAttribute`. `Attribute` should be okay to describe a scalar column but we also need a name to describe a vector column, where `Attributes` may sound a little confusing. I suggest `AttributeGroup`. We don't need to care about the `FeatureType` in `mllib.tree` in this PR. Once we have this PR merged, we can migrate the decision tree code.
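The naming scheme described above can be sketched as follows. This is a minimal illustration only: the case classes reuse the names `Attribute` and `AttributeGroup` proposed in the comment, but their fields and the helpers `flattenNames` and `oneHot` are assumptions and do not reflect the API that the PR ends up with.

```scala
object AttributeNamingSketch {
  // Hypothetical stand-ins for a scalar-column attribute and a vector-column group.
  final case class Attribute(name: String)
  final case class AttributeGroup(name: String, attributes: Seq[Attribute])

  // Flatten several groups into one list of feature names of the form "group:feature".
  def flattenNames(groups: Seq[AttributeGroup]): Seq[String] =
    groups.flatMap(g => g.attributes.map(a => s"${g.name}:${a.name}"))

  // Model of the one-hot case: a scalar column with categories becomes a group
  // holding one attribute per category.
  def oneHot(column: String, categories: Seq[String]): AttributeGroup =
    AttributeGroup(column, categories.map(c => Attribute(c)))
}
```

With these definitions, a `user` group holding `age` and `gender` flattens to `user:age` and `user:gender`, and `oneHot("country", Seq("US", "CA"))` flattens to `country:US` and `country:CA`.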
[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74622928 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27614/
[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74622925 [Test build #27614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27614/consoleFull) for PR 3027 at commit [`d3b9253`](https://github.com/apache/spark/commit/d3b9253d3ac31f4a5178d45afaa4eb5b56eb537a). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SparkJobInfo(namedtuple("SparkJobInfo", "jobId stageIds status")):` * `class SparkStageInfo(namedtuple("SparkStageInfo",` * `class StatusTracker(object):`
[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74622799 [Test build #27614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27614/consoleFull) for PR 3027 at commit [`d3b9253`](https://github.com/apache/spark/commit/d3b9253d3ac31f4a5178d45afaa4eb5b56eb537a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74622588 [Test build #611 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/611/consoleFull) for PR 4629 at commit [`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74622438 [Test build #27613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27613/consoleFull) for PR 4642 at commit [`9be66e3`](https://github.com/apache/spark/commit/9be66e326f2fc50bb81b9f2cff82ab77714230d6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4622#issuecomment-74622054 @mengxr okay.
[GitHub] spark pull request: [SPARK-5793][SQL] Add explode to Column
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4585#issuecomment-74621954 @rxin is this pr ready to go?
[GitHub] spark pull request: [SPARK-5802][MLLIB] cache transformed data in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4593
[GitHub] spark pull request: [SPARK-5802][MLLIB] cache transformed data in ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4593#issuecomment-74621554 Merged into master and branch-1.3.
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74621490 [Test build #27612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27612/consoleFull) for PR 4629 at commit [`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class Partitioner(object):`
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74621492 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27612/
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74621421 [Test build #27612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27612/consoleFull) for PR 4629 at commit [`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74621154 [Test build #27611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27611/consoleFull) for PR 4620 at commit [`88e4b05`](https://github.com/apache/spark/commit/88e4b05094eb64bf4d85f54c7a5e2037bbc6f06a). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74621158 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27611/
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74621085 [Test build #27611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27611/consoleFull) for PR 4620 at commit [`88e4b05`](https://github.com/apache/spark/commit/88e4b05094eb64bf4d85f54c7a5e2037bbc6f06a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5826][Streaming] Fix Configuration not ...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/4612#issuecomment-74619833 Yeah, I'm sure now: `Configuration` is just a constructor parameter, not a field, so `@transient` is not needed. I have a local test which verified this.
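The point about constructor parameters can be checked in isolation. The following is a minimal sketch, not the code from the pull request: `FakeConf` is a hypothetical stand-in for a non-serializable class such as Hadoop's `Configuration`, and the two wrapper classes are invented for illustration. It relies on the Scala compiler keeping a class parameter as a field only when it is used outside the constructor body.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable dependency such as Hadoop's Configuration.
class FakeConf(val settings: Map[String, String])

// `conf` is only read while the constructor runs (in a val initializer),
// so the compiler does not retain it as a field.
class UsesConfInConstructor(conf: FakeConf) extends Serializable {
  val blockSize: String = conf.settings.getOrElse("block.size", "128m")
}

// `conf` is read from a method, so the compiler must capture it in a hidden field.
class UsesConfInMethod(conf: FakeConf) extends Serializable {
  def blockSize: String = conf.settings.getOrElse("block.size", "128m")
}

object SerializationCheck {
  /** Returns true if Java serialization of `obj` succeeds. */
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Under these assumptions only the second class drags the non-serializable value into the serialized object, which is the case where `@transient` would matter.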
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24794112

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -287,7 +287,11 @@ case class ParquetRelation2(
             }
           }
    -      parquetSchema = maybeSchema.getOrElse(readSchema())
    +      try {
    +        parquetSchema = readSchema().getOrElse(maybeSchema.get)
    +      } catch {
    +        case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
    +      }
    --- End diff --

    Based on Cheng's comment at https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L194, I think that it is better to keep `maybeMetastoreSchema` and we just fix the bug for now.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793987

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -630,11 +635,12 @@ object ParquetRelation2 {
                 sqlContext.conf.isParquetBinaryAsString,
                 sqlContext.conf.isParquetINT96AsTimestamp))
           }
    -    }.reduce { (left, right) =>
    -      try left.merge(right) catch { case e: Throwable =>
    -        throw new SparkException(s"Failed to merge incompatible schemas $left and $right", e)
    -      }
    -    }
    +    }.foldLeft[StructType](null) {
    --- End diff --

    All right. Instead of putting a large code block in `Option`, how about using a temporary `val` and then wrapping it in `Option` at the end of this method?
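A minimal sketch of the "temporary `val`, then `Option` at the end" shape suggested in this comment. It is not the code from the pull request: `Schema` is a hypothetical `Map`-based stand-in for `StructType`, and `merge` is an invented helper that fails on conflicting field types.

```scala
object SchemaMerge {
  // Hypothetical stand-in for StructType: field name -> type name.
  type Schema = Map[String, String]

  /** Merges two schemas, failing on a field whose type differs between them. */
  def merge(left: Schema, right: Schema): Schema =
    right.foldLeft(left) { case (acc, (name, tpe)) =>
      acc.get(name) match {
        case Some(existing) if existing != tpe =>
          throw new IllegalArgumentException(
            s"Failed to merge incompatible types for $name: $existing and $tpe")
        case _ => acc + (name -> tpe)
      }
    }

  /** Folds with a null seed into a temporary val, then wraps the result in Option once. */
  def mergeAll(schemas: Seq[Schema]): Option[Schema] = {
    val merged = schemas.foldLeft[Schema](null) { (acc, next) =>
      if (acc == null) next else merge(acc, next)
    }
    Option(merged) // Option(null) is None, so an empty input yields None
  }
}
```

The `null` stays local to the method, and callers only ever see an `Option`.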
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793945

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -287,7 +287,11 @@ case class ParquetRelation2(
             }
           }
    -      parquetSchema = maybeSchema.getOrElse(readSchema())
    +      try {
    +        parquetSchema = readSchema().getOrElse(maybeSchema.get)
    +      } catch {
    +        case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
    +      }
    --- End diff --

    After reading the source code, I am wondering whether `maybeMetastoreSchema` is redundant. Should it always be converted into `maybeSchema` when creating the `ParquetRelation2` instance?
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74617889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27608/ Test PASSed.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74617885 [Test build #27608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27608/consoleFull) for PR 4231 at commit [`58c19a5`](https://github.com/apache/spark/commit/58c19a5399e09329631a455ec2e535f71e31ed97). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793769

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
       override def beforeAll(): Unit = {
         super.beforeAll()
    +
    +    sql(s"""
    +      create table test_parquet
    +      (
    +        intField INT,
    +        stringField STRING
    +      )
    +      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      STORED AS
    +      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +    """)
    +
    +    val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
    +    jsonRDD(rdd).registerTempTable("jt")
    +    sql("""
    +      create table test ROW FORMAT
    +      | SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      | STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +      | AS select * from jt""".stripMargin)
    +
    --- End diff --

    How about we use `if (HiveShim.version == "0.13.1")` to check the Hive version, like what we did in https://github.com/apache/spark/commit/e0490e271d078aa55d7c7583e2ba80337ed1b0c4?
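The version gate proposed in this comment could look roughly like the sketch below. It is only an illustration: `Shim` is a hypothetical stand-in for the version string that `HiveShim.version` exposes, the table name `test_shorthand` is invented, and the function returns the statements instead of running them against Hive.

```scala
object HiveVersionGate {
  // Hypothetical stand-in for the version string exposed by Spark's HiveShim.
  final case class Shim(version: String)

  /** DDL statements a test suite would run for the given Hive version. */
  def ctasStatements(shim: Shim): Seq[String] = {
    // The explicit SerDe / input format / output format spelling is used unconditionally.
    val explicitSerde =
      """CREATE TABLE test ROW FORMAT
        |SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
        |STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
        |OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
        |AS SELECT * FROM jt""".stripMargin
    // The `STORED AS PARQUET` shorthand is only exercised when the version check passes.
    val shorthand =
      if (shim.version == "0.13.1") {
        Seq("CREATE TABLE test_shorthand STORED AS PARQUET AS SELECT * FROM jt")
      } else {
        Seq.empty[String]
      }
    explicitSerde +: shorthand
  }
}
```

Keeping the check in one place means the suite still compiles and runs against the older Hive profile, with the shorthand case simply skipped there.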
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793692

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -630,11 +635,12 @@ object ParquetRelation2 {
                 sqlContext.conf.isParquetBinaryAsString,
                 sqlContext.conf.isParquetINT96AsTimestamp))
           }
    -    }.reduce { (left, right) =>
    -      try left.merge(right) catch { case e: Throwable =>
    -        throw new SparkException(s"Failed to merge incompatible schemas $left and $right", e)
    -      }
    -    }
    +    }.foldLeft[StructType](null) {
    --- End diff --

    Yeah, I tried that as well, but using `null` seems simpler, as `Option` requires some extra value-extraction code.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793676

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -287,7 +287,11 @@ case class ParquetRelation2(
             }
           }
    -      parquetSchema = maybeSchema.getOrElse(readSchema())
    +      try {
    +        parquetSchema = readSchema().getOrElse(maybeSchema.get)
    +      } catch {
    +        case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
    +      }
    --- End diff --

    Also, it seems we do not need `try ... catch` here.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24793419

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
       override def beforeAll(): Unit = {
         super.beforeAll()
    +
    +    sql(s"""
    +      create table test_parquet
    +      (
    +        intField INT,
    +        stringField STRING
    +      )
    +      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      STORED AS
    +      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +    """)
    +
    +    val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
    +    jsonRDD(rdd).registerTempTable("jt")
    +    sql("""
    +      create table test ROW FORMAT
    +      | SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      | STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +      | AS select * from jt""".stripMargin)
    +
    --- End diff --

    `STORED AS PARQUET` has only been supported since Hive 0.13, so the unit test may fail under Hive 0.12 if we do that.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74617269 [Test build #27610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27610/consoleFull) for PR 4642 at commit [`d56afc2`](https://github.com/apache/spark/commit/d56afc24178642ed13995877ce0d851175340584). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74617271 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27610/ Test FAILed.
[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4611#issuecomment-74617241 @srowen I've tested this on both Ubuntu 12.04 and CentOS 6.5: `ps -f -p ...` only prints the first 4096 characters of the process arguments. By the way, I've also checked `hadoop-daemon.sh` in Hadoop (2.3), and it seems it doesn't confirm the process name as we do in `spark-daemon.sh`. Or can we just confirm that it's a Java process?
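One way to "just confirm it's a Java process" without depending on the full argument string is to ask `ps` for the command name only. The sketch below is an assumption-laden illustration in Scala rather than the shell used by `spark-daemon.sh`: it relies on the POSIX `ps -p <pid> -o comm=` form, and the object and method names are hypothetical.

```scala
import scala.sys.process._
import scala.util.Try

object JavaProcessCheck {
  /** True if a command name, as printed by `ps -o comm=`, is the `java` launcher. */
  def isJavaCommand(comm: String): Boolean =
    // Some platforms print a full path, so only the last path segment is compared.
    comm.trim.split('/').lastOption.contains("java")

  /** Asks `ps` for the command name of `pid`; None if ps fails or the pid is gone. */
  def commandName(pid: Long): Option[String] =
    Try(Seq("ps", "-p", pid.toString, "-o", "comm=").!!.trim).toOption.filter(_.nonEmpty)

  /** Combines the two steps above for a given process id. */
  def isJavaProcess(pid: Long): Boolean =
    commandName(pid).exists(isJavaCommand)
}
```

Because only the executable name is inspected, the length of the argument list no longer matters, at the cost of not distinguishing one JVM from another.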
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74617207 [Test build #27610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27610/consoleFull) for PR 4642 at commit [`d56afc2`](https://github.com/apache/spark/commit/d56afc24178642ed13995877ce0d851175340584). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4642#discussion_r24792972

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -792,16 +896,73 @@ trait DataFrame extends RDDApi[Row] with Serializable {
        * :: Experimental ::
        * Adds the rows from this RDD to the specified table.
        * Throws an exception if the table already exists.
    +   * @group output
        */
       @Experimental
       def insertInto(tableName: String): Unit = insertInto(tableName, overwrite = false)

       /**
        * Returns the content of the [[DataFrame]] as a RDD of JSON strings.
    +   * @group rdd
        */
       def toJSON: RDD[String]
    +  // JDBC Write Support
    +
    +
    +  /**
    +   * Save this RDD to a JDBC database at `url` under the table name `table`.
    +   * This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
    +   * If you pass `true` for `allowExisting`, it will drop any table with the
    +   * given name; if you pass `false`, it will throw if the table already
    +   * exists.
    +   * @group output
    +   */
    +  def createJDBCTable(url: String, table: String, allowExisting: Boolean) {
    --- End diff --

    The impl should go into `DataFrameImpl`, shouldn't it?
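The split being asked for here, with the documented signature on the trait and the body in the implementation class, can be sketched as follows. `Frame` and `FrameImpl` are hypothetical, heavily simplified stand-ins for `DataFrame` and `DataFrameImpl`; the implementation only logs the statements it would issue and does not talk to a database, so the "throw if the table already exists" branch is not modelled.

```scala
// Hypothetical stand-in for the public trait: documentation and signature only.
trait Frame {
  /** Saves the rows to `table` at `url`; drops an existing table first if `allowExisting` is true. */
  def createJDBCTable(url: String, table: String, allowExisting: Boolean): Unit
}

// Hypothetical stand-in for the implementation class, where the method body lives.
class FrameImpl(rows: Seq[Seq[Any]], log: String => Unit) extends Frame {
  override def createJDBCTable(url: String, table: String, allowExisting: Boolean): Unit = {
    // Records the statements instead of opening a JDBC connection to `url`.
    if (allowExisting) log(s"DROP TABLE IF EXISTS $table")
    log(s"CREATE TABLE $table")
    rows.foreach(row => log(s"INSERT INTO $table VALUES (${row.mkString(", ")})"))
  }
}
```

A caller written against `Frame` never sees the JDBC details, which keeps imports such as `java.sql.DriverManager` out of the public trait's file.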
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4642#discussion_r24792776

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -17,6 +17,10 @@
     package org.apache.spark.sql

    +import java.sql.DriverManager
    +
    +import org.apache.spark.sql.jdbc.JDBCWriteDetails
    --- End diff --

    nit: import order
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74616986 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27609/ Test FAILed.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74616985 [Test build #27609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27609/consoleFull) for PR 4642 at commit [`f004747`](https://github.com/apache/spark/commit/f004747ad0e351306a6747e44e310961f35c650c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4642#issuecomment-74616944 [Test build #27609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27609/consoleFull) for PR 4642 at commit [`f004747`](https://github.com/apache/spark/commit/f004747ad0e351306a6747e44e310961f35c650c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4641#issuecomment-74616924 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...
GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/4642

    [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4642.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4642

commit 42e2b73371468bbec63648368044bd7e77f888a4
Author: Michael Armbrust
Date: 2015-02-17T04:06:13Z

    [SQL] Documentation / API Clean-up.

commit c4a907b40e5404d944d579cfe93d5250241c2afe
Author: Michael Armbrust
Date: 2015-02-17T04:35:17Z

    fix tests

commit f004747ad0e351306a6747e44e310961f35c650c
Author: Michael Armbrust
Date: 2015-02-17T04:37:55Z

    fix build
[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...
GitHub user dondrake opened a pull request:

    https://github.com/apache/spark/pull/4641

    [SPARK-5722][SQL] fix for infer long type in python similar to Java long (master branch)

    Corresponding fix for SPARK-5722 for the master (1.3) branch. See Pull #4521 for 1.2 version.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dondrake/spark drake_python_long

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4641.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4641

commit 79f136d5ab347bf2c9afe8d0c5c29bcdc214e634
Author: Don Drake
Date: 2015-02-17T04:11:31Z

    SPARK-5722 fixes for inferring LongType

commit 9aa0737844ae66b25487f1d5979bdf4f7a23eddd
Author: Don Drake
Date: 2015-02-17T04:45:04Z

    Merge branch 'master' into drake_python_long

    Conflicts: python/pyspark/sql/dataframe.py
[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4640
[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4601#issuecomment-74616323 I merged this into `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` (1.2.2), but did so _right_ before I noticed that there's [a comment](https://issues.apache.org/jira/browse/SPARK-5363?focusedCommentId=14323623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14323623) on JIRA suggesting that this didn't fix the freeze. I guess I was a bit too trigger-happy here since I wanted to try to squeeze a fix in for 1.3.0.
[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4238#issuecomment-74616043 I've merged this into `branch-1.2` (1.2.2) as well.
[GitHub] spark pull request: Added support for accessing secured HDFS
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-74616039 Let's close this issue. There is an alternative PR that is currently ongoing.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24791828

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
             ParquetRelation2(
               paths,
               Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
    -          None,
    +          Some(metastoreSchema),
               Some(partitionSpec))(hive))
         } else {
           val paths = Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
    -      LogicalRelation(
    -        ParquetRelation2(
    +      LogicalRelation(ParquetRelation2(
               paths,
    -          Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json))(hive))
    +          Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
    +          Some(metastoreSchema))(hive))
    --- End diff --

    OK, we can leave this file unchanged.
[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4638
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4562#discussion_r24791821

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
       override def beforeAll(): Unit = {
         super.beforeAll()
    +
    +    sql(s"""
    +      create table test_parquet
    +      (
    +        intField INT,
    +        stringField STRING
    +      )
    +      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      STORED AS
    +      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +    """)
    +
    +    val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
    +    jsonRDD(rdd).registerTempTable("jt")
    +    sql("""
    +      create table test ROW FORMAT
    +      | SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +      | STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +      | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +      | AS select * from jt""".stripMargin)
    +
    --- End diff --

    Also add a test for `CREATE TABLE ... STORED AS PARQUET AS ...`?
[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4601
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4562#discussion_r24791812
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -630,11 +635,12 @@ object ParquetRelation2 {
             sqlContext.conf.isParquetBinaryAsString,
             sqlContext.conf.isParquetINT96AsTimestamp))
       }
-    }.reduce { (left, right) =>
-      try left.merge(right) catch { case e: Throwable =>
-        throw new SparkException(s"Failed to merge incompatible schemas $left and $right", e)
-      }
-    }
+    }.foldLeft[StructType](null) {
--- End diff --
How about using `None` instead of `null`?
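The `None`-instead-of-`null` suggestion above can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: `Schema` is a hypothetical stand-in for Spark's `StructType` (a plain map from field name to type name), and `merge` and `mergeAll` are illustrative names introduced here.

```scala
object SchemaFold {
  // Hypothetical stand-in for Spark's StructType: field name -> type name.
  type Schema = Map[String, String]

  // Merges two schemas, failing if a shared field has conflicting types.
  def merge(left: Schema, right: Schema): Schema =
    right.foldLeft(left) { case (acc, (name, tpe)) =>
      acc.get(name) match {
        case Some(existing) if existing != tpe =>
          throw new IllegalArgumentException(
            s"Failed to merge incompatible schemas $left and $right")
        case _ => acc + (name -> tpe)
      }
    }

  // Uses Option as the accumulator: None means "no schema seen yet",
  // so an empty input yields None instead of null.
  def mergeAll(schemas: Seq[Schema]): Option[Schema] =
    schemas.foldLeft(Option.empty[Schema]) {
      case (None, next)         => Some(next)
      case (Some(merged), next) => Some(merge(merged, next))
    }
}
```

With this shape the caller has to handle the empty case explicitly, for example with `getOrElse`. The standard collection method `reduceOption` would give the same result more compactly (`schemas.reduceOption(merge)`); the explicit fold is shown only because it mirrors the `foldLeft` in the diff.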
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4562#discussion_r24791802
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -287,7 +287,11 @@ case class ParquetRelation2(
         }
       }

-      parquetSchema = maybeSchema.getOrElse(readSchema())
+      try {
+        parquetSchema = readSchema().getOrElse(maybeSchema.get)
+      } catch {
+        case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
+      }
--- End diff --
How about this
```
parquetSchema = {
  if (maybeSchema.isDefined) {
    maybeSchema.get
  } else {
    (readSchema(), maybeMetastoreSchema) match {
      case (Some(dataSchema), _) => dataSchema
      case (None, Some(metastoreSchema)) => metastoreSchema
      case (None, None) =>
        throw new SparkException("Failed to get the schema.")
    }
  }
}
```
We first check if maybeSchema is defined. If not, we read the schema from existing data. If existing data does not exist, we are dealing with a newly created empty table and we will use maybeMetastoreSchema defined in the options.
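The precedence described in the comment above (user-supplied schema, then schema read from existing data, then metastore schema, otherwise fail) can also be written as a chain of `Option.orElse` calls. The following is only a sketch under assumptions: `Schema` is a hypothetical stand-in for `StructType`, `resolveSchema` is an illustrative name, and `IllegalStateException` replaces `SparkException` so the snippet runs without Spark on the classpath.

```scala
object SchemaResolution {
  // Hypothetical stand-in for Spark's StructType: field name -> type name.
  type Schema = Map[String, String]

  // Resolution order: explicit schema, then schema read from data files,
  // then the metastore schema. Throws if none of the three is available.
  def resolveSchema(
      maybeSchema: Option[Schema],
      readSchema: () => Option[Schema],
      maybeMetastoreSchema: Option[Schema]): Schema =
    maybeSchema
      .orElse(readSchema()) // by-name argument: evaluated only if maybeSchema is empty
      .orElse(maybeMetastoreSchema)
      .getOrElse(throw new IllegalStateException("Failed to get the schema."))
}
```

Because `orElse` takes its argument by name, the potentially expensive read of the data files is skipped when an explicit schema is present, which matches the `if (maybeSchema.isDefined)` branch in the suggestion.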
[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4601#issuecomment-74615791 LGTM. I'm going to merge this into `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` (1.2.2).
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4562#issuecomment-74615738 [Test build #27607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27607/consoleFull) for PR 4562 at commit [`a04930b`](https://github.com/apache/spark/commit/a04930badb291e55ba4e6ba79ce781a89f827932). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4562#issuecomment-74615739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27607/ Test PASSed.
[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4640#issuecomment-74615587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27606/ Test PASSed.
[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4640#issuecomment-74615582 [Test build #27606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27606/consoleFull) for PR 4640 at commit [`9c6f569`](https://github.com/apache/spark/commit/9c6f569139fcca2152c07bcef340afed2bef0778). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GenericRowWithSchema(values: Array[Any], override val schema: StructType)`
[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...
Github user dondrake commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-74615073 OK, this PR, which is against branch-1.2, is now updated, and I've verified that the tests are now passing. I created another branch off master (named drake_python_long) that has the changes needed for v1.3. I'll create another PR for that one. Please test.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4562#discussion_r24791423
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
             ParquetRelation2(
               paths,
               Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-              None,
+              Some(metastoreSchema),
               Some(partitionSpec))(hive))
         } else {
           val paths = Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-          LogicalRelation(
-            ParquetRelation2(
+          LogicalRelation(ParquetRelation2(
               paths,
-              Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json))(hive))
+              Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+              Some(metastoreSchema))(hive))
--- End diff --
Oh, but if we do create table, we have to pass the metastore schema.
[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-74614834 This is perhaps contained in @jkbradley's question, but how does this work with features that are represented by multiple entries in the feature vector, e.g. when we're doing a one-hot encoding? With a one-hot encoding, is each category its own feature, or can a feature span multiple indices in the vector?
[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4638#issuecomment-74614179 [Test build #27603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27603/consoleFull) for PR 4638 at commit [`386126f`](https://github.com/apache/spark/commit/386126fb9e60e8c7a08bb098b366b78d335750be). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4638#issuecomment-74614187 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27603/ Test PASSed.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74614095 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27604/ Test PASSed.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74614087 [Test build #27604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27604/consoleFull) for PR 4620 at commit [`673e4e3`](https://github.com/apache/spark/commit/673e4e3c2720ef88a4316656ba9972d06d17980c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4562#discussion_r24791099
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
             ParquetRelation2(
               paths,
               Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-              None,
+              Some(metastoreSchema),
               Some(partitionSpec))(hive))
         } else {
           val paths = Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-          LogicalRelation(
-            ParquetRelation2(
+          LogicalRelation(ParquetRelation2(
               paths,
-              Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json))(hive))
+              Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+              Some(metastoreSchema))(hive))
--- End diff --
I think we cannot do it. See https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L194
[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4627#issuecomment-74613715 @pwendell I suppose we could revert. A bandaid patch would be to just wrap this in a try block and ignore exceptions thrown when removing the hook (see also: `Utils.inShutdown()`). What do you think about just hotfixing in a `Try`?
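A minimal sketch of the `Try` hotfix idea mentioned above, using only the JDK and the Scala standard library. It is not the patch that was eventually applied: `ShutdownHookCleanup` and `removeHookQuietly` are hypothetical names. The sketch relies on the documented behaviour of `Runtime.removeShutdownHook`, which throws `IllegalStateException` when the JVM is already shutting down and returns `false` when the hook was never registered.

```scala
import scala.util.Try

object ShutdownHookCleanup {
  // Returns true if the hook was removed. Returns false if the hook was not
  // registered, or if removal threw (for example IllegalStateException
  // because the JVM is already shutting down).
  def removeHookQuietly(hook: Thread): Boolean =
    Try(Runtime.getRuntime.removeShutdownHook(hook)).getOrElse(false)
}
```

Note that `Try` swallows every non-fatal exception, not just `IllegalStateException`, which is why the comment calls this a bandaid; a narrower `catch` on the specific exception would hide less.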
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74613572 [Test build #27608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27608/consoleFull) for PR 4231 at commit [`58c19a5`](https://github.com/apache/spark/commit/58c19a5399e09329631a455ec2e535f71e31ed97). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74612880 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27605/ Test FAILed.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74612876 [Test build #27605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27605/consoleFull) for PR 4231 at commit [`b7c5581`](https://github.com/apache/spark/commit/b7c558174dc15e212227d193d341dfe120cb7634). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] [Minor] Passdown the schema for Parquet ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4562#issuecomment-74610745 [Test build #27607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27607/consoleFull) for PR 4562 at commit [`a04930b`](https://github.com/apache/spark/commit/a04930badb291e55ba4e6ba79ce781a89f827932). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4640#issuecomment-74610755 [Test build #27606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27606/consoleFull) for PR 4640 at commit [`9c6f569`](https://github.com/apache/spark/commit/9c6f569139fcca2152c07bcef340afed2bef0778). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/4640
[SPARK-5853][SQL] Schema support in Row.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark SPARK-5853
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4640.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4640
commit 9c6f569139fcca2152c07bcef340afed2bef0778
Author: Reynold Xin
Date: 2015-02-17T03:04:28Z
[SPARK-5853][SQL] Schema support in Row.
[GitHub] spark pull request: [SQL] Various DataFrame doc changes.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4636
[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4627#issuecomment-74609998 @JoshRosen should we revert this in 1.3 then? I might create a release candidate soon to kick off community testing.
[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-74609769
I like the current sketch but also want to think about it more. A few thoughts:
I'm not quite clear on how the Array of Attributes in FeatureAttributes corresponds to the columns of the DataFrame. Is it one-to-one, or will Attributes be nested? (I'm basically thinking about groups of features, especially individual features grouped into vectors.)
How will propagation of feature names work? Will we try to impose a standard, such as Transformers maintaining the same (or a modified) feature name whenever possible?
By the way, do we want to call this "FeatureAttributes," or should we name it something like "ColumnAttributes" so it more obviously applies to other types of columns like labels, users, products, etc.?
+1 for moving FeatureType from mllib.tree to attribute. It should be more general.
[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3000#issuecomment-74609759 @sryza Thanks for offering! That would be great if you have the bandwidth to work on this. I'd be happy to help review. One comment: It would be nice to be able to take advantage of FeatureAttributes in the spark.ml package, but that's a WIP right now: [https://github.com/apache/spark/pull/4460]
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74609378 [Test build #27605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27605/consoleFull) for PR 4231 at commit [`b7c5581`](https://github.com/apache/spark/commit/b7c558174dc15e212227d193d341dfe120cb7634). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74609225 @jkbradley fixed!
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r24789326
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1064,9 +1045,12 @@ object DecisionTree extends Serializable with Logging {
           // Bins correspond to feature values, so we do not need to compute splits or bins
           // beforehand. Splits are constructed as needed during training.
           splits(featureIndex) = new Array[Split](0)
-          bins(featureIndex) = new Array[Bin](0)
         }
-      }
+// For ordered features, bins correspond to feature values.
+// For unordered categorical features, there is no need to construct the bins.
+// since there is a one-to-one correspondence between the splits and the bins.
+bins(featureIndex) = new Array[Bin](0)
+}
--- End diff --
Do you mean to move the closing brace 2 spaces ahead?
[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4638#issuecomment-74609032 [Test build #27603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27603/consoleFull) for PR 4638 at commit [`386126f`](https://github.com/apache/spark/commit/386126fb9e60e8c7a08bb098b366b78d335750be). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74609026 [Test build #27604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27604/consoleFull) for PR 4620 at commit [`673e4e3`](https://github.com/apache/spark/commit/673e4e3c2720ef88a4316656ba9972d06d17980c). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...
Github user MattWhelan commented on the pull request: https://github.com/apache/spark/pull/4627#issuecomment-74608953 Kinda weird that it would pass sometimes and fail others. I'll submit a fix tomorrow.
[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4638#issuecomment-74608855 Jenkins retest this please.
[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4629#issuecomment-74607728 LGTM overall; this is tricky logic, though, so I'll take one more pass through when I get home.
[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4627#issuecomment-74607180 I saw it during shutdown from a failed Jenkins test run: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/1616/console
[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4627#issuecomment-74607089 Ah, how did you trigger that BTW? Yes I will help get it patched. I imagine the stop logic must be factored into a method called both by the hook and by stop, which can still unregister the hook first.
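A minimal sketch of the refactoring described in the comment above, written in plain Java rather than taken from the Spark code base. The class `HookedResource` and the methods `doStop` and `isStopped` are hypothetical names used only for illustration. The cleanup lives in one private method that both the shutdown hook and `stop()` call; `stop()` tries to unregister the hook first. `Runtime.removeShutdownHook` throws `IllegalStateException` when the JVM is already shutting down, so the sketch catches that case and still runs the cleanup.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a component that registers a JVM shutdown hook.
// This is an illustration only, not the DiskBlockManager implementation.
final class HookedResource {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    // The hook runs the same shared cleanup that stop() uses.
    private final Thread shutdownHook =
        new Thread(this::doStop, "hooked-resource-shutdown");

    HookedResource() {
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    // Explicit stop: unregister the hook first, then run the shared cleanup.
    void stop() {
        try {
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down, so the hook cannot be removed.
            // Fall through and clean up anyway.
        }
        doStop();
    }

    // Shared cleanup; the flag makes a second call a no-op.
    private void doStop() {
        if (stopped.compareAndSet(false, true)) {
            // Release resources here (for example, delete temporary directories).
        }
    }

    boolean isStopped() {
        return stopped.get();
    }
}
```

With this shape, the hook never calls `removeShutdownHook` on itself, and calling `stop()` more than once is harmless because the cleanup is guarded by the `stopped` flag.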