[GitHub] spark issue #17581: [SPARK-20248][ SQL]Spark SQL add limit parameter to enha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75819/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17581: [SPARK-20248][ SQL]Spark SQL add limit parameter to enha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17581 Merged build finished. Test PASSed.
[GitHub] spark issue #17581: [SPARK-20248][ SQL]Spark SQL add limit parameter to enha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17581 **[Test build #75819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75819/testReport)** for PR 17581 at commit [`f8f85a3`](https://github.com/apache/spark/commit/f8f85a3c70e00d53195c95c7d884d6d8ef6a469a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17630: [SPARK-20318][SQL] Use Catalyst type for min/max in Colu...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17630

> are we storing UTF8Strings directly in the catalog for statistics? That doesn't make sense ... if we are not, then we are not using internal types.

@rxin By "in the catalog for statistics", do you mean statistics in the metastore? We still use external types for statistics in the metastore. What this PR changed is the types of min/max in `ColumnStat`, so we don't have this problem here.

> My concern is that the internal types are specific to the physical execution path and stats/CBO are independent of that. We can in the future change the internal data types without changing CBO.

Since literal values are internal, stats/CBO need to be consistent with them to do estimation, so it's hard for CBO to be independent of them. If the internal types change in the future, we can update the conversion contract defined in `ColumnStat` to match.
[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17623 cc @cloud-fan
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Merged build finished. Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75818/ Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75818/testReport)** for PR 17623 at commit [`a21675d`](https://github.com/apache/spark/commit/a21675d37d66a0fbf1a15a7e714bfe596814431d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17642 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75817/ Test FAILed.
[GitHub] spark issue #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17642 Merged build finished. Test FAILed.
[GitHub] spark issue #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17642 **[Test build #75817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75817/testReport)** for PR 17642 at commit [`1ae57f2`](https://github.com/apache/spark/commit/1ae57f2e569462734d89d9c8c77e765859ce8393). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 @sethah Perhaps it's hard to compare R's behavior with Spark's, since many factors are involved. I'd like to read R GBM's code and verify that both sides' designs are consistent on the split criteria. Is that OK?
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656245

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
    @@ -104,6 +104,18 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
           assert(splits.distinct.length === splits.length)
         }
    +    // SPARK-16957: Use weighted midpoints for split values.
    +    {
    +      val fakeMetadata = new DecisionTreeMetadata(1, 0, 0, 0,
    +        Map(), Set(),
    +        Array(2), Gini, QuantileStrategy.Sort,
    +        0, 0, 0.0, 0, 0
    +      )
    +      val featureSamples = Array(0, 1, 0, 0, 1, 0, 1, 1).map(_.toDouble)
    +      val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
    +      assert(splits === Array(0.5))
    --- End diff --

add new case.
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656240

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
    @@ -126,9 +138,10 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
             Array(3), Gini, QuantileStrategy.Sort,
             0, 0, 0.0, 0, 0
           )
    -      val featureSamples = Array(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5).map(_.toDouble)
    +      val featureSamples = Array(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5)
    +        .map(_.toDouble)
           val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
    -      assert(splits === Array(2.0, 3.0))
    +      assert(splits === Array(2.0625, 3.5))
    --- End diff --

done.
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656235

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
    @@ -112,9 +124,9 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
             Array(5), Gini, QuantileStrategy.Sort,
             0, 0, 0.0, 0, 0
           )
    -      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
    +      val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3).map(_.toDouble)
           val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
    -      assert(splits === Array(1.0, 2.0))
    +      assert(splits === Array(1.8, 2.2))
    --- End diff --

done.
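The expected values in these test diffs (0.5, 1.8 and 2.2, 2.0625) are consistent with a count-weighted mean of each pair of adjacent distinct feature values. The following is a minimal sketch of that arithmetic only, assuming the formula `(v1 * c1 + v2 * c2) / (c1 + c2)`; the object and method names are hypothetical, and it does not reproduce `RandomForest.findSplitsForContinuousFeature`, which also limits the number of returned splits.

```scala
object WeightedMidpointSketch {

  /** Mean of two adjacent distinct values, each weighted by how often it occurs. */
  def weightedMidpoint(v1: Double, c1: Int, v2: Double, c2: Int): Double =
    (v1 * c1 + v2 * c2) / (c1 + c2).toDouble

  /** One candidate threshold per pair of adjacent distinct values (hypothetical helper). */
  def candidateSplits(samples: Array[Double]): Array[Double] = {
    // Count occurrences of each distinct value, then sort by value.
    val valueCounts = samples
      .groupBy(identity)
      .map { case (value, occurrences) => (value, occurrences.length) }
      .toArray
      .sortBy(_._1)

    // Windows of size 2 are adjacent pairs; a lone value yields no split.
    valueCounts.sliding(2).collect {
      case Array((v1, c1), (v2, c2)) => weightedMidpoint(v1, c1, v2, c2)
    }.toArray
  }

  def main(args: Array[String]): Unit = {
    // Sample from the updated test: two 1s, eight 2s, two 3s.
    val samples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3).map(_.toDouble)
    println(candidateSplits(samples).mkString(", ")) // prints "1.8, 2.2"
  }
}
```

For the first pair, `(1 * 2 + 2 * 8) / 10 = 1.8`; for the second, `(2 * 8 + 3 * 2) / 10 = 2.2`. With fifteen 2s and one 3, the same formula gives `33 / 16 = 2.0625`, which matches the other updated assertion.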
[GitHub] spark pull request #17637: [SPARK-20337][CORE] Support upgrade a jar depende...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/17637
[GitHub] spark issue #17581: [SPARK-20248][ SQL]Spark SQL add limit parameter to enha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17581 **[Test build #75819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75819/testReport)** for PR 17581 at commit [`f8f85a3`](https://github.com/apache/spark/commit/f8f85a3c70e00d53195c95c7d884d6d8ef6a469a).
[GitHub] spark pull request #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt buil...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17642#discussion_r111654148

    --- Diff: project/SparkBuild.scala ---
    @@ -448,7 +448,9 @@ object DockerIntegrationTests {
      */
     object DependencyOverrides {
       lazy val settings = Seq(
    -    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1")
    +    dependencyOverrides ++= Set(
    --- End diff --

Using `Seq` produces an error as below:

```
[error] .../spark/project/SparkBuild.scala:451: No implicit for Append.Values[Set[sbt.ModuleID], Seq[sbt.ModuleID]] found,
[error]   so Seq[sbt.ModuleID] cannot be appended to Set[sbt.ModuleID]
[error]     dependencyOverrides ++= Seq(
[error]                         ^
[error] one error found
```
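The error above follows from the type of the key: in sbt 0.13.x `dependencyOverrides` holds a `Set[ModuleID]`, so `++=` needs a `Set` on the right-hand side. The build-definition fragment below is a rough sketch of what such an override could look like, not the contents of the patch; the object name is hypothetical and the Avro coordinates are an assumption based on the PR title. It is meant to live in an sbt 0.13.x build and is not a standalone program.

```scala
import sbt._
import sbt.Keys._

// Hypothetical object name; the build in the diff calls its object DependencyOverrides.
object DependencyOverridesSketch {
  lazy val settings = Seq(
    // dependencyOverrides is a Set[ModuleID] in sbt 0.13.x, so a Set is appended here.
    dependencyOverrides ++= Set(
      "com.google.guava" % "guava" % "14.0.1",
      // Assumed coordinates for the Avro version named in the PR title.
      "org.apache.avro" % "avro" % "1.7.7"
    )
  )
}
```

In sbt 1.x the same key is a `Seq[ModuleID]`, so this fragment would need a `Seq` there instead.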
[GitHub] spark issue #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17642 cc @srowen and @vanzin. I think this is a similar issue to [SPARK-11538](https://issues.apache.org/jira/browse/SPARK-11538). Could you check if it makes sense? I think this will resolve the problem as a safe workaround.
[GitHub] spark pull request #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt buil...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17642#discussion_r111653875

    --- Diff: project/SparkBuild.scala ---
    @@ -448,7 +448,9 @@ object DockerIntegrationTests {
      */
     object DependencyOverrides {
       lazy val settings = Seq(
    -    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1")
    +    dependencyOverrides ++= Set(
    --- End diff --

It seems to require `Set`.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75818/testReport)** for PR 17623 at commit [`a21675d`](https://github.com/apache/spark/commit/a21675d37d66a0fbf1a15a7e714bfe596814431d).
[GitHub] spark issue #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17642 **[Test build #75817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75817/testReport)** for PR 17642 at commit [`1ae57f2`](https://github.com/apache/spark/commit/1ae57f2e569462734d89d9c8c77e765859ce8393).
[GitHub] spark pull request #17642: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt buil...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17642

[SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to resolve build failure in SBT Hadoop 2.6 master on Jenkins

## What changes were proposed in this pull request?

Currently, the SBT master build fails, but only for Hadoop 2.6. It seems the dependency resolution can be different.

```
[error] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error]
```

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/2770/consoleFull

## How was this patch tested?

I tried many ways but was unable to reproduce this locally. Sean also tried the way I did, but he was unable to reproduce it either.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-20343

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17642.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17642

commit 1ae57f2e569462734d89d9c8c77e765859ce8393
Author: hyukjinkwon
Date: 2017-04-15T00:25:33Z

    Explicitly override Avro version
[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the accumulator n...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17596#discussion_r111653739

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -537,3 +539,27 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
         }
       }
     }
    +
    +class NumRowGroupsAcc extends AccumulatorV2[Integer, Integer] {
    --- End diff --

oh. This approach looks good.
[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17149 @cnauroth Thank you so much for your help.
[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...
Github user cnauroth commented on the issue: https://github.com/apache/spark/pull/17149

@HyukjinKwon , nice to meet you! I see I got notified here for a bit of Hadoop `Path` knowledge, and particularly on Windows.

> Is it okay to use both URIs and local file paths for the input string for org.apache.hadoop.fs.Path in general (when they are expected to be unescaped)?

Yes, this is correct. Specifically on the topic of Windows, `Path` has special case logic for handling a Windows-specific local file path. (This logic is only triggered if it detects the runtime OS is Windows.) On Windows, I expect a call like `new Path("C:\\foo\\bar").toUri` to yield a correct `URI` pointing at that local file path, and further calling `toString` yields a correct `String` representation of the path.

Hadoop code often needs to take a path string that is possibly a relative path and pass it through `Path` to make it absolute and escape it according to Hadoop code expectations. The standard invocation for doing this in the Hadoop code is `new Path(...).toUri();` or `new Path(...).toUri().toString();`. This works across all platforms.

I don't have any knowledge of the Spark codebase, but I see this patch uses similar invocations, so I expect it's good.
[GitHub] spark issue #17416: [SPARK-20075][CORE][WIP] Support classifier, packaging i...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17416

@srowen , I finally had some time to look into this and I was able to get the correct jar on the classpath. The fix was to use the code you had in the previous commit for `SparkSubmit.addDependenciesToIvy` so that the extraAttributes is set with `dd.addDependencyArtifact` and doesn't need to be in the `ModuleRevisionId` - so it was my bad advice that probably screwed this up :<

The reason is that when the `DefaultDependencyDescriptor` gets resolved in DefaultModuleDescriptor.java, if there are no artifacts defined, it adds 1 but does not copy over the `extraAttributes`, that's why the resolve report doesn't know about it. But if there are artifacts (which come from `addDependencyArtifact`) then the `extraAttributes` are carried over. wow, this is really confusing - hopefully this makes sense, see the code below

`BasicResolver.getDependency(DependencyDescriptor dd, ResolveData data)` calls `DefaultModuleDescriptor.newDefaultInstance`

```java
public static DefaultModuleDescriptor newDefaultInstance(ModuleRevisionId mrid,
        DependencyArtifactDescriptor[] artifacts) {
    DefaultModuleDescriptor moduleDescriptor = new DefaultModuleDescriptor(mrid, "release",
            null, true);
    moduleDescriptor.addConfiguration(new Configuration(DEFAULT_CONFIGURATION));
    if (artifacts != null && artifacts.length > 0) {
        for (int i = 0; i < artifacts.length; i++) {
            moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION,
                new MDArtifact(moduleDescriptor, artifacts[i].getName(),
                        artifacts[i].getType(), artifacts[i].getExt(), artifacts[i].getUrl(),
                        artifacts[i].getExtraAttributes()));
        }
    } else {
        moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION, new MDArtifact(moduleDescriptor,
                mrid.getName(), "jar", "jar"));
    }
    moduleDescriptor.setLastModified(System.currentTimeMillis());
    return moduleDescriptor;
}
```

I think that some other code you added in the second commit was also required, which is maybe why it didn't work for you in the first place, but give it another try. Here is the output from my test, looks like it should work now:

```
bin/spark-submit --packages edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1 -v examples/src/main/python/pi.py
Using properties file: /home/bryan/git/spark/conf/spark-defaults.conf
Adding default property: spark.history.fs.logDirectory=/home/bryan/git/spark/logs/history
Adding default property: spark.eventLog.dir=/home/bryan/git/spark/logs/history
Adding default property: drill.enable_unsafe_memory_access=false
Warning: Ignoring non-spark config property: drill.enable_unsafe_memory_access=false
Parsed arguments:
  master                  local[*]
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /home/bryan/git/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/home/bryan/git/spark/examples/src/main/python/pi.py
  name                    pi.py
  childArgs               []
  jars                    null
  packages                edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /home/bryan/git/spark/conf/spark-defaults.conf:
  (spark.history.fs.logDirectory,/home/bryan/git/spark/logs/history)
  (spark.eventLog.dir,/home/bryan/git/spark/logs/history)

Ivy Default Cache set to: /home/bryan/.ivy2/cache
The jars for the packages stored in: /home/bryan/.ivy2/jars
:: loading settings :: url = jar:file:/home/bryan/git/spark/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
edu.stanford.nlp#stanford-corenlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found edu.stanford.nlp#stanford-corenlp;3.4.1 in central
downloading
```
[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17506 Thanks @srowen
[GitHub] spark issue #17641: [SPARK-20329][SQL] Make timezone aware expression withou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17641 Merged build finished. Test FAILed.
[GitHub] spark issue #17641: [SPARK-20329][SQL] Make timezone aware expression withou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17641 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75816/ Test FAILed.
[GitHub] spark issue #17641: [SPARK-20329][SQL] Make timezone aware expression withou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17641 **[Test build #75816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75816/testReport)** for PR 17641 at commit [`0654409`](https://github.com/apache/spark/commit/0654409677dc8f569950fb54eb1d1d1239cdf870). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ResolveTimeZone(conf: SQLConf) extends Rule[LogicalPlan] `
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17640 cc @felixcheung
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17640 I will add some bounds checks and error handling.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17640 Merged build finished. Test PASSed.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75815/ Test PASSed.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17640 **[Test build #75815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75815/testReport)** for PR 17640 at commit [`03b82ac`](https://github.com/apache/spark/commit/03b82ac19dcbe17a70d9e45790dd24210b6d4f07). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/17582 I've been following this but haven't had time to do a proper review. @tgravescs, since you brought up the UI vs API thing: as of 2.0 the UI gets its list from the API, so that's where the security has to be handled.
[GitHub] spark issue #17641: [SPARK-20329][SQL] Make timezone aware expression withou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17641 **[Test build #75816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75816/testReport)** for PR 17641 at commit [`0654409`](https://github.com/apache/spark/commit/0654409677dc8f569950fb54eb1d1d1239cdf870).
[GitHub] spark pull request #17641: [SPARK-20329][SQL] Make timezone aware expression...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/17641 [SPARK-20329][SQL] Make timezone aware expression without timezone unresolved. ## What changes were proposed in this pull request? TBD ## How was this patch tested? TBD You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-20329 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17641.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17641 commit 0654409677dc8f569950fb54eb1d1d1239cdf870 Author: Herman van Hovell Date: 2017-04-14T20:23:32Z Make timezone aware expression without timezone unresolved.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540
@cloud-fan, could you have another look at this? There are a few new changes:
* withNewExecutionId now warns instead of throwing an exception, but still throws exceptions if spark.testing is defined
* SQLExecution.nested allows nested execution IDs without test failures or warnings. This is needed because several places will nest when withNewExecutionId is called at the high-level operations. CacheTableCommand is an example.
Over the last week, I've fixed nearly all of the tests. The remaining failure, SQLExecutionSuite.concurrent query execution (SPARK-10548), is fixed in Maven, but fails in SBT. The problem is that exceptions are now only thrown if `spark.testing` is defined, and for some reason adding it to the test's SparkSession or SparkContext doesn't work on Jenkins. Because this test is reproducing a case that now will never happen for two reasons (the original multi-threading fix and throw only if spark.testing), I'd like to simply remove it. Let me know what you think about that.
Other changes to look at:
* `SQLMetricsSuite.save metrics` started failing because there is a nested execution ID. This is because there are two SQL physical plans. The first, `ExecutedCommandExec`, links in a logical plan that is turned into a second physical plan *at runtime*. This means that the inner plan can't report the metrics that will be collected when analyzing the outer plan because it doesn't exist yet. The long-term solution is to fix `ExecutedCommandExec`, but for now this accepts any metrics created by the inner plan.
* `StreamExecution` wasn't calling `withNewExecutionId` and was caught by the new assertion. I added the call around the entire execution so that there isn't a new SQL execution for every batch. This required creating a special `queryExecution` to pass in.
* `DataFrameCallbackSuite` had to be updated to include commands that were previously not registered in the SQL tab. The new SQL executions are for dropping tables, so the result looks more correct than before.
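The warn-versus-throw behaviour described in the comment above can be sketched in plain Scala without any Spark dependency. This is only an illustration under stated assumptions: `ExecutionIdSketch`, its `warnings` buffer, and the exact order of the checks are hypothetical and are not taken from the PR; only the names `withNewExecutionId`, `nested`, and the `spark.testing` property come from the comment.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable.ArrayBuffer
import scala.util.DynamicVariable

// Hypothetical stand-in for the behaviour described in the comment; not Spark's SQLExecution.
object ExecutionIdSketch {
  // The execution ID active on the current thread, if any.
  private val currentId = new DynamicVariable[Option[Long]](None)
  // Set to true while running inside `nested`.
  private val nestingAllowed = new DynamicVariable[Boolean](false)
  private val counter = new AtomicLong(0)

  // Stand-in for a logger, so that warnings can be inspected.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Assumption: "spark.testing is defined" means the JVM system property is set.
  private def testing: Boolean = sys.props.contains("spark.testing")

  def withNewExecutionId[T](body: => T): T = currentId.value match {
    case Some(_) if nestingAllowed.value =>
      body // nesting explicitly permitted: run under the outer ID, no warning
    case Some(outer) if testing =>
      throw new IllegalStateException(s"execution $outer is already active")
    case Some(outer) =>
      warnings += s"nested execution inside $outer"
      body
    case None =>
      // No execution active: allocate a fresh ID for the duration of the body.
      currentId.withValue(Some(counter.incrementAndGet()))(body)
  }

  // Marks a region in which nested calls are expected and should stay silent.
  def nested[T](body: => T): T = nestingAllowed.withValue(true)(body)
}
```

In this sketch a nested call outside of tests records a warning and still runs the body, the same call throws once `spark.testing` is set, and wrapping the inner call in `ExecutionIdSketch.nested { ... }` suppresses both.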
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75814/ Test FAILed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test FAILed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75814/testReport)** for PR 17568 at commit [`b47c1f4`](https://github.com/apache/spark/commit/b47c1f4e3f22febc2955c61c38ef794c6ecce158). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class EliminateMapObjectsSuite extends PlanTest `
[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17087 @marmbrus could you please take a look?
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/17640 Overall, this looks like a sensible approach to a messy problem. You might want to think about adding some overflow handling to the SQL-->R translation. That is, if a DataFrame contains a `bigint` value that cannot be expressed as a `Double`, it would be safer to convert that value to NaN instead of stripping the lower-order bits off the `bigint`. The `bigint` column in the source DataFrame could hold a unique identifier or a hash value.
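The overflow handling suggested above could look roughly like the following. It is a minimal sketch, assuming that "cannot be expressed as a `Double`" means the `Long` does not survive a round trip through `Double`; the object and method names are hypothetical and are not part of SparkR or of this PR.

```scala
// Hypothetical helper illustrating the suggestion; not code from the PR.
object BigintToDouble {
  // 2^63 as a Double. Long.MinValue is exactly -2^63, so negating its Double form is exact.
  private val TwoPow63: Double = -Long.MinValue.toDouble

  // True when the Long survives a round trip through Double unchanged.
  def isExactlyRepresentable(v: Long): Boolean = {
    val d = v.toDouble
    // Values just below Long.MaxValue round up to 2^63, and 2^63 saturates back to
    // Long.MaxValue on toLong, so that case has to be rejected explicitly.
    d < TwoPow63 && d.toLong == v
  }

  // Returns NaN instead of a value that has silently lost its low-order bits.
  def toDoubleOrNaN(v: Long): Double =
    if (isExactlyRepresentable(v)) v.toDouble else Double.NaN
}
```

With this check, every integer of magnitude up to 2^53 converts unchanged, while an identifier such as `(1L << 53) + 1` becomes NaN rather than being rounded to a neighbouring value.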
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75813/ Test FAILed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Build finished. Test FAILed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #75813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75813/testReport)** for PR 17540 at commit [`30fa4fc`](https://github.com/apache/spark/commit/30fa4fc8603e68f9295fc65e573f96140bb04ac6). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17568#discussion_r111614833 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -253,4 +256,27 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { checkDataset(Seq(PackageClass(1)).toDS(), PackageClass(1)) } + test("SPARK-20254: Remove unnecessary data conversion for primitive array") { --- End diff -- Thank you for pointing it out. I implemented non-e2e tests.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17568#discussion_r111614744 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala --- @@ -96,3 +99,32 @@ object CombineTypedFilters extends Rule[LogicalPlan] { } } } + +/** + * Removes MapObjects when the following conditions are satisfied + * 1. Mapobject(e) where e is lambdavariable(), which means types for input output + * are primitive types + * 2. no custom collection class specified + * representation of data item. For example back to back map operations. + */ +object EliminateMapObjects extends Rule[LogicalPlan] { + private def convertDataTypeToArrayClass(dt: DataType): Class[_] = dt match { +case IntegerType => classOf[Array[Int]] +case LongType => classOf[Array[Long]] +case DoubleType => classOf[Array[Double]] +case FloatType => classOf[Array[Float]] +case ShortType => classOf[Array[Short]] +case ByteType => classOf[Array[Byte]] +case BooleanType => classOf[Array[Boolean]] + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case _ @ DeserializeToObject(_ @ Invoke( --- End diff -- Yes, I can do that.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17568#discussion_r111614710 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] { case EqualNullSafe(Literal(null, _), r) => IsNull(r) case EqualNullSafe(l, Literal(null, _)) => IsNull(l) + case a @ AssertNotNull(c, _) if !c.nullable => c --- End diff -- Good catch. Done.
[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17530 > Right now the PR doesn't set that, so it needs to be set under the user's HADOOP_CONF even though it had no real effect. That probably should be changed. Yep, same problem I'm seeing. Thanks.
[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/17621 Thanks @MLnick !
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17640 **[Test build #75815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75815/testReport)** for PR 17640 at commit [`03b82ac`](https://github.com/apache/spark/commit/03b82ac19dcbe17a70d9e45790dd24210b6d4f07).
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17640 [SPARK-17608][SPARKR]:Long type has incorrect serialization/deserialization ## What changes were proposed in this pull request? `bigint` is not supported in the schema, and it is not serialized as `Double`. This PR adds `bigint` support to the schema and serializes and deserializes it as `Double`. This fix is orthogonal to the precision problem in https://issues.apache.org/jira/browse/SPARK-12360 ## How was this patch tested? Added a new unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark summary Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17640.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17640 commit 03b82ac19dcbe17a70d9e45790dd24210b6d4f07 Author: wm...@hotmail.com Date: 2017-04-14T17:43:35Z add bigint support
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75814/testReport)** for PR 17568 at commit [`b47c1f4`](https://github.com/apache/spark/commit/b47c1f4e3f22febc2955c61c38ef794c6ecce158).
[GitHub] spark issue #17630: [SPARK-20318][SQL] Use Catalyst type for min/max in Colu...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17630 Wait - are we storing UTF8Strings directly in the catalog for statistics? That doesn't make sense ... if we are not, then we are not using internal types. In that case we should document clearly what's happening. My concern is that the internal types are specific to the physical execution path and stats/CBO are independent of that. We can in the future change the internal data types without changing CBO, and completely screw ourselves.
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17633 Then it should work.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Merged build finished. Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75811/ Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75811/testReport)** for PR 17623 at commit [`c83396e`](https://github.com/apache/spark/commit/c83396e0906e781a493648d70067a91880f9cf8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17637: [SPARK-20337][CORE] Support upgrade a jar dependency and...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17637 This does not work. Any classes that have already been loaded from the old jar will not be unloaded. So you're going to end up with really odd issues when two classes from different jars don't agree with each other.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Merged build finished. Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75810/ Test PASSed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75810/testReport)** for PR 17623 at commit [`5c057ba`](https://github.com/apache/spark/commit/5c057ba7eb2a68a276387988d5c3eb6419a0cba8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #75813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75813/testReport)** for PR 17540 at commit [`30fa4fc`](https://github.com/apache/spark/commit/30fa4fc8603e68f9295fc65e573f96140bb04ac6).
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13440 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75812/ Test FAILed.
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13440 Merged build finished. Test FAILed.
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13440 **[Test build #75812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75812/testReport)** for PR 13440 at commit [`6762a18`](https://github.com/apache/spark/commit/6762a18dd558b61d5b292787115d6e8c8768ed12). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13440 **[Test build #75812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75812/testReport)** for PR 13440 at commit [`6762a18`](https://github.com/apache/spark/commit/6762a18dd558b61d5b292787115d6e8c8768ed12).
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > Does this work for non-Hive tables? This is geared towards Hive partitioned tables. If we have another system that prunes table partitions based on a stringified pruning predicate, I'm unaware of it. Do you have one in mind?
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/13440 test this please
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/17639 @cloud-fan thanks for doing this
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17639 Merged build finished. Test PASSed.
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17639 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75809/ Test PASSed.
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17639 **[Test build #75809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75809/testReport)** for PR 17639 at commit [`bb0a14a`](https://github.com/apache/spark/commit/bb0a14a391e7316a1d5caee900f295c9487c6e8a). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedMapObjects(`
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75811/testReport)** for PR 17623 at commit [`c83396e`](https://github.com/apache/spark/commit/c83396e0906e781a493648d70067a91880f9cf8f).
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75810/testReport)** for PR 17623 at commit [`5c057ba`](https://github.com/apache/spark/commit/5c057ba7eb2a68a276387988d5c3eb6419a0cba8).
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17582 Sorry, but again the wording above and all the different configs leave me confused as to what the real issues are here. >Here actually has two list of acls, one is controlled by spark.acls.enabled, if user "A" is not added to this acl list, then user "A" cannot see the app list (//api/v1/applications). But if this app is run by user "A", then user "A" could still see the details of app, like (//api/v1/applications//jobs), this acl is controlled by "spark.history.ui.acls.enabled", and user "A" is automatically in the acl list (because of run by him). You are mixing things here. You say that if user "A" is not added to the acl list he cannot see the app list. That is broken then, and I assume it only applies to the REST API, not the UI? But I'm not sure what that has to do with your second sentence: if user "A" ran the app then of course he can see the details of the app; that is intended. How is that related to the first issue? If you don't have spark.history.ui.acls.enable set, then it is up to what the user set. Generally, in any secure environment you should set spark.history.ui.acls.enable=true, and it should enforce acls no matter what the user set. It might help for you to describe these in terms of configs: which exact configs are set on the history server, which exact configs are set on the application side, and which exact APIs are being used (REST vs. web UI). All the URLs you list are the REST API, so is this only an issue with the REST API or with the actual web UI as well? It sounds like things are definitely broken there, but I'm not sure it requires changing the configs, just fixing the things that are broken. It's supposed to be that if spark.history.ui.acls.enable is enabled, it doesn't matter what the setting of spark.acls.enable is; acls should always be enforced on the history server.
See the description: https://spark.apache.org/docs/latest/monitoring.html Certain UIs don't have information that should be sensitive. I thought the list of applications was one of those things: all users should be able to see the entire list of applications. Nothing sensitive there, but once you look at the application details, that should be acl'd. If someone added something sensitive, then it should be protected or moved from that page. My opinions on your response to @vanzin: 1. No, there shouldn't be sensitive information there, and many times a user is looking for a job run by, say, a headless user or another user. I guess you could show only the jobs that the user has acls for, but that makes it more complicated. Do you have a concrete reason it should be protected? Note that this follows how other Hadoop UIs work. 2. That is just broken; the event log should be protected.
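The intended behavior described in this comment can be sketched roughly as follows. This is only an illustration of the policy as stated, not history server code: the class, method, and parameter names are hypothetical, and the two boolean flags merely stand in for `spark.history.ui.acls.enable` (history server side) and `spark.acls.enable` (application side).

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the policy described in the comment; none of these
// names are Spark APIs.
final class HistoryAclSketch {

    /** The application list is treated as non-sensitive: every user may see it. */
    static boolean canViewAppList(String user) {
        return true;
    }

    /**
     * Application details are protected by view ACLs.
     *
     * @param historyAclsEnable stands in for spark.history.ui.acls.enable
     * @param appAclsEnable     stands in for spark.acls.enable set by the application
     */
    static boolean canViewAppDetails(String user, String appOwner, Set<String> viewAcls,
                                     boolean historyAclsEnable, boolean appAclsEnable) {
        // If the history server flag is on, ACLs are enforced regardless of the
        // application's own setting; otherwise the application's setting decides.
        boolean enforce = historyAclsEnable || appAclsEnable;
        if (!enforce) {
            return true;
        }
        // The user who ran the application can always see its details.
        return user.equals(appOwner) || viewAcls.contains(user);
    }

    public static void main(String[] args) {
        Set<String> viewAcls = Collections.singleton("admin");
        System.out.println(canViewAppList("B"));
        System.out.println(canViewAppDetails("A", "A", viewAcls, true, false));
        System.out.println(canViewAppDetails("B", "A", viewAcls, true, false));
    }
}
```

Under these assumptions, the list endpoint never consults the ACLs, while the detail endpoints do, which matches the split between "list of applications" and "application details" argued for above.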
[GitHub] spark issue #17592: [SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17592 I cherry picked this into branch-2.1
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75808/ Test FAILed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17623 Merged build finished. Test FAILed.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75808/testReport)** for PR 17623 at commit [`0be2db8`](https://github.com/apache/spark/commit/0be2db809b28d7e9debbc319145d2928201798c2). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17639 cc @koertkuipers
[GitHub] spark issue #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObjects shoul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17639 **[Test build #75809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75809/testReport)** for PR 17639 at commit [`bb0a14a`](https://github.com/apache/spark/commit/bb0a14a391e7316a1d5caee900f295c9487c6e8a).
[GitHub] spark pull request #17639: [SPARK-19716][SQL][follow-up] UnresolvedMapObject...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/17639 [SPARK-19716][SQL][follow-up] UnresolvedMapObjects should always be serializable ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/17398 we introduced `UnresolvedMapObjects` as a placeholder for `MapObjects`. Unfortunately, `UnresolvedMapObjects` is not serializable, because its `function` may reference a Scala `Type`, which is not serializable. Ideally this is fine, as we never serialize unresolved expressions and send them to executors. However, users may do this accidentally, e.g. by mistakenly referencing an encoder instance when implementing `Aggregator`. We should fix it so that this is just a performance issue (more network traffic) and does not fail the query. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17639 commit bb0a14a391e7316a1d5caee900f295c9487c6e8a Author: Wenchen Fan Date: 2017-04-14T13:15:36Z UnresolvedMapObjects should always be serializable
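The failure mode described in this pull request can be reproduced outside Spark with plain Java serialization. The sketch below is an analogy only: `TypeInfo`, `BrokenPlaceholder`, and `SafePlaceholder` are hypothetical stand-ins, and marking the field `transient` is one common way to make such a holder serializable, not necessarily the change made in this patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical analogy; these classes are not part of Spark.
final class TransientFieldSketch {

    /** Stand-in for a value that cannot be serialized (like a Scala Type). */
    static final class TypeInfo {
        final String name;
        TypeInfo(String name) { this.name = name; }
    }

    /** Serializable holder that fails because one field is not serializable. */
    static final class BrokenPlaceholder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String child;
        final TypeInfo typeInfo;
        BrokenPlaceholder(String child, TypeInfo typeInfo) {
            this.child = child;
            this.typeInfo = typeInfo;
        }
    }

    /** Same holder, but the offending field is skipped during serialization. */
    static final class SafePlaceholder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String child;
        final transient TypeInfo typeInfo; // null after deserialization
        SafePlaceholder(String child, TypeInfo typeInfo) {
            this.child = child;
            this.typeInfo = typeInfo;
        }
    }

    /** Serializes a value to bytes and reads it back. */
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Returns false only when serialization fails with NotSerializableException. */
    static boolean serializes(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TypeInfo type = new TypeInfo("Int");
        System.out.println(serializes(new BrokenPlaceholder("child", type)));
        System.out.println(serializes(new SafePlaceholder("child", type)));
    }
}
```

The trade-off is the one the description names: the placeholder can now travel over the network without failing, but any state dropped in transit is unavailable on the receiving side, which is acceptable only because unresolved expressions are not meant to be evaluated there.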
[GitHub] spark issue #17637: [SPARK-20337][CORE] Support upgrade a jar dependency and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17637 Merged build finished. Test PASSed.
[GitHub] spark issue #17637: [SPARK-20337][CORE] Support upgrade a jar dependency and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75806/ Test PASSed.
[GitHub] spark issue #17637: [SPARK-20337][CORE] Support upgrade a jar dependency and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17637 **[Test build #75806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75806/testReport)** for PR 17637 at commit [`eb4cb86`](https://github.com/apache/spark/commit/eb4cb8653565fcb66d7c7222cc7b765383bfce45). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test FAILed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75807/ Test FAILed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75807/testReport)** for PR 17568 at commit [`1515947`](https://github.com/apache/spark/commit/1515947d7a8497bb1f9365d40e1534dff44f0f04). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 I think the failed unit test can be fixed in https://github.com/apache/spark/pull/17634 and https://github.com/apache/spark/pull/17603
[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17636 Merged build finished. Test PASSed.
[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17636 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75805/ Test PASSed.
[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17636 **[Test build #75805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75805/testReport)** for PR 17636 at commit [`c4e1a01`](https://github.com/apache/spark/commit/c4e1a010c16d753360c6bc576518d71820de1243). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...
Github user jbloom22 commented on the issue: https://github.com/apache/spark/pull/17374 Our users (https://hail.is) are running into this bug. Will the backport be merged soon? Thanks!
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 thanks.
[GitHub] spark issue #17623: [SPARK-20292][SQL][WIP] Clean up string representation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17623 **[Test build #75808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75808/testReport)** for PR 17623 at commit [`0be2db8`](https://github.com/apache/spark/commit/0be2db809b28d7e9debbc319145d2928201798c2).
[GitHub] spark issue #17477: [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on J...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17477 The build should use Avro 1.7.7, yes. Hadoop pulls in 1.7.4, but it does so in both 2.6 and 2.7. And the SBT and Maven builds seem to get that right, as intended, because the POM directly overrides this version. (The only component on a different Avro is the Flume module, but that's not the problem here.) I also can't reproduce this locally. It builds fine for me too with the same commands. I am open to workarounds, though I also don't know what will be sufficient because we can't reproduce it. I am pretty sure the Avro 1.7.4 dependency is coming from `hadoop-common`, but I have no idea why only in 2.6. sbt-unidoc has a newer version, 0.4.0, but updating it requires other changes I don't know how to make, and I don't see a reason to think it's the problem. I wonder if the problem is that `core` does not directly declare a dependency on `org.apache.avro:avro` but uses it. If so, then adding this to the core POM might do the trick:
```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
</dependency>
```