[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17346 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76408/ Test PASSed.
[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17346

**[Test build #76408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76408/testReport)** for PR 17346 at commit [`49ee54d`](https://github.com/apache/spark/commit/49ee54d7a644b916e5c1c2c58f4cd1e011c7abc6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17838
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17836 cc @michal-databricks
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17838 LGTM - merging to master/2.2
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test PASSed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76407/ Test PASSed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770

**[Test build #76407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76407/testReport)** for PR 17770 at commit [`a076d83`](https://github.com/apache/spark/commit/a076d83cfc9e87f8234eda639957d663d87eaac4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class AnalysisBarrier(child: LogicalPlan) extends LeafNode`
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Merged build finished. Test PASSed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76406/ Test PASSed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819

**[Test build #76406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76406/testReport)** for PR 17819 at commit [`6ff9c79`](https://github.com/apache/spark/commit/6ff9c7998688107a835875ea41e6fe9576a1558c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17832: [SPARK-20557][SQL] Support for db column type TIM...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17832#discussion_r114472869

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -223,6 +223,9 @@ object JdbcUtils extends Logging {
           case java.sql.Types.STRUCT => StringType
           case java.sql.Types.TIME => TimestampType
           case java.sql.Types.TIMESTAMP => TimestampType
    +      case java.sql.Types.TIMESTAMP_WITH_TIMEZONE
    +        => TimestampType
    +      case -101 => TimestampType
    --- End diff --

    Hi, @JannikArndt. Could you add a comment here explaining what `-101` is?
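One way to address the request for a comment is to give the magic number a name. The sketch below is only an illustration and is not the code from the PR: it assumes `-101` is the vendor-specific code that Oracle's JDBC driver reports for TIMESTAMP WITH TIME ZONE columns, the constant name `OracleTimestampWithTimeZone` is hypothetical, and the function returns type names as strings so that it runs without Spark on the classpath.

```scala
import java.sql.Types

object JdbcTypeMappingSketch {
  // Assumption: -101 is the vendor-specific type code reported by Oracle's JDBC
  // driver for TIMESTAMP WITH TIME ZONE columns. It is not part of java.sql.Types.
  // The constant name is hypothetical.
  val OracleTimestampWithTimeZone: Int = -101

  // Returns the name of the Spark SQL type a JDBC type code would map to,
  // or None for codes this sketch does not cover.
  def sparkTypeName(sqlType: Int): Option[String] = sqlType match {
    case Types.STRUCT                  => Some("StringType")
    case Types.TIME                    => Some("TimestampType")
    case Types.TIMESTAMP               => Some("TimestampType")
    case Types.TIMESTAMP_WITH_TIMEZONE => Some("TimestampType")
    case OracleTimestampWithTimeZone   => Some("TimestampType")
    case _                             => None
  }
}
```

A named constant keeps the match arms readable and gives the explanatory comment a single home.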
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17838 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76405/ Test PASSed.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17838 Merged build finished. Test PASSed.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17838

**[Test build #76405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76405/testReport)** for PR 17838 at commit [`7c86b0e`](https://github.com/apache/spark/commit/7c86b0e997e87bce77cdf6064975ff5cab245c08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736

> This also seems unreasonable to me because so many backslashes are confusing and it seems to me that no other systems have similar behavior

Like I said before, that's because the Java string literal plays a role here; try `"""string"""` and it gets much better. If we wanna compare with other systems, we should compare against the SQL shell. Migration is a real problem, but it's also a problem for string literals. We can add a config to fall back to the old SQL parser behavior.
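The layering argument can be sketched in plain Scala, without Spark. Here `unescapeSqlLiteral` is a hypothetical stand-in for a SQL parser that collapses doubled backslashes inside string literals; it is not Spark's parser and only models that one behavior. The sketch shows how a triple-quoted string removes the host-language escaping layer, so only the SQL layer's doubling remains.

```scala
object RegexEscapingSketch {
  // The regex engine should ultimately receive: ^\x20[\x20-\x23]+$
  // A regular Scala/Java string literal needs each backslash doubled.
  val viaRegularLiteral: String = "^\\x20[\\x20-\\x23]+$"
  // A triple-quoted literal does not process backslash escapes like \x.
  val viaTripleQuotes: String = """^\x20[\x20-\x23]+$"""

  // Hypothetical stand-in for a SQL parser that unescapes backslashes in
  // string literals: every doubled backslash becomes a single one.
  def unescapeSqlLiteral(s: String): String = s.replace("\\\\", "\\")

  // SQL literal text that survives the unescaping step above. In a regular
  // literal the two layers multiply (four backslashes); in a triple-quoted
  // literal only the SQL layer's doubling remains (two backslashes).
  val sqlTextRegular: String = "^\\\\x20[\\\\x20-\\\\x23]+$"
  val sqlTextTriple: String = """^\\x20[\\x20-\\x23]+$"""
}
```

Under these assumptions both SQL texts reach the regex engine as the same pattern, which is the sense in which the host-language literal, not the SQL layer alone, accounts for the extra backslashes.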
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Build finished. Test PASSed.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76401/ Test PASSed.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100

**[Test build #76401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76401/testReport)** for PR 17100 at commit [`4ac8143`](https://github.com/apache/spark/commit/4ac8143cf63f5b4777a66f236824671b0bb05933).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468906

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
    @@ -36,20 +37,27 @@ import org.apache.spark.util.SerializableConfiguration
      * A [[FileIndex]] that generates the list of files to process by recursively listing all the
      * files present in `paths`.
      *
    - * @param rootPaths the list of root table paths to scan
    + * @param rootPathsSpecified the list of root table paths to scan (some of which might be
    + *                           filtered out later)
      * @param parameters as set of options to control discovery
      * @param partitionSchema an optional partition schema that will be use to provide types for the
      *                        discovered partitions
      */
     class InMemoryFileIndex(
         sparkSession: SparkSession,
    -    override val rootPaths: Seq[Path],
    +    rootPathsSpecified: Seq[Path],
         parameters: Map[String, String],
         partitionSchema: Option[StructType],
         fileStatusCache: FileStatusCache = NoopCache)
       extends PartitioningAwareFileIndex(
         sparkSession, parameters, partitionSchema, fileStatusCache) {

    +  // Filter out streaming metadata dirs or files such as "/.../_spark_metadata" (the metadata dir)
    +  // or "/.../_spark_metadata/0" (a file in the metadata dir). `rootPathsSpecified` might contain
    +  // such streaming metadata dir or files, e.g. when after globbing "basePath/*" where "basePath"
    +  // is the output of a streaming query.
    +  override val rootPaths = rootPathsSpecified.filterNot(FileStreamSink.ancestorIsMetadataDirectory)
    --- End diff --

    Yea, you are quite correct! They will be filtered by `InMemoryFileIndex.shouldFilterOut`.
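The shape of the change quoted above (a plain constructor parameter feeding a filtered, overridden `val`) can be sketched without Spark or Hadoop. Everything in this block is hypothetical scaffolding: `PathIndex` stands in for the file-index contract, paths are plain strings, and `isMetadataPath` is a simplified predicate rather than `FileStreamSink.ancestorIsMetadataDirectory`.

```scala
object RootPathFilterSketch {
  // Hypothetical stand-in for the file-index contract: it only exposes root paths.
  trait PathIndex {
    def rootPaths: Seq[String]
  }

  // The caller passes whatever paths globbing produced; the index exposes only
  // those that are not inside a streaming metadata directory.
  class FilteringIndex(
      rootPathsSpecified: Seq[String],
      isMetadataPath: String => Boolean) extends PathIndex {
    override val rootPaths: Seq[String] = rootPathsSpecified.filterNot(isMetadataPath)
  }

  // Simplified predicate (an assumption, not Spark's implementation): true when
  // any component of a '/'-separated path is named "_spark_metadata".
  def isMetadataPath(path: String): Boolean =
    path.split('/').contains("_spark_metadata")
}
```

For example, after globbing `basePath/*` on a streaming output directory, an index built from `Seq("/out/_spark_metadata", "/out/part=1")` would expose only `/out/part=1`.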
[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468833

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -145,6 +147,41 @@ class FileStreamSinkSuite extends StreamTest {
         }
       }

    +  test("partitioned writing and batch reading with 'basePath'") {
    +    val inputData = MemoryStream[Int]
    +    val ds = inputData.toDS()
    +
    +    val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath
    --- End diff --

    done
[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468801

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    --- End diff --

    switched to `makeQualified`
[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17346#discussion_r114468821

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala ---
    @@ -53,6 +53,29 @@ object FileStreamSink extends Logging {
           case _ => false
         }
       }
    +
    +  /**
    +   * Returns true if the path is the metadata dir or its ancestor is the metadata dir.
    +   * E.g.:
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata) => true
    +   *  - ancestorIsMetadataDirectory(/.../_spark_metadata/0) => true
    +   *  - ancestorIsMetadataDirectory(/a/b/c) => false
    +   */
    +  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    +    require(path.isAbsolute, s"$path is required to be absolute")
    +    var currentPath = path
    +    var finished = false
    +    while (!finished) {
    --- End diff --

    fixed. good point!
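A loop driven by a mutable `finished` flag can usually be rewritten as a tail-recursive walk up the parent chain. The sketch below follows the behavior documented in the quoted Scaladoc, but it is not the code that was merged: it assumes `java.nio.file.Path` in place of Hadoop's `Path` so that it runs without extra dependencies, and it skips the qualification step mentioned in the thread.

```scala
import java.nio.file.{Path, Paths}
import scala.annotation.tailrec

object MetadataPathSketch {
  // Directory name taken from the examples in the quoted Scaladoc.
  val MetadataDirName = "_spark_metadata"

  // Returns true if the path itself, or any of its ancestors, is named
  // "_spark_metadata". Walks towards the root; the root has no file name,
  // which ends the recursion with false.
  @tailrec
  def ancestorIsMetadataDirectory(path: Path): Boolean = {
    if (path == null || path.getFileName == null) false
    else if (path.getFileName.toString == MetadataDirName) true
    else ancestorIsMetadataDirectory(path.getParent)
  }

  // Usage example mirroring the three cases in the Scaladoc.
  def main(args: Array[String]): Unit = {
    println(ancestorIsMetadataDirectory(Paths.get("/a/_spark_metadata")))
    println(ancestorIsMetadataDirectory(Paths.get("/a/_spark_metadata/0")))
    println(ancestorIsMetadataDirectory(Paths.get("/a/b/c")))
  }
}
```

The recursion stops on the first matching component or when it runs out of parents, so no flag or mutable cursor is needed.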
[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17346 **[Test build #76408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76408/testReport)** for PR 17346 at commit [`49ee54d`](https://github.com/apache/spark/commit/49ee54d7a644b916e5c1c2c58f4cd1e011c7abc6).
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17251 Merged build finished. Test PASSed.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76403/ Test PASSed.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251

**[Test build #76403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76403/testReport)** for PR 17251 at commit [`2150ce5`](https://github.com/apache/spark/commit/2150ce552a7a02d656329761e04a7fcb38e5e648).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17836 Merged build finished. Test PASSed.
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17836 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76404/ Test PASSed.
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17836

**[Test build #76404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76404/testReport)** for PR 17836 at commit [`d979d0f`](https://github.com/apache/spark/commit/d979d0f482758d2d763fc88f456388ba9caf9274).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 For the regex, currently users need to write something like `df.filter("value rlike '^x20[x20-x23]+$'")`. This seems unreasonable to me, and it is what this patch tries to fix.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #76406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76406/testReport)** for PR 17819 at commit [`6ff9c79`](https://github.com/apache/spark/commit/6ff9c7998688107a835875ea41e6fe9576a1558c).
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76407/testReport)** for PR 17770 at commit [`a076d83`](https://github.com/apache/spark/commit/a076d83cfc9e87f8234eda639957d663d87eaac4).
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 Yea, much clearer now, and the string literal in Spark 2.0 looks more reasonable. For the regex, I think it's unfair to compare `df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")` with `df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))`, because the Java string literal also plays a role here. Think about a SQL shell: users can write `SELECT ... WHERE value RLIKE '^\\x20[\\x20-\\x23]+$'`, which is consistent with the Java version, so I think the current SQL parser is correct.
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17836 Thank you for the review, @kiszk
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17836 LGTM cc: @sameeragarwal @cloud-fan
[GitHub] spark issue #17698: [SPARK-20403][SQL][Documentation]Modify the instructions...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/17698 @rxin, would you help me review it again? Thanks.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17838 **[Test build #76405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76405/testReport)** for PR 17838 at commit [`7c86b0e`](https://github.com/apache/spark/commit/7c86b0e997e87bce77cdf6064975ff5cab245c08).
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @cloud-fan I've updated the example. Please check if it is better for you. Thanks.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17838 retest this please
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17838 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76399/ Test FAILed.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17838 Merged build finished. Test FAILed.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17838 **[Test build #76399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76399/testReport)** for PR 17838 at commit [`7c86b0e`](https://github.com/apache/spark/commit/7c86b0e997e87bce77cdf6064975ff5cab245c08). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17735 Merged build finished. Test PASSed.
[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17735 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76398/ Test PASSed.
[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17735 **[Test build #76398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76398/testReport)** for PR 17735 at commit [`63ed28a`](https://github.com/apache/spark/commit/63ed28ac1f9062b0f7d88f91a8eada601df6f6e9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17794: [SPARK-20518][CORE]Supplement the new blockidsuite unit ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/17794 @jerryshao I have updated it. Please review it again. Thanks.
[GitHub] spark pull request #17833: [SPARK-20558][CORE] clear InheritableThreadLocal ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17833
[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17833 merging to master/2.2/2.1/2.0
[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17833
> I think it only cleans localProperties in the current thread. localProperties overrides childValue and always clones a new Properties for child threads.

Yea, that's true. If some child threads already exist and have cloned the local properties, we can't clean those copies. But we can prevent future child threads from inheriting these local properties, which can reduce the memory footprint a lot if users create a new `SparkContext`, stop it, and repeat this many times. Anyway, I'll merge this PR and see if it can fix the flaky test.
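The inheritance behaviour described here can be reproduced with the JDK's `InheritableThreadLocal` alone. The following is a minimal sketch, not Spark's code: `LOCAL_PROPS`, `valueSeenByNewChild`, and the `job.group` key are hypothetical names chosen for illustration. It assumes, as the quoted comment says, a thread-local whose `childValue` clones a `Properties` object for each child thread.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

final class LocalPropertiesSketch {
    // Hypothetical per-thread property bag, modelled on the description above.
    static final InheritableThreadLocal<Properties> LOCAL_PROPS =
        new InheritableThreadLocal<Properties>() {
            @Override
            protected Properties childValue(Properties parent) {
                // Each newly created child thread receives its own copy.
                Properties copy = new Properties();
                copy.putAll(parent);
                return copy;
            }

            @Override
            protected Properties initialValue() {
                return new Properties();
            }
        };

    // Creates a child thread now and reports what that child sees for the key.
    static String valueSeenByNewChild(String key) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread child = new Thread(() -> seen.set(LOCAL_PROPS.get().getProperty(key)));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        LOCAL_PROPS.get().setProperty("job.group", "demo");
        // A child created while the value is set inherits a cloned copy.
        System.out.println(valueSeenByNewChild("job.group"));
        // remove() clears the value for the current thread only.
        LOCAL_PROPS.remove();
        // A child created afterwards has nothing to inherit.
        System.out.println(valueSeenByNewChild("job.group"));
    }
}
```

The sketch covers only the "future child threads" half of the argument. A child that was created before `remove()` keeps its own cloned `Properties`, which the parent thread cannot reach.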
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76402/ Test FAILed.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Merged build finished. Test FAILed.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #76402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76402/testReport)** for PR 17100 at commit [`766a033`](https://github.com/apache/spark/commit/766a033f19602dbd7da6eff96947236c8c0fd2a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use midpoints for split valu...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114457816
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -1037,7 +1042,8 @@ private[spark] object RandomForest extends Logging {
             // makes the gap between currentCount and targetCount smaller,
             // previous value is a split threshold.
             if (previousGap < currentGap) {
    -          splitsBuilder += valueCounts(index - 1)._1
    +          // perhaps weighted mean will be used later, see SPARK-16957 and Github PR 17556.
    --- End diff --
Comments like these tend to just get left around and sit there forever. Unless we file a _new_ JIRA that intends to decide on future behavior, I would like to remove this comment altogether. Otherwise, no one will follow up on this.
[GitHub] spark pull request #17835: [SPARK-20557] [SQL] Improve the error message for...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17835#discussion_r114460827
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -230,7 +230,9 @@ object JdbcUtils extends Logging {
           // scalastyle:on
         }
    -    if (answer == null) throw new SQLException("Unsupported type " + sqlType)
    +    if (answer == null) {
    +      throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName)
    --- End diff --
Hi, @gatorsmile. It seems that we then need to consider the `IllegalArgumentException` that `JDBCType.valueOf` can throw. Can we do that here?
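The concern can be made concrete with the JDK alone: `java.sql.JDBCType.valueOf(int)` throws `IllegalArgumentException` when the integer is not a known `java.sql.Types` code. Below is a hedged sketch of one way to guard the lookup. The helper name `describe` and the choice to fall back to the raw integer are assumptions made for illustration, not what the pull request does.

```java
import java.sql.JDBCType;
import java.sql.Types;

final class JdbcTypeNames {
    // Returns the JDBC type name for a known code, or the raw number otherwise.
    // The fallback policy is an assumption made for this sketch.
    static String describe(int sqlType) {
        try {
            return JDBCType.valueOf(sqlType).getName();
        } catch (IllegalArgumentException e) {
            // valueOf(int) rejects codes that are not defined in java.sql.Types.
            return Integer.toString(sqlType);
        }
    }

    public static void main(String[] args) {
        System.out.println("Unsupported type " + describe(Types.VARCHAR));
        System.out.println("Unsupported type " + describe(-9999));
    }
}
```

With a guard of this kind, an unknown vendor-specific type code still produces the intended "Unsupported type" message instead of an unrelated `IllegalArgumentException`.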
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17723

@vanzin:
> @mridulm the main argument for just dealing with Hadoop security is that it's been sufficient since the inception of Spark. I have never seen anyone ask for integration with any other type of system.

Support for long-running applications (which require token renewal, etc.) was added much later in Spark, in 1.4 IIRC: https://issues.apache.org/jira/browse/SPARK-5342

> Would it make you more comfortable if the new API were kept private[spark]? It would limit extensibility in the case of Mesos (would be restricted to built-in providers), but would free us from this discussion and allow progress to happen.

If we are not exposing an API for Spark core, while maintaining backward compatibility, I am fine with the change. (Please see [1] below too.) Either we move implementations from yarn to core, or to a new module which yarn and mesos depend on (if that helps).

@mgummelt:
> @mridulm You keep mentioning hadoop-security as if it's a library. It's not. UserGroupInformation and Credentials, for example, are security classes in hadoop-commons, which core already depends on. So this coupling already exists. Is your concern that we're increasing this coupling?

When I refer to hadoop-security, I do not mean a Maven package, but the use of `org.apache.hadoop.security` for handling authentication/authorization. hadoop-common contains a collection of libraries and utilities, bundled together for convenience; the `security` package implements classes in the context of the security design of Hadoop.

[1] If we want to base security in Spark on hadoop-security, and think it is sufficient for our current and anticipated needs, we should be explicit about the (design) dependency. We should solicit opinions about it on dev@ and proceed based on feedback from the list (this PR discussion might not be followed by many interested parties).
As @vanzin mentioned above, there have not been integration requests for other systems, so perhaps it is sufficient for our needs and I might be being overly paranoid (due to my past experiences with API design).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 When we create a DataSource table like below
```sql
create table t1 using parquet select * from src limit 1000;
```
it calls `CollectLimitExec.doExecute`, which also uses `SinglePartition`, so we should cover this case.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #76403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76403/testReport)** for PR 17251 at commit [`2150ce5`](https://github.com/apache/spark/commit/2150ce552a7a02d656329761e04a7fcb38e5e648).
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Retest this please
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17836 **[Test build #76404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76404/testReport)** for PR 17836 at commit [`d979d0f`](https://github.com/apache/spark/commit/d979d0f482758d2d763fc88f456388ba9caf9274).
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Merged build finished. Test FAILed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76396/ Test FAILed.
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17825 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76400/ Test PASSed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #76396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76396/testReport)** for PR 17540 at commit [`69ed59e`](https://github.com/apache/spark/commit/69ed59e2d723fff9756671b390765f9106b5b720). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17825 Merged build finished. Test PASSed.
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17825 **[Test build #76400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76400/testReport)** for PR 17825 at commit [`32fd836`](https://github.com/apache/spark/commit/32fd8361d693d106658020d89dc6563aade6abb7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17821: [SPARK-20529][Core]Allow worker and master work with a p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76392/ Test FAILed.
[GitHub] spark issue #17821: [SPARK-20529][Core]Allow worker and master work with a p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17821 **[Test build #76392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76392/testReport)** for PR 17821 at commit [`f4699ad`](https://github.com/apache/spark/commit/f4699add54bb3fbe40d489aeabbf8f192e6ecb1f). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17821: [SPARK-20529][Core]Allow worker and master work with a p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17821 Merged build finished. Test FAILed.
[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17825#discussion_r114457011
    --- Diff: R/pkg/R/column.R ---
    @@ -132,17 +132,24 @@ createMethods()
     #' alias
     #'
    -#' Set a new name for a column
    +#' Set a new name for an object. Equivalent to SQL "AS" keyword.
    --- End diff --
Moving to `generics.R` sounds good. "Column or SparkDataFrame" in place of "object" as well. Regarding "AS": in SQL it can be used with both expressions and tables, so I deliberately didn't qualify this with `Column`. I am not sure if we really need to state that it returns a new object. Maybe _Return a new Column or SparkDataFrame with an alias. Equivalent to SQL "AS" keyword._? But it doesn't sound great.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #76402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76402/testReport)** for PR 17100 at commit [`766a033`](https://github.com/apache/spark/commit/766a033f19602dbd7da6eff96947236c8c0fd2a2).
[GitHub] spark issue #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17825 **[Test build #76400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76400/testReport)** for PR 17825 at commit [`32fd836`](https://github.com/apache/spark/commit/32fd8361d693d106658020d89dc6563aade6abb7).
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #76401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76401/testReport)** for PR 17100 at commit [`4ac8143`](https://github.com/apache/spark/commit/4ac8143cf63f5b4777a66f236824671b0bb05933).
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17838 LGTM!
[GitHub] spark issue #17818: [SPARK-20544] R wrapper for input_file_name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76397/ Test PASSed.
[GitHub] spark issue #17818: [SPARK-20544] R wrapper for input_file_name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17818 Merged build finished. Test PASSed.
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17838 **[Test build #76399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76399/testReport)** for PR 17838 at commit [`7c86b0e`](https://github.com/apache/spark/commit/7c86b0e997e87bce77cdf6064975ff5cab245c08).
[GitHub] spark issue #17818: [SPARK-20544] R wrapper for input_file_name
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17818 **[Test build #76397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76397/testReport)** for PR 17818 at commit [`72f3fb7`](https://github.com/apache/spark/commit/72f3fb739240b9f27fcab47cbb9d82aff3272f93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17838: [SPARK-20567] Lazily bind in GenerateExec
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/17838
[SPARK-20567] Lazily bind in GenerateExec
It is not valid to eagerly bind with the child's output, as this causes failures when we attempt to canonicalize the plan (replacing the attribute references with dummies).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marmbrus/spark fixBindExplode
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17838.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17838
commit 7c86b0e997e87bce77cdf6064975ff5cab245c08
Author: Michael Armbrust
Date: 2017-05-03T00:13:52Z
[SPARK-20567] Lazily bind in GenerateExec
[GitHub] spark pull request #17735: [SPARK-20441][SPARK-20432][SS] Within the same st...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17735#discussion_r114453379
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -120,6 +141,32 @@ class StreamSuite extends StreamTest {
     assertDF(df)
   }
+  test("Within the same streaming query, one StreamingRelation should only be transformed to one " +
+    "StreamingExecutionRelation") {
+    val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
+    var query: StreamExecution = null
+    try {
+      query =
+        df.union(df)
+          .writeStream
+          .format("memory")
+          .queryName("memory")
+          .start()
+          .asInstanceOf[StreamingQueryWrapper]
+          .streamingQuery
+      val executionRelations =
+        query
+          .logicalPlan
--- End diff --
Ah, I see. Fixed. Thanks!
[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17735 **[Test build #76398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76398/testReport)** for PR 17735 at commit [`63ed28a`](https://github.com/apache/spark/commit/63ed28ac1f9062b0f7d88f91a8eada601df6f6e9).
[GitHub] spark issue #9699: [SPARK-2344] [MLlib] Add fuzzifier (m) parameter to KMean...
Github user LuciferWong commented on the issue: https://github.com/apache/spark/pull/9699 This is what I wrote myself: https://github.com/LuciferWong/spark
[GitHub] spark issue #17835: [SPARK-20557] [SQL] Improve the error message for unsupp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17835 Merged build finished. Test PASSed.
[GitHub] spark issue #17835: [SPARK-20557] [SQL] Improve the error message for unsupp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17835 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76393/ Test PASSed.
[GitHub] spark issue #17835: [SPARK-20557] [SQL] Improve the error message for unsupp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17835 **[Test build #76393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76393/testReport)** for PR 17835 at commit [`d199f7b`](https://github.com/apache/spark/commit/d199f7bdfcdda3859efc0d021b42ab65915db009). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17818: [SPARK-20544] R wrapper for input_file_name
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17818 **[Test build #76397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76397/testReport)** for PR 17818 at commit [`72f3fb7`](https://github.com/apache/spark/commit/72f3fb739240b9f27fcab47cbb9d82aff3272f93).
[GitHub] spark pull request #17818: [SPARK-20544] R wrapper for input_file_name
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17818#discussion_r114450054
--- Diff: R/pkg/R/functions.R ---
@@ -3890,3 +3890,23 @@ setMethod("not",
             jc <- callJStatic("org.apache.spark.sql.functions", "not", x@jc)
             column(jc)
           })
+
+#' input_file_name
+#'
+#' Creates a string column for the file name of the current Spark task.
--- End diff --
How about the new one?
[GitHub] spark pull request #17818: [SPARK-20544] R wrapper for input_file_name
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17818#discussion_r114450006
--- Diff: R/pkg/R/functions.R ---
@@ -3974,3 +3974,23 @@ setMethod("grouping_id",
             jc <- callJStatic("org.apache.spark.sql.functions", "grouping_id", jcols)
             column(jc)
           })
+
+#' input_file_name
+#'
+#' Creates a string column for the file name of the current Spark task.
+#'
+#' @rdname input_file_name
+#' @name input_file_name
+#' @aliases input_file_name,missing-method
--- End diff --
Done.
[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17831
@gatorsmile This sounds reasonable, but I am not sure I fully understand your concerns. If anything, this brings PySpark closer to the Scala API. At the moment we have
```
registerFunction(self, name: str, f: Callable[[T], U], returnType: DataType) -> None: ...
```
and we would move to:
```
registerFunction(self, name: str, f: Callable[[T], U], returnType: DataType) -> Callable[[Column, ...], Column]: ...
```
This, as pointed out by @holdenk, matches the `register` API for `Function0` .. `Function22`. If you're planning breaking changes in the Scala API, it may render this PR obsolete, but we don't commit here to any particular implementation. The only promise here is that registering a UDF for SQL applications returns an object which can be used with the `DataFrame` API. I believe this is a reasonable requirement for any upcoming API.
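The contract described in this comment can be illustrated without Spark. The following is only a toy sketch: `ToyRegistry`, `register_function` and `call_by_name` are hypothetical names invented for illustration and are not part of PySpark. It shows the one promise under discussion, namely that registering a function under a name also hands back a callable that can be used directly.

```python
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ToyRegistry:
    """Hypothetical stand-in for a UDF registry; not the PySpark API."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[[Any], Any]] = {}

    def register_function(self, name: str, f: Callable[[T], U]) -> Callable[[T], U]:
        # Before the proposed change this would store `f` and return None.
        # After it, the function is stored AND a usable callable is returned.
        self._functions[name] = f
        return f

    def call_by_name(self, name: str, value: Any) -> Any:
        # Stands in for invoking the registered function from a SQL string.
        return self._functions[name](value)


registry = ToyRegistry()
plus_one = registry.register_function("plus_one", lambda x: x + 1)
```

With this shape, the same function is reachable both by name (`registry.call_by_name("plus_one", 1)`) and through the returned object (`plus_one(1)`), which is the property the comment asks any future API to preserve.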
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #76396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76396/testReport)** for PR 17540 at commit [`69ed59e`](https://github.com/apache/spark/commit/69ed59e2d723fff9756671b390765f9106b5b720).
[GitHub] spark issue #17836: [SPARK-20566] ColumnVector should support `appendFloats`...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17836 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76394/ Test PASSed.
[GitHub] spark issue #17836: [SPARK-20566] ColumnVector should support `appendFloats`...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17836 Merged build finished. Test PASSed.
[GitHub] spark issue #17836: [SPARK-20566] ColumnVector should support `appendFloats`...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17836 **[Test build #76394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76394/testReport)** for PR 17836 at commit [`a03f927`](https://github.com/apache/spark/commit/a03f92740aa36c85519c170112473a4bae89ae99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17821: [SPARK-20529][Core]Allow worker and master work w...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17821#discussion_r114446352
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -266,7 +289,8 @@ private[deploy] class Worker(
     if (registerMasterFutures != null) {
       registerMasterFutures.foreach(_.cancel(true))
     }
-    val masterAddress = masterRef.address
+    val masterAddress =
+      if (preferConfiguredMasterAddress) masterAddressToConnect.get else masterRef.address
--- End diff --
Right now `masterRef` and `masterAddressToConnect` are set at the same time. It's impossible unless we break something in the future. It's better to fail than to hide the breaking change.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Merged build finished. Test FAILed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76395/ Test FAILed.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #76395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76395/testReport)** for PR 17540 at commit [`7131c32`](https://github.com/apache/spark/commit/7131c329e34b7961ea478532faa4f202255da246). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...
Github user rberenguel commented on the issue: https://github.com/apache/spark/pull/17100 @gatorsmile Thanks for the pointers, I finally found some time to come back to this. I'm not sure whether the approach in my current local changes for getting the `SQLConf` into `checkAnalysis` is the correct one (since it seems to change a possible API endpoint). I renamed the current implementation in the trait to `def checkAnalysisWithConf(plan: LogicalPlan, conf: SQLConf): Unit` and added an abstract method `def checkAnalysis(plan: LogicalPlan): Unit` that is then implemented in `Analyzer` (where we have a `conf` we can pass around). I haven't fixed the rest yet; I was puzzled enough by the correctness of this for now ;) Thanks
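The shape of the refactoring described in this comment can be sketched as follows. This is a Python analogue rather than the Scala code under review, and every name in it (`ToyConf`, `ToyCheckAnalysis`, `ToyAnalyzer`, the duplicate-column check, and the plan modelled as a list of column names) is a hypothetical placeholder. It only illustrates the split between an abstract entry point that takes no configuration and a concrete method that receives one.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class ToyConf:
    """Hypothetical stand-in for SQLConf."""
    case_sensitive: bool = False


class ToyCheckAnalysis(ABC):
    """Hypothetical analogue of the trait described in the comment."""

    @abstractmethod
    def check_analysis(self, plan: List[str]) -> None:
        """Entry point without a conf; implemented where a conf is available."""

    def check_analysis_with_conf(self, plan: List[str], conf: ToyConf) -> None:
        # The former check logic, now able to consult the configuration.
        # The "plan" is modelled as a plain list of column names (assumption).
        seen = set()
        for column in plan:
            key = column if conf.case_sensitive else column.lower()
            if key in seen:
                raise ValueError(f"ambiguous column: {column}")
            seen.add(key)


class ToyAnalyzer(ToyCheckAnalysis):
    def __init__(self, conf: ToyConf) -> None:
        self.conf = conf

    def check_analysis(self, plan: List[str]) -> None:
        # The analyzer owns a conf, so it forwards it to the shared logic.
        self.check_analysis_with_conf(plan, self.conf)
```

One consequence of this design, which is the concern raised in the comment, is that every existing implementer of the trait must now provide `check_analysis` itself, so the change is visible to anything that extended the original trait.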
[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17831 I feel like that's an unrelated challenge. I'm happy to see other improvements but I'm worried that we will hold up changes for things which aren't happening soon - is there a JIRA for these changes?