[GitHub] spark issue #15900: [SPARK-18464][SQL] support old table which doesn't store...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15900 Merging in master/branch-2.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68746/
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15901 Merged build finished. Test PASSed.
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15901 **[Test build #68746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68746/consoleFull)** for PR 15901 at commit [`d448b60`](https://github.com/apache/spark/commit/d448b60c786efd1a002c17a6458a4b3b62669efc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15913 **[Test build #68757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68757/consoleFull)** for PR 15913 at commit [`9a8c926`](https://github.com/apache/spark/commit/9a8c92691e7ec0b9d37eed0cf6f9dbcc4d4d622f).
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15907 That would be good. In that case, would it be better to insert an explicit cast (from int to long) when a caller passes an int variable? This is why we want to express the specification of the public APIs (`sort()` and `sortKeyPrefixArray()`) explicitly.
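The overflow under discussion is the standard JVM pitfall: arithmetic on two `int` operands is performed in 32 bits even when the result is assigned to a `long`, so the widening cast must happen before the multiplication. A minimal Java sketch (the method names here are illustrative, not the ones in the PR):

```java
public class OverflowDemo {
    // Buggy: numRecords * recordSize is computed as a 32-bit int, so it
    // wraps around before the widening assignment to long takes place.
    static long offsetBuggy(int numRecords, int recordSize) {
        return numRecords * recordSize;
    }

    // Fixed: casting one operand to long forces 64-bit multiplication.
    static long offsetFixed(int numRecords, int recordSize) {
        return (long) numRecords * recordSize;
    }

    public static void main(String[] args) {
        // 2^20 records of 2^12 bytes = 2^32 bytes, which does not fit in an int.
        System.out.println(offsetBuggy(1 << 20, 1 << 12));  // 0 (wrapped around)
        System.out.println(offsetFixed(1 << 20, 1 << 12));  // 4294967296
    }
}
```

Declaring the public API's parameters as `long` instead would push this widening (and an explicit cast for `int` arguments) to the call site, which makes the contract visible to callers.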
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68755/
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15811 Merged build finished. Test PASSed.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15811 **[Test build #68755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68755/consoleFull)** for PR 15811 at commit [`36988a3`](https://github.com/apache/spark/commit/36988a3aaeaa5e70919cb532025c0a67ded95117).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15812: [SPARK-18360][SQL] default table path of tables in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15812 Merged build finished. Test FAILed.
[GitHub] spark issue #15812: [SPARK-18360][SQL] default table path of tables in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68749/
[GitHub] spark issue #15812: [SPARK-18360][SQL] default table path of tables in defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15812 **[Test build #68749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68749/consoleFull)** for PR 15812 at commit [`27be481`](https://github.com/apache/spark/commit/27be481dd4cf554c39d220c2c039bff944af6469).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r88399292

--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriterConfig.scala ---
```
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal.io
+
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapred.JobConf
+import org.apache.hadoop.mapreduce._
+
+import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils}
+
+/**
+ * Interface for creating the output format/committer/writer used when saving an RDD with a
+ * Hadoop OutputFormat (from both the old mapred API and the new mapreduce API).
+ *
+ * Notes:
+ * 1. Implementations should throw [[IllegalArgumentException]] when the wrong Hadoop API is
+ *    referenced;
+ * 2. Implementations must be serializable, as the instance instantiated on the driver
+ *    will be used for tasks on executors;
+ * 3. Implementations should have a constructor with exactly one argument:
+ *    (conf: SerializableConfiguration) or (conf: SerializableJobConf).
+ */
+abstract class SparkHadoopWriterConfig[K, V: ClassTag] extends Serializable {
```
--- End diff --

maybe just HadoopWriteConfigUtil ?
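The three notes in the quoted scaladoc amount to a small but strict contract. A hedged sketch of that contract in Java, with entirely hypothetical names (the real interface is the Scala class in the diff above):

```java
import java.io.Serializable;

// Note 2: the config is created on the driver, serialized, and then used by executor tasks.
abstract class WriteConfigSketch implements Serializable {
    abstract Object createCommitter(int jobId);
}

// Hypothetical implementation for one side (e.g. the new mapreduce API) of the contract.
class NewApiWriteConfigSketch extends WriteConfigSketch {
    private final String conf;  // Note 3: exactly one constructor argument (the serializable conf)

    NewApiWriteConfigSketch(String conf) {
        this.conf = conf;
    }

    @Override
    Object createCommitter(int jobId) {
        if (conf == null) {
            // Note 1: reject callers that reference the wrong Hadoop API for this config.
            throw new IllegalArgumentException("wrong Hadoop API for this write config");
        }
        return "committer-for-job-" + jobId;  // placeholder for a real OutputCommitter
    }
}
```

The single-argument constructor convention (note 3) is what would let a driver instantiate the right implementation reflectively from just a serialized configuration.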
[GitHub] spark pull request #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOptio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15868#discussion_r88399280

--- Diff: docs/sql-programming-guide.md ---
```
@@ -1087,6 +1087,13 @@ the following case-sensitive options:
+  maxConnections
+
+  The number of JDBC connections, which specifies the maximum number of simultaneous JDBC
+  connections that are allowed. This option applies only to writing. It defaults to the
+  number of partitions of RDD.
```
--- End diff --

Ok. Let us keep it unchanged.
[GitHub] spark pull request #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOptio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15868#discussion_r88399141

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
```
@@ -122,6 +122,11 @@ class JDBCOptions(
     case "REPEATABLE_READ" => Connection.TRANSACTION_REPEATABLE_READ
     case "SERIALIZABLE" => Connection.TRANSACTION_SERIALIZABLE
   }
+  // the maximum number of connections
+  val maxConnections = parameters.getOrElse(JDBC_MAX_CONNECTIONS, null)
+  require(maxConnections == null || maxConnections.toInt > 0,
+    s"Invalid value `$maxConnections` for parameter `$JDBC_MAX_CONNECTIONS`. " +
+      s"The minimum value is 1.")
```
--- End diff --

Nit: no need to add `s`
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15868 Yeah, add one negative test case and one positive test case for the newly added param. For example, below is a negative test case.
```
val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
val e = intercept[java.lang.IllegalArgumentException] {
  df.write.format("jdbc")
    .option("dbtable", "TEST.SAVETEST")
    .option("url", url1)
    .option(s"${JDBCOptions.JDBC_MAX_CONNECTIONS}", "0")
    .save()
}.getMessage
assert(e.contains("Invalid value `0` for parameter `maxConnections`. The minimum value is 1"))
```
It is very hard to test the number of connections Spark actually makes unless we change the source code, so maybe we do not need that. You can manually check the number of partitions of the RDD.
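The behavior that test exercises boils down to a single null-or-positive check. A standalone Java sketch of the same rule (the class and helper names are hypothetical; the real code is the Scala `require` in `JDBCOptions`):

```java
public class MaxConnectionsCheck {
    // Hypothetical helper mirroring the `maxConnections` validation under review:
    // a missing option returns null (the caller falls back to the RDD's partition
    // count), and any supplied value must parse to a positive integer.
    public static Integer parseMaxConnections(String value) {
        if (value == null) {
            return null;
        }
        int n = Integer.parseInt(value);
        if (n < 1) {
            throw new IllegalArgumentException("Invalid value `" + value
                + "` for parameter `maxConnections`. The minimum value is 1.");
        }
        return n;
    }
}
```

The positive test case would simply set the option to a value of 1 or more and assert the write succeeds.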
[GitHub] spark issue #15889: [WIP][SPARK-18445][DOCS] Fix the markdown for `Note:`/`N...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15889 **[Test build #68756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68756/consoleFull)** for PR 15889 at commit [`f2b201c`](https://github.com/apache/spark/commit/f2b201c50c07102e793b8723d8e29275ce834e12).
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15907 I'm thinking 2. What do you think?
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68744/
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15901 Merged build finished. Test PASSed.
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15901 **[Test build #68744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68744/consoleFull)** for PR 15901 at commit [`240fde4`](https://github.com/apache/spark/commit/240fde493856b7bfb0568b338cf30ba0a08408df).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait InvokeLike`
[GitHub] spark issue #15900: [SPARK-18464][SQL] support old table which doesn't store...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15900 Merged build finished. Test PASSed.
[GitHub] spark issue #15900: [SPARK-18464][SQL] support old table which doesn't store...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68745/
[GitHub] spark issue #15900: [SPARK-18464][SQL] support old table which doesn't store...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15900 **[Test build #68745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68745/consoleFull)** for PR 15900 at commit [`847dada`](https://github.com/apache/spark/commit/847dadaf03293092406c43109d3e5b7b88369628).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15803: [SPARK-18298][Web UI]change gmt time to local zone time ...
Github user WangTaoTheTonic commented on the issue: https://github.com/apache/spark/pull/15803 Do we have anyone who's good at JAX-RS? Maybe they can explain the theory and help us understand this better :)
[GitHub] spark issue #15717: [SPARK-17910][SQL] Allow users to update the comment of ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15717 @gatorsmile I've added tests to ensure that it supports data source tables. Please check when you have time, thank you!
[GitHub] spark issue #15803: [SPARK-18298][Web UI]change gmt time to local zone time ...
Github user WangTaoTheTonic commented on the issue: https://github.com/apache/spark/pull/15803 @srowen Before the code changes, the browser got a date string from the server side; now it gets a Date instead, and parses its string format (`hacks a date string to drop seconds and timezone`). This conclusion comes from debugging the code (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/ApiRootResource.scala#L46); I'm not sure personally, please correct me if I'm wrong.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15861 Would anyone look at this PR please?
[GitHub] spark issue #15898: [SPARK-18457][SQL] ORC and other columnar formats using ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15898 @tejasapatil does this lgtu?
[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15358 The PR description is updated.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15811 **[Test build #68755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68755/consoleFull)** for PR 15811 at commit [`36988a3`](https://github.com/apache/spark/commit/36988a3aaeaa5e70919cb532025c0a67ded95117).
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user gabrielhuang commented on the issue: https://github.com/apache/spark/pull/15811 retest this please.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user gabrielhuang commented on the issue: https://github.com/apache/spark/pull/15811 Ok, I removed the Python flag.
[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15358 **[Test build #68754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68754/consoleFull)** for PR 15358 at commit [`bc9a508`](https://github.com/apache/spark/commit/bc9a5082cf4576cfa7a6d4911db74bc0476c4360).
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #68753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68753/consoleFull)** for PR 14719 at commit [`5d1ff3e`](https://github.com/apache/spark/commit/5d1ff3e601f9583d289a88f708230639a25a18b2).
[GitHub] spark pull request #14262: [SPARK-14974][SQL]delete temporary folder after i...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/14262
[GitHub] spark issue #14262: [SPARK-14974][SQL]delete temporary folder after insert h...
Github user baishuo commented on the issue: https://github.com/apache/spark/pull/14262 Closing this and opening the same change based on the new master branch: https://github.com/apache/spark/pull/15914
[GitHub] spark issue #15914: delete temporary folder after insert hive table
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15914 **[Test build #68752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68752/consoleFull)** for PR 15914 at commit [`cb08136`](https://github.com/apache/spark/commit/cb08136733b4b4dc48e488e33525dcebb715a75f).
[GitHub] spark pull request #15914: delete temporary folder after insert hive table
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/15914

delete temporary folder after insert hive table

## What changes were proposed in this pull request?

Modify the code of InsertIntoHiveTable.scala to fix https://issues.apache.org/jira/browse/SPARK-14974.

## How was this patch tested?

I think this patch can be tested manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/baishuo/spark SPARK-14974-20161117

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15914.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15914

commit cb08136733b4b4dc48e488e33525dcebb715a75f
Author: baishuo
Date: 2016-11-17T06:35:29Z

delete temporary folder after insert hive table
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Yay! :)
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15910 **[Test build #68751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68751/consoleFull)** for PR 15910 at commit [`575eeda`](https://github.com/apache/spark/commit/575eedadd2b1fd679623f5a71db8c0439df5f3d0).
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15910 retest this please
[GitHub] spark issue #15815: [DOCS][SPARK-18365] Improve Sample Method Documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68742/ Test PASSed.
[GitHub] spark issue #15815: [DOCS][SPARK-18365] Improve Sample Method Documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15815 Merged build finished. Test PASSed.
[GitHub] spark issue #15815: [DOCS][SPARK-18365] Improve Sample Method Documentation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15815 **[Test build #68742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68742/consoleFull)** for PR 15815 at commit [`0d7cde8`](https://github.com/apache/spark/commit/0d7cde89d17d8eab2a3df50f1e25f4508bed5010).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15907 Ideally, yes, that would be a good change. Do you want to make it in this PR? Another question is what scope we should apply such a change to: (1) only the methods I changed, (2) all of `RadixSort.java`, or (3) beyond the public methods (`sort()` and `sortKeyPrefixArray()`). Obviously, (3) would require changes to many files. What do you think?
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #68750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68750/consoleFull)** for PR 14719 at commit [`e91a24e`](https://github.com/apache/spark/commit/e91a24ef0e9a09df4e0a24030d0a6cc23bfc3b9d).
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15910 The failure occurs in Kafka streaming. Retest this please.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68750/ Test FAILed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #68750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68750/consoleFull)** for PR 14719 at commit [`e91a24e`](https://github.com/apache/spark/commit/e91a24ef0e9a09df4e0a24030d0a6cc23bfc3b9d).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test FAILed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15907 Yea it might be a good idea to just use longs.
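The overflow under discussion can be reproduced outside Spark. A minimal Java sketch (the values and names are hypothetical, not the actual `RadixSort.java` code): computing a byte offset in `int` arithmetic silently wraps for large arrays, while widening to `long` first keeps the result exact — which is why "just use longs" fixes it.

```java
// Hypothetical illustration of signed int overflow in an offset computation.
public class OverflowDemo {
    public static void main(String[] args) {
        int numRecords = 300_000_000;  // plausible large record count (illustrative)
        int recordSize = 8;            // bytes per record

        // int * int overflows past Integer.MAX_VALUE and wraps negative.
        int badOffset = numRecords * recordSize;
        // Widening one operand to long makes the multiplication 64-bit.
        long goodOffset = (long) numRecords * recordSize;

        System.out.println(badOffset);   // -1894967296 (wrapped)
        System.out.println(goodOffset);  // 2400000000
    }
}
```

Note that the cast must happen before the multiplication; `(long) (numRecords * recordSize)` would widen the already-wrapped result.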
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88392144

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -179,7 +249,8 @@ case class Invoke(
       }
       val code = s"""
         ${obj.code}
    -    ${argGen.map(_.code).mkString("\n")}
    +    $argCode

--- End diff --

we don't need to evaluate the arguments if `obj.isNull == true`
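The short-circuit cloud-fan asks for can be illustrated in plain Java. This is a hand-written sketch of the *shape* the generated code should take, not Spark's actual codegen output; `evalArg` stands in for the (hypothetical) generated argument-evaluation code, and the counter just proves it never runs for a null receiver.

```java
// Sketch: skip argument evaluation entirely when the receiver is null.
public class ShortCircuitDemo {
    static int evaluations = 0;

    // Stand-in for generated argument code (illustrative).
    static Integer evalArg() {
        evaluations++;
        return 42;
    }

    static Integer invoke(Object obj) {
        boolean isNull = (obj == null);
        Integer result = null;
        if (!isNull) {             // arguments evaluated only for a non-null receiver
            Integer arg = evalArg();
            result = arg + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(invoke(null));         // null, evalArg() never called
        System.out.println(invoke(new Object())); // 43
    }
}
```

Guarding the argument block this way saves work and avoids spurious side effects from argument expressions when the call would return null anyway.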
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88392052

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -33,6 +33,85 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
     import org.apache.spark.sql.types._

     /**
    + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
    + */
    +trait InvokeLike extends Expression with NonSQLExpression {
    +
    +  def arguments: Seq[Expression]
    +
    +  def propagateNull: Boolean
    +
    +  protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
    +
    +  /**
    +   * Prepares codes for arguments.
    +   *
    +   * - generate codes for argument.
    +   * - use ctx.splitExpressions() to not exceed 64kb JVM limit while preparing arguments.
    +   * - avoid some of nullabilty checking which are not needed because the expression is not
    +   *   nullable.
    +   * - when needNullCheck == true, short circuit if we found one of arguments is null because
    +   *   preparing rest of arguments can be skipped in the case.
    +   *
    +   * @param ctx a [[CodegenContext]]
    +   * @param ev an [[ExprCode]] with unique terms.
    +   * @return (code to prepare arguments, argument string, code to set isNull)
    +   */
    +  def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
    +
    +    val containsNullInArguments = if (needNullCheck) {

--- End diff --

how about

```
val resultIsNull = if (needNullCheck) {
  ...
} else {
  "false"
}
```

Then we return this `resultIsNull`, and the caller side can just generate code like

```
${ev.isNull} = $resultIsNull
${ev.value} = ...
$postNullCheck
```
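The suggested contract — have `prepareArguments` return a `resultIsNull` expression string that is simply the literal `"false"` when no null check is needed, so callers can splice it in unconditionally — can be sketched in miniature. This toy Java version only mimics the string-splicing pattern; all names are illustrative, not Spark's actual codegen API.

```java
// Sketch of the reviewer's suggestion: the helper returns an expression
// string for "is the result null?", and the caller splices it into the
// generated code without branching on whether a null check exists.
public class ResultIsNullSketch {
    static String prepareResultIsNull(boolean needNullCheck) {
        // When a check is needed, return the name of a generated flag
        // variable (hypothetical); otherwise the constant "false".
        return needNullCheck ? "containsNullInArguments" : "false";
    }

    static String callerCode(boolean needNullCheck) {
        String resultIsNull = prepareResultIsNull(needNullCheck);
        return "isNull = " + resultIsNull + ";\n"
             + "value = compute();";
    }

    public static void main(String[] args) {
        System.out.println(callerCode(false)); // isNull = false; then value = compute();
    }
}
```

The appeal of this design is that the caller emits one uniform template; a JIT (or javac constant folding, for generated sources) trivially removes the `isNull = false` assignment.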
[GitHub] spark issue #15812: [SPARK-18360][SQL] default table path of tables in defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15812 **[Test build #68749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68749/consoleFull)** for PR 15812 at commit [`27be481`](https://github.com/apache/spark/commit/27be481dd4cf554c39d220c2c039bff944af6469).
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15901 **[Test build #68747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68747/consoleFull)** for PR 15901 at commit [`8894a96`](https://github.com/apache/spark/commit/8894a962f81e0eb73669b4dc4ba59d395fdf6bd0).
[GitHub] spark issue #12257: [SPARK-14483][WEBUI] Display user name for each job and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12257 **[Test build #68748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68748/consoleFull)** for PR 12257 at commit [`3918357`](https://github.com/apache/spark/commit/3918357ce987ab5beda0968f6a5f3af5c528186d).
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88390158

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -33,6 +33,88 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
     import org.apache.spark.sql.types._

     /**
    + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
    + */
    +trait InvokeLike {
    +
    +  def arguments: Seq[Expression]
    +
    +  def propagateNull: Boolean
    +
    +  protected lazy val propagatingNull: Boolean = propagateNull && arguments.exists(_.nullable)
    +
    +  /**
    +   * Prepares codes for arguments.
    +   *
    +   * - generate codes for argument.
    +   * - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments.
    +   * - avoid some of nullabilty checking which are not needed because the expression is not
    +   *   nullable.
    +   * - when progagateNull == true, short circuit if we found one of arguments is null because

--- End diff --

Thanks, I used `needNullCheck` for now.
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15901 **[Test build #68746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68746/consoleFull)** for PR 15901 at commit [`d448b60`](https://github.com/apache/spark/commit/d448b60c786efd1a002c17a6458a4b3b62669efc).
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15910 Merged build finished. Test FAILed.
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15910 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68741/ Test FAILed.
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15910 **[Test build #68741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68741/consoleFull)** for PR 15910 at commit [`575eeda`](https://github.com/apache/spark/commit/575eedadd2b1fd679623f5a71db8c0439df5f3d0).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88389738

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -245,51 +319,36 @@ case class NewInstance(
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    -    val argIsNulls = ctx.freshName("argIsNulls")
    -    ctx.addMutableState("boolean[]", argIsNulls,
    -      s"$argIsNulls = new boolean[${arguments.size}];")
    -    val argValues = arguments.zipWithIndex.map { case (e, i) =>
    -      val argValue = ctx.freshName("argValue")
    -      ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
    -      argValue
    -    }
    -    val argCodes = arguments.zipWithIndex.map { case (e, i) =>
    -      val expr = e.genCode(ctx)
    -      expr.code + s"""
    -        $argIsNulls[$i] = ${expr.isNull};
    -        ${argValues(i)} = ${expr.value};
    -      """
    -    }
    -    val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)
    +    val (argCode, argString, setIsNull) = prepareArguments(ctx, ev)
         val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx))
         var isNull = ev.isNull
    -    val setIsNull = if (propagateNull && arguments.nonEmpty) {
    -      s"""
    -        boolean $isNull = false;
    -        for (int idx = 0; idx < ${arguments.length}; idx++) {
    -          if ($argIsNulls[idx]) { $isNull = true; break; }
    -        }
    -      """
    +    val prepareIsNull = if (propagateNull && arguments.exists(_.nullable)) {

--- End diff --

Oops, I missed it, fixed.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88389735

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -50,7 +132,7 @@ case class StaticInvoke(
         dataType: DataType,
         functionName: String,
         arguments: Seq[Expression] = Nil,
    -    propagateNull: Boolean = true) extends Expression with NonSQLExpression {
    +    propagateNull: Boolean = true) extends Expression with InvokeLike with NonSQLExpression {

--- End diff --

Makes sense, I'll modify them.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88389718

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -33,6 +33,88 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
     import org.apache.spark.sql.types._

     /**
    + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
    + */
    +trait InvokeLike {
    +
    +  def arguments: Seq[Expression]
    +
    +  def propagateNull: Boolean
    +
    +  protected lazy val propagatingNull: Boolean = propagateNull && arguments.exists(_.nullable)
    +
    +  /**
    +   * Prepares codes for arguments.
    +   *
    +   * - generate codes for argument.
    +   * - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments.

--- End diff --

Thanks, fixed.
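The `ctx.splitExpressions()` mentioned in the doc comment works around the JVM's 64KB bytecode-per-method limit by distributing generated statements across several helper methods instead of one giant method. A simplified Java sketch of just the chunking step (illustrative only; Spark's real implementation also threads the input row through the generated helper signatures):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: group flat generated statements into chunks, one chunk per
// (would-be) helper method, so no single method grows past the JVM limit.
public class SplitExpressionsSketch {
    static List<List<String>> split(List<String> statements, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(
                statements.subList(i, Math.min(i + chunkSize, statements.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 5; i++) stmts.add("argValue" + i + " = ...;");
        System.out.println(split(stmts, 2).size()); // 5 statements in chunks of 2 -> 3 helpers
    }
}
```

In real codegen the chunk boundary is chosen by estimated code size rather than statement count, but the structure is the same: each chunk becomes a `private void prepareN(InternalRow row)` helper and the caller invokes them in order.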
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88389723

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -33,6 +33,88 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
     import org.apache.spark.sql.types._

     /**
    + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
    + */
    +trait InvokeLike {
    +
    +  def arguments: Seq[Expression]
    +
    +  def propagateNull: Boolean
    +
    +  protected lazy val propagatingNull: Boolean = propagateNull && arguments.exists(_.nullable)
    +
    +  /**
    +   * Prepares codes for arguments.
    +   *
    +   * - generate codes for argument.
    +   * - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments.
    +   * - avoid some of nullabilty checking which are not needed because the expression is not
    +   *   nullable.
    +   * - when progagateNull == true, short circuit if we found one of arguments is null because
    +   *   preparing rest of arguments can be skipped in the case.
    +   *
    +   * @param ctx a [[CodegenContext]]
    +   * @param ev an [[ExprCode]] with unique terms.
    +   * @return (code to prepare arguments, argument string, code to set isNull)
    +   */
    +  def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
    +
    +    val containsNullInArguments = if (propagatingNull) {
    +      val containsNullInArguments = ctx.freshName("containsNullInArguments")
    +      ctx.addMutableState("boolean", containsNullInArguments, "")
    +      containsNullInArguments
    +    } else {
    +      ""
    +    }
    +    val argValues = arguments.zipWithIndex.map { case (e, i) =>
    +      val argValue = ctx.freshName("argValue")
    +      ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
    +      argValue
    +    }
    +
    +    val argCodes = if (propagatingNull) {
    +      val reset = s"$containsNullInArguments = false;"
    +      val argCodes = arguments.zipWithIndex.map { case (e, i) =>
    +        val expr = e.genCode(ctx)
    +        s"""

--- End diff --

Thanks, I'll use it.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88387679

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---

    @@ -245,51 +319,36 @@ case class NewInstance(
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    -    val argIsNulls = ctx.freshName("argIsNulls")
    -    ctx.addMutableState("boolean[]", argIsNulls,
    -      s"$argIsNulls = new boolean[${arguments.size}];")
    -    val argValues = arguments.zipWithIndex.map { case (e, i) =>
    -      val argValue = ctx.freshName("argValue")
    -      ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
    -      argValue
    -    }
    -    val argCodes = arguments.zipWithIndex.map { case (e, i) =>
    -      val expr = e.genCode(ctx)
    -      expr.code + s"""
    -        $argIsNulls[$i] = ${expr.isNull};
    -        ${argValues(i)} = ${expr.value};
    -      """
    -    }
    -    val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)
    +    val (argCode, argString, setIsNull) = prepareArguments(ctx, ev)
         val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx))
         var isNull = ev.isNull
    -    val setIsNull = if (propagateNull && arguments.nonEmpty) {
    -      s"""
    -        boolean $isNull = false;
    -        for (int idx = 0; idx < ${arguments.length}; idx++) {
    -          if ($argIsNulls[idx]) { $isNull = true; break; }
    -        }
    -      """
    +    val prepareIsNull = if (propagateNull && arguments.exists(_.nullable)) {

--- End diff --

use `propagatingNull`?
[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13300 What's the status of this pr? It seems more natural to implement `from_csv` in a way similar to `from_json` in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L2900
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15913 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68743/ Test FAILed.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15913 Merged build finished. Test FAILed.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15913 **[Test build #68743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68743/consoleFull)** for PR 15913 at commit [`ecddf15`](https://github.com/apache/spark/commit/ecddf1597bde986cf216e632d8cf3075875a6918). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88386690 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -33,6 +33,88 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData} import org.apache.spark.sql.types._ /** + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]]. + */ +trait InvokeLike { + + def arguments: Seq[Expression] + + def propagateNull: Boolean + + protected lazy val propagatingNull: Boolean = propagateNull && arguments.exists(_.nullable) + + /** + * Prepares codes for arguments. + * + * - generate codes for argument. + * - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments. + * - avoid some of nullabilty checking which are not needed because the expression is not + * nullable. + * - when progagateNull == true, short circuit if we found one of arguments is null because --- End diff -- `progagateNull` -> `propagatingNull`? Actually can we think of better name than `propagatingNull`? how about `needNullCheck` ?
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88385924 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -33,6 +33,88 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData} import org.apache.spark.sql.types._ /** + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]]. + */ +trait InvokeLike { + + def arguments: Seq[Expression] + + def propagateNull: Boolean + + protected lazy val propagatingNull: Boolean = propagateNull && arguments.exists(_.nullable) + + /** + * Prepares codes for arguments. + * + * - generate codes for argument. + * - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments. --- End diff -- nit: `not to` -> `to not`
[GitHub] spark issue #15900: [SPARK-18464][SQL] support old table which doesn't store...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15900 **[Test build #68745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68745/consoleFull)** for PR 15900 at commit [`847dada`](https://github.com/apache/spark/commit/847dadaf03293092406c43109d3e5b7b88369628).
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15901 **[Test build #68744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68744/consoleFull)** for PR 15901 at commit [`240fde4`](https://github.com/apache/spark/commit/240fde493856b7bfb0568b338cf30ba0a08408df).
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15913 **[Test build #68743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68743/consoleFull)** for PR 15913 at commit [`ecddf15`](https://github.com/apache/spark/commit/ecddf1597bde986cf216e632d8cf3075875a6918).
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/15901 @viirya @cloud-fan I addressed your comments and updated pr title and description.
[GitHub] spark pull request #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated me...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15913 [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML ## What changes were proposed in this pull request? Remove deprecated methods for ML. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-18481 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15913.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15913 commit ecddf1597bde986cf216e632d8cf3075875a6918 Author: Yanbo Liang Date: 2016-11-17T04:35:20Z Remove deprecated methods for ML.
[GitHub] spark pull request #15900: [SPARK-18464][SQL] support old table which doesn'...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15900#discussion_r88384871 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala --- @@ -1371,4 +1371,23 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv } } } + + test("SPARK-18464: support old table which doesn't store schema in table properties") { +withTable("old") { + withTempPath { path => +Seq(1 -> "a").toDF("i", "j").write.parquet(path.getAbsolutePath) +val tableDesc = CatalogTable( + identifier = TableIdentifier("old", Some("default")), + tableType = CatalogTableType.EXTERNAL, + storage = CatalogStorageFormat.empty.copy( +properties = Map("path" -> path.getAbsolutePath) + ), + schema = new StructType(), + properties = Map( +HiveExternalCatalog.DATASOURCE_PROVIDER -> "parquet")) +hiveClient.createTable(tableDesc, ignoreIfExists = false) +checkAnswer(spark.table("old"), Row(1, "a")) + } +} + } --- End diff -- created https://issues.apache.org/jira/browse/SPARK-18482
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15659 Agreed, so I'm going to cherry-pick this into branch-2.1.
[GitHub] spark issue #15912: [SPARK-18480][Docs] Fix wrong links for ML guide docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15912 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68740/ Test PASSed.
[GitHub] spark issue #15912: [SPARK-18480][Docs] Fix wrong links for ML guide docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15912 Merged build finished. Test PASSed.
[GitHub] spark issue #15912: [SPARK-18480][Docs] Fix wrong links for ML guide docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15912 **[Test build #68740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68740/consoleFull)** for PR 15912 at commit [`8518730`](https://github.com/apache/spark/commit/8518730d0f1117bf00b58c1ed40ceaad0c7ab11b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15900: [SPARK-18464][SQL] support old table which doesn'...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15900#discussion_r88383221 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -1023,6 +1023,11 @@ object HiveExternalCatalog { // After SPARK-6024, we removed this flag. // Although we are not using `spark.sql.sources.schema` any more, we need to still support. DataType.fromJson(schema.get).asInstanceOf[StructType] +} else if (props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty) { + // If there is no schema information in table properties, it means the schema of this table + // was empty when saving into metastore, which is possible in older version of Spark. We + // should respect it. + new StructType() --- End diff -- No. Since we also store the schema for Hive tables, Hive tables will also call this function. But a Hive table will never go into this branch, as it always has a schema (the removal of runtime schema inference happened before we started storing the schema of Hive tables).
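The branch under discussion only checks whether any table property key carries the schema prefix. A hedged Java sketch of that predicate, using a plain map instead of Spark's actual `CatalogTable` API (the constant value mirrors the `spark.sql.sources.schema` prefix quoted in the surrounding diff comments):

```java
import java.util.Map;

public class SchemaPropsSketch {
    // Mirrors HiveExternalCatalog.DATASOURCE_SCHEMA_PREFIX as quoted above.
    static final String DATASOURCE_SCHEMA_PREFIX = "spark.sql.sources.schema";

    // Equivalent of props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty:
    // true only when no table property carries the schema prefix, i.e. the table
    // was written by an old Spark version that stored no schema.
    static boolean schemaMissing(Map<String, String> props) {
        return props.keySet().stream()
                .noneMatch(k -> k.startsWith(DATASOURCE_SCHEMA_PREFIX));
    }

    public static void main(String[] args) {
        System.out.println(schemaMissing(Map.of("path", "/tmp/t")));                        // true
        System.out.println(schemaMissing(Map.of("spark.sql.sources.schema.part.0", "{}"))); // false
    }
}
```

When the predicate is true, the patched code falls back to an empty `StructType()` instead of failing.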
[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88382948 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction( val distinct = if (isDistinct) "DISTINCT " else " " s"$name($distinct${children.map(_.sql).mkString(", ")})" } + + override def createAggregationBuffer(): AggregationBuffer = +partial1ModeEvaluator.getNewAggregationBuffer + + @transient + private lazy val inputProjection = new InterpretedProjection(children) + + override def update(buffer: AggregationBuffer, input: InternalRow): Unit = { +partial1ModeEvaluator.iterate( + buffer, wrap(inputProjection(input), inputWrappers, cached, inputDataTypes)) + } + + override def merge(buffer: AggregationBuffer, input: AggregationBuffer): Unit = { +partial2ModeEvaluator.merge(buffer, partial1ModeEvaluator.terminatePartial(input)) + } + + override def eval(buffer: AggregationBuffer): Any = { +resultUnwrapper(finalModeEvaluator.terminate(buffer)) + } + + override def serialize(buffer: AggregationBuffer): Array[Byte] = { +aggBufferSerDe.serialize(buffer) + } + + override def deserialize(bytes: Array[Byte]): AggregationBuffer = { +aggBufferSerDe.deserialize(bytes) + } + + // Helper class used to de/serialize Hive UDAF `AggregationBuffer` objects + private class AggregationBufferSerDe { +private val partialResultUnwrapper = unwrapperFor(partialResultInspector) + +private val partialResultWrapper = wrapperFor(partialResultInspector, partialResultDataType) + +private val projection = UnsafeProjection.create(Array(partialResultDataType)) + +private val mutableRow = new GenericInternalRow(1) + +def serialize(buffer: AggregationBuffer): Array[Byte] = { + // `GenericUDAFEvaluator.terminatePartial()` converts an `AggregationBuffer` into an object + // that can be inspected by the `ObjectInspector` returned by `GenericUDAFEvaluator.init()`. + // Then we can unwrap it to a Spark SQL value. + mutableRow.update(0, partialResultUnwrapper(partial1ModeEvaluator.terminatePartial(buffer))) + val unsafeRow = projection(mutableRow) + val bytes = ByteBuffer.allocate(unsafeRow.getSizeInBytes) + unsafeRow.writeTo(bytes) + bytes.array() --- End diff -- Actually they are different. If the buffer type is fixed length, then the `unsafeRow` is just a fixed-length byte array, and `UnsafeRow.getBytes` will just return that array, instead of copying the memory.
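The exchange above hinges on `ByteBuffer.allocate(n).array()` returning the buffer's backing array rather than a fresh copy, so the only byte copy is the `put`/`writeTo` itself. A small self-contained Java illustration of that semantics (plain byte arrays; no Spark `UnsafeRow` involved):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteBufferSketch {
    // Serialize by writing into an exactly-sized heap buffer and returning
    // its backing array. array() does not copy; only put() moves bytes.
    static byte[] viaBuffer(byte[] row) {
        ByteBuffer buf = ByteBuffer.allocate(row.length);
        buf.put(row);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] row = {1, 2, 3, 4}; // stands in for a serialized fixed-length row
        byte[] out = viaBuffer(row);
        System.out.println(Arrays.equals(out, row)); // true

        // Mutating through the buffer is visible in the returned array,
        // demonstrating that array() exposed the backing storage directly.
        ByteBuffer buf = ByteBuffer.wrap(out);
        buf.put(0, (byte) 9);
        System.out.println(out[0]); // 9
    }
}
```

The reviewer's point is that when the row is fixed-length and backed by a single on-heap array, a `getBytes`-style accessor can return that array directly, avoiding even the one copy the buffer approach performs.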
[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88382746 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction( val distinct = if (isDistinct) "DISTINCT " else " " s"$name($distinct${children.map(_.sql).mkString(", ")})" } + + override def createAggregationBuffer(): AggregationBuffer = +partial1ModeEvaluator.getNewAggregationBuffer + + @transient + private lazy val inputProjection = new InterpretedProjection(children) + + override def update(buffer: AggregationBuffer, input: InternalRow): Unit = { +partial1ModeEvaluator.iterate( + buffer, wrap(inputProjection(input), inputWrappers, cached, inputDataTypes)) + } + + override def merge(buffer: AggregationBuffer, input: AggregationBuffer): Unit = { +partial2ModeEvaluator.merge(buffer, partial1ModeEvaluator.terminatePartial(input)) + } + + override def eval(buffer: AggregationBuffer): Any = { +resultUnwrapper(finalModeEvaluator.terminate(buffer)) + } + + override def serialize(buffer: AggregationBuffer): Array[Byte] = { +aggBufferSerDe.serialize(buffer) + } + + override def deserialize(bytes: Array[Byte]): AggregationBuffer = { +aggBufferSerDe.deserialize(bytes) + } + + // Helper class used to de/serialize Hive UDAF `AggregationBuffer` objects + private class AggregationBufferSerDe { +private val partialResultUnwrapper = unwrapperFor(partialResultInspector) + +private val partialResultWrapper = wrapperFor(partialResultInspector, partialResultDataType) + +private val projection = UnsafeProjection.create(Array(partialResultDataType)) + +private val mutableRow = new GenericInternalRow(1) + +def serialize(buffer: AggregationBuffer): Array[Byte] = { + // `GenericUDAFEvaluator.terminatePartial()` converts an `AggregationBuffer` into an object + // that can be inspected by the `ObjectInspector` returned by `GenericUDAFEvaluator.init()`. + // Then we can unwrap it to a Spark SQL value. + mutableRow.update(0, partialResultUnwrapper(partial1ModeEvaluator.terminatePartial(buffer))) + val unsafeRow = projection(mutableRow) + val bytes = ByteBuffer.allocate(unsafeRow.getSizeInBytes) + unsafeRow.writeTo(bytes) + bytes.array() --- End diff -- but you also create an unnecessary `ByteBuffer`... as they are equivalent, isn't `unsafeRow.getBytes` simpler?
[GitHub] spark issue #15815: [DOCS][SPARK-18365] Improve Sample Method Documentation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15815 **[Test build #68742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68742/consoleFull)** for PR 15815 at commit [`0d7cde8`](https://github.com/apache/spark/commit/0d7cde89d17d8eab2a3df50f1e25f4508bed5010).
[GitHub] spark issue #15815: [DOCS][SPARK-18365] Improve Sample Method Documentation
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/15815 Failures also seem unrelated.
[GitHub] spark pull request #15815: [DOCS][SPARK-18365] Improve Sample Method Documen...
Github user anabranch commented on a diff in the pull request: https://github.com/apache/spark/pull/15815#discussion_r88382198 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala --- @@ -99,6 +99,8 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T]) /** * Return a sampled subset of this RDD. + * Note: this is NOT guaranteed to provide exactly the fraction of the count --- End diff -- Fixed; the Python one didn't need it once I re-read the docs.
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15910 **[Test build #68741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68741/consoleFull)** for PR 15910 at commit [`575eeda`](https://github.com/apache/spark/commit/575eedadd2b1fd679623f5a71db8c0439df5f3d0).
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15910 retest this please.
[GitHub] spark pull request #15887: [SPARK-18442][SQL] Fix nullability of WrapOption.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15887
[GitHub] spark issue #15887: [SPARK-18442][SQL] Fix nullability of WrapOption.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15887 LGTM, merging to master/2.1!
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15910 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68739/ Test FAILed.
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15910 Merged build finished. Test FAILed.
[GitHub] spark issue #15910: [SPARK-18476][SPARKR][ML]:SparkR Logistic Regression sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15910 **[Test build #68739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68739/consoleFull)** for PR 15910 at commit [`575eeda`](https://github.com/apache/spark/commit/575eedadd2b1fd679623f5a71db8c0439df5f3d0). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15912: [SPARK-18480][Docs] Fix wrong links for ML guide docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15912 **[Test build #68740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68740/consoleFull)** for PR 15912 at commit [`8518730`](https://github.com/apache/spark/commit/8518730d0f1117bf00b58c1ed40ceaad0c7ab11b).
[GitHub] spark pull request #15912: [SPARK-18480][Docs] Fix wrong links for ML guide ...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15912 [SPARK-18480][Docs] Fix wrong links for ML guide docs ## What changes were proposed in this pull request? 1, There are two `[Graph.partitionBy]` in `graphx-programming-guide.md`, the first one had no effect. 2, `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake. 3, `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessible, because class `PythonMLLibAPI` is private. 4, Other link updates. ## How was this patch tested? manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark md_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15912.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15912 commit e3246b999dcc6ade948f1fac9650818e5cb8ad9f Author: Zheng RuiFeng Date: 2016-11-16T08:13:30Z create pr commit 192d4bfd7d95c2aaaca20e31665a1f3ee15fb89b Author: Zheng RuiFeng Date: 2016-11-16T08:19:17Z del duplicate link in graphx-doc commit 06a03b805fb20c25a12707e54d0a7642a05a80e3 Author: Zheng RuiFeng Date: 2016-11-16T13:16:08Z update commit 8518730d0f1117bf00b58c1ed40ceaad0c7ab11b Author: Zheng RuiFeng Date: 2016-11-16T13:25:30Z fix link in api doc
[GitHub] spark issue #15907: [SPARK-18458][CORE] Avoid signed integer overflow at an ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15907 LGTM. I wonder if it's also worth changing all the ints to longs to be extra safe here with conversions, even if we know the values are bounded.
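The overflow concern behind this review is easy to reproduce: multiplying two large `int`s silently wraps, while widening one operand to `long` before multiplying gives the correct product. A minimal Java illustration (the values are made up to force a 33-bit product):

```java
public class OverflowSketch {
    // Plain int multiplication: wraps modulo 2^32 with no error.
    static int wrappedProduct(int a, int b) {
        return a * b;
    }

    // Widening one operand first makes the multiplication happen in long.
    static long widenedProduct(int a, int b) {
        return (long) a * b;
    }

    public static void main(String[] args) {
        int elements = 1 << 20; // ~1M elements
        int bytesPer = 4096;    // 4 KiB each; product is exactly 2^32

        System.out.println(wrappedProduct(elements, bytesPer)); // 0 (wrapped)
        System.out.println(widenedProduct(elements, bytesPer)); // 4294967296
    }
}
```

`Math.multiplyExact` is an alternative when failing fast on overflow is preferable to widening.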
[GitHub] spark issue #15911: [SPARK-18477][SS]Enable interrupts for HDFS in HDFSMetad...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15911 Merged build finished. Test PASSed.