[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18920#discussion_r133002116

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -449,6 +451,49 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+  private def assertNoExceptions(c: Column): Unit = {
--- End diff --

Could you submit a follow-up PR to move this test case to `DataFrameAggregateSuite`? Thanks!

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18887#discussion_r133001818

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -76,6 +76,14 @@ private[history] case class LoadedAppUI(
 private[history] abstract class ApplicationHistoryProvider {
   /**
+   * The number of applications available for listing. Separate method in case it's cheaper
+   * to get a count than to calculate the whole listing.
--- End diff --

Actually it doesn't seem like this is used anymore and I can remove it...
[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18887#discussion_r133001417

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -31,6 +33,9 @@ class ApplicationInfo private[spark](
     val memoryPerExecutorMB: Option[Int],
     val attempts: Seq[ApplicationAttemptInfo])

+@JsonIgnoreProperties(
+  value = Array("startTimeEpoch", "endTimeEpoch", "lastUpdatedEpoch"),
--- End diff --

No, this just avoids trying to deserialize them, which would cause an error because these properties have no setter.
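The failure mode vanzin describes — derived, getter-only properties that appear in serialized output but have no slot to receive a value on the way back in — can be sketched in plain Python (an illustrative analog of what `@JsonIgnoreProperties` does for Jackson; the class and field names here mirror the diff but the code is not Spark's):

```python
import json
from dataclasses import dataclass

# Derived properties present in serialized output but with no constructor slot.
IGNORED = {"startTimeEpoch", "endTimeEpoch", "lastUpdatedEpoch"}

@dataclass
class ApplicationAttemptInfo:
    appId: str
    sparkUser: str

    def to_json(self) -> str:
        # Getter-only, computed values get written out alongside real fields...
        d = {"appId": self.appId, "sparkUser": self.sparkUser,
             "startTimeEpoch": 0, "endTimeEpoch": 0, "lastUpdatedEpoch": 0}
        return json.dumps(d)

    @classmethod
    def from_json(cls, s: str) -> "ApplicationAttemptInfo":
        d = json.loads(s)
        # ...but must be skipped when reading back, since no setter/field
        # exists to receive them. This filter is the analog of listing the
        # property names in @JsonIgnoreProperties.
        return cls(**{k: v for k, v in d.items() if k not in IGNORED})

info = ApplicationAttemptInfo("app-1", "alice")
assert ApplicationAttemptInfo.from_json(info.to_json()) == info
```

Without the filter, `cls(**d)` would raise on the unexpected keyword arguments — the Python equivalent of Jackson's "unrecognized property" error.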
[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18887#discussion_r133001335

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -742,53 +698,145 @@ private[history] object FsHistoryProvider {
   private val APPL_END_EVENT_PREFIX = "{\"Event\":\"SparkListenerApplicationEnd\""
   private val LOG_START_EVENT_PREFIX = "{\"Event\":\"SparkListenerLogStart\""
+
+  private val CURRENT_VERSION = 1L
 }

 /**
- * Application attempt information.
- *
- * @param logPath path to the log file, or, for a legacy log, its directory
- * @param name application name
- * @param appId application ID
- * @param attemptId optional attempt ID
- * @param startTime start time (from playback)
- * @param endTime end time (from playback). -1 if the application is incomplete.
- * @param lastUpdated the modification time of the log file when this entry was built by replaying
- *                    the history.
- * @param sparkUser user running the application
- * @param completed flag to indicate whether or not the application has completed.
- * @param fileSize the size of the log file the last time the file was scanned for changes
+ * A KVStoreSerializer that provides Scala types serialization too, and uses the same options as
+ * the API serializer.
  */
-private class FsApplicationAttemptInfo(
+private class KVStoreScalaSerializer extends KVStoreSerializer {
+
+  mapper.registerModule(DefaultScalaModule)
+  mapper.setSerializationInclusion(JsonInclude.Include.NON_NULL)
+  mapper.setDateFormat(v1.JacksonMessageWriter.makeISODateFormat)
+
+}
+
+private[history] case class KVStoreMetadata(
+  val version: Long,
+  val logDir: String)
+
+private[history] case class LogInfo(
+  @KVIndexParam val logPath: String,
+  val fileSize: Long)
+
+private[history] class AttemptInfoWrapper(
+    val info: v1.ApplicationAttemptInfo,
--- End diff --

Yes, I'm using this syntax because in many places there are conflicting type names in the API package and in other packages.
[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18887#discussion_r133000455

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -76,6 +76,14 @@ private[history] case class LoadedAppUI(
 private[history] abstract class ApplicationHistoryProvider {
   /**
+   * The number of applications available for listing. Separate method in case it's cheaper
+   * to get a count than to calculate the whole listing.
--- End diff --

This is an interface, so this was added to allow implementations to override this method if that makes sense. It just looks like I lost the override in one of my rebases, so let me add that back.
[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18918

Yes. We should fix it in `object PhysicalOperation`
[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18920
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18920

Thanks! Merged to master.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18920

LGTM
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18914
[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...
Github user dbolshak commented on the issue: https://github.com/apache/spark/pull/18940

LGTM, btw, no unit tests for the change?
[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18929

They're there mainly to declare the constant as a public API that must not change. (I'm not sure whether mima captures changes in constant values, since that's a binary breaking change, but that's the spirit of having these constants.) I didn't change all of the usages when I introduced them because it would be really noisy. There are also a whole bunch of other constants that could be re-used throughout the code (basically all the constants declared in `SparkLauncher`). But I think there's no real need to change this - we can encourage new code to use the constants, but leave the old code there until it needs to be changed, to avoid noise.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18914

Thanks! Merging to master.
[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18700#discussion_r132996396

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
    * This performs reflection to decide what type of [[Expression]] to return in the builder.
    */
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
-    // TODO: at least support UDAFs here
-    throw new UnsupportedOperationException("Use sqlContext.udf.register(...) instead.")
+    makeFunctionBuilder(name, Utils.classForName(functionClassName))
+  }
+
+  /**
+   * Construct a [[FunctionBuilder]] based on the provided class that represents a function.
+   */
+  private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = {
+    // When we instantiate ScalaUDAF class, we may throw exception if the input
+    // expressions don't satisfy the UDAF, such as type mismatch, input number
+    // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+    (children: Seq[Expression]) => {
+      try {
+        val clsForUDAF =
+          Utils.classForName("org.apache.spark.sql.expressions.UserDefinedAggregateFunction")
--- End diff --

```Scala
/**
 * The base class for implementing user-defined aggregate functions (UDAF).
 *
 * @since 1.5.0
 */
@InterfaceStability.Stable
abstract class UserDefinedAggregateFunction
```

This interface has been marked as stable. Can we still move it? Or make a trait in Catalyst?
[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18700#discussion_r132994080

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
    * This performs reflection to decide what type of [[Expression]] to return in the builder.
    */
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
--- End diff --

The changes [here](https://github.com/apache/spark/pull/18700/files#diff-ca4533edbf148c89cc0c564ab6b0aeaa) are for `HiveSessionCatalog`. Also, we have a test case in `HiveUDAFSuite.scala` to verify it.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user byakuinss commented on the issue: https://github.com/apache/spark/pull/18895

Okay, I left a comment on the issue page.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9518

**[Test build #80638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80638/testReport)** for PR 9518 at commit [`1ec9cc9`](https://github.com/apache/spark/commit/1ec9cc967ebb8789edb80bdae28d7c24b5d49a6c).
[GitHub] spark issue #18866: [WIP][SPARK-21649][SQL] Support writing data into hive b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18866

**[Test build #80637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80637/testReport)** for PR 18866 at commit [`6df2e78`](https://github.com/apache/spark/commit/6df2e7803a9769cd296a4b1b37756340504f6684).
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895

Could you leave a comment like "here is my JIRA account" in https://issues.apache.org/jira/browse/SPARK-21658, if you don't mind?
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895

Hm.. weird. I can't search your account on JIRA ...
[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18907

Merged build finished. Test FAILed.
[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18907

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80630/ Test FAILed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user byakuinss commented on the issue: https://github.com/apache/spark/pull/18895

@HyukjinKwon Oh, do you mean my jira full name? It's `Chin Han Yu`.
[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18907

**[Test build #80630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80630/testReport)** for PR 18907 at commit [`8f4bc08`](https://github.com/apache/spark/commit/8f4bc087df88cdb8c0308c6607d944f7bdf37019).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait CatalogRelation extends LeafNode `
   * `case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation `
   * `case class HiveTableRelation(`
[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926

To summarize the feedback from @HyukjinKwon and @gatorsmile, I think what I need to do is:
 * Add a test for the mixed type case.
 * Explicitly check for `long` in Python 2 and throw a `TypeError` from PySpark.
 * Add a test for the `long` `TypeError` in Python 2.
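The checks nchammas lists can be sketched as follows. This is an illustrative standalone helper, not the actual PySpark `Column.substr` implementation; the function name and messages are hypothetical:

```python
import sys

def check_substr_args(startPos, length):
    """Validate substr-style arguments: reject mixed types, and reject
    Python 2 `long` explicitly so the caller gets a clear TypeError
    instead of an opaque failure deeper in the JVM bridge."""
    if type(startPos) != type(length):
        raise TypeError(
            "startPos and length must be the same type, got %s and %s"
            % (type(startPos).__name__, type(length).__name__))
    # On Python 3 this branch is dead code, since `long` no longer exists.
    if sys.version_info[0] == 2 and isinstance(startPos, long):  # noqa: F821
        raise TypeError("substr does not accept Python 2 long arguments")
    return True

assert check_substr_args(1, 2)
try:
    check_substr_args(1, "2")  # mixed int/str must be rejected
except TypeError:
    pass
else:
    raise AssertionError("mixed types should raise TypeError")
```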
[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18940

**[Test build #80636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80636/testReport)** for PR 18940 at commit [`f23a4c7`](https://github.com/apache/spark/commit/f23a4c79b69fd1f8a77162da34b8821cb0cc1352).
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80629/ Test PASSed.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468

Merged build finished. Test PASSed.
[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18895
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895

@byakuinss, BTW, do you mind if I ask your JIRA id? I want to assign this to you as you resolved this but I can't find the ID..
[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18940

ok to test
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895

Merged to master.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468

**[Test build #80629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80629/testReport)** for PR 18468 at commit [`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18938

**[Test build #80635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80635/testReport)** for PR 18938 at commit [`b87562f`](https://github.com/apache/spark/commit/b87562f6e81c1696373b4413479f884520504345).
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user jmchung commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r132984129

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
   // the fields to query are the remaining children
   @transient private lazy val fieldExpressions: Seq[Expression] = children.tail

+  // a field name given with constant null will be replaced with this pseudo field name
+  private val nullFieldName = "__NullFieldName"
--- End diff --

@HyukjinKwon @viirya Yep, we've discarded the fake field name and use Option here. We made a slight revision to deal with the None in `foldableFieldNames` instead of creating a new function.
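The idea in jmchung's comment — representing a constant null field name as an Option-like "no value" instead of a sentinel string that a real JSON key could collide with — can be sketched in Python (a simplified analog; the function names mirror the Scala code but this is not Spark's implementation):

```python
def fold_field_names(field_exprs):
    # Constant field names are pre-evaluated; a SQL NULL folds to None
    # (playing the role of Scala's Option/None) rather than a sentinel
    # string like "__NullFieldName" that could shadow a genuine key.
    return [f if isinstance(f, str) else None for f in field_exprs]

def json_tuple(obj, field_exprs):
    names = fold_field_names(field_exprs)
    # A None field name never matches a key, so a null field yields NULL
    # instead of triggering a NullPointerException-style failure.
    return [obj.get(n) if n is not None else None for n in names]

print(json_tuple({"a": 1, "b": 2}, ["a", None, "b"]))  # → [1, None, 2]
```

The key design point is the same as in the PR: the "missing" case is encoded in the type (None/Option) rather than in the value space of field names.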
[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18488#discussion_r132983668 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java --- @@ -79,7 +79,7 @@ public ExpressionInfo( assert name != null; assert arguments != null; assert examples != null; -assert examples.isEmpty() || examples.startsWith("\n Examples:"); +assert examples.isEmpty() || examples.startsWith(System.lineSeparator() + "Examples:"); --- End diff -- I don't think we support Windows for dev. This assertion should probably be weakened anyway but that's a separate issue from this PR.
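The platform concern above comes down to the difference between a literal `"\n"` and the platform line separator. A small Python sketch (using `os.linesep` as the analogue of Java's `System.lineSeparator()`) shows why a docstring written with bare `"\n"` separators passes the literal check everywhere but would fail the separator-based check on Windows:

```python
import os

# Example docstrings are written with bare "\n" line breaks.
doc = "\n    Examples:\n      > SELECT 1;"

# The literal check succeeds on every platform.
print(doc.startswith("\n"))        # True

# os.linesep is "\n" on Linux/macOS but "\r\n" on Windows,
# so this check would fail there even though the docstring is fine.
print(doc.startswith(os.linesep))  # True on Linux/macOS, False on Windows
```

This is why tying the assertion to the platform separator only holds where the separator matches the literal the docs were written with.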
[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18855#discussion_r132982799 --- Diff: project/SparkBuild.scala --- @@ -790,7 +790,7 @@ object TestSettings { javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark")) .map { case (k,v) => s"-D$k=$v" }.toSeq, javaOptions in Test += "-ea", -javaOptions in Test ++= "-Xmx3g -Xss4096k" +javaOptions in Test ++= "-Xmx6g -Xss4096k" --- End diff -- I am +1 for separating it if this can be. Let's get some changes we are sure of into the code base first.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80634/testReport)** for PR 18930 at commit [`5d71263`](https://github.com/apache/spark/commit/5d712637ba0710d9edda79c2097b4044adca75e0).
[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18940 Can one of the admins verify this patch?
[GitHub] spark pull request #18940: YSPARK-734 Change CacheLoader to limit entries ba...
GitHub user redsanket opened a pull request: https://github.com/apache/spark/pull/18940 YSPARK-734 Change CacheLoader to limit entries based on memory footprint Right now the spark shuffle service has a cache for index files. It is based on a # of files cached (spark.shuffle.service.index.cache.entries). This can cause issues if people have a lot of reducers, because the size of each entry can fluctuate based on the # of reducers. We saw an issue with a job that had 17 reducers, which caused the NM running the spark shuffle service to use 700-800MB of memory in the NM by itself. We should change this cache to be memory based and only allow a certain memory size to be used. When I say memory based, I mean the cache should have a limit of, say, 100MB. https://issues.apache.org/jira/browse/SPARK-21501 Manual testing with 17 reducers has been performed with the cache loaded up to the 100MB default limit, with each shuffle index file of size 1.3MB. Eviction takes place as soon as the total cache size reaches the 100MB limit, and the evicted objects become eligible for garbage collection, thereby avoiding an NM crash. No notable difference in runtime has been observed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/redsanket/spark SPARK-21501 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18940.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18940 commit f23a4c79b69fd1f8a77162da34b8821cb0cc1352 Author: Sanket Chintapalli Date: 2017-07-27T14:59:40Z YSPARK-734 Change CacheLoader to limit entries based on memory footprint
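The memory-weighted eviction the PR describes can be sketched with a stdlib-only structure. This is a hypothetical Python illustration (the actual PR uses Guava's `CacheBuilder` with a weigher, not this class): entries carry an explicit byte size, and the oldest entries are evicted once the summed sizes exceed the byte budget, rather than capping the entry count.

```python
from collections import OrderedDict

class BytesBoundedCache:
    """LRU cache bounded by total entry size in bytes, not by entry count."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size_in_bytes)

    def put(self, key, value, size: int):
        if key in self._entries:
            self.current_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (value, size)
        self.current_bytes += size
        # Evict least-recently-used entries until back under the byte
        # budget; keep at least the newest entry even if oversized.
        while self.current_bytes > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.current_bytes -= evicted_size

    def get(self, key):
        value, size = self._entries.pop(key)
        self._entries[key] = (value, size)  # mark as most recently used
        return value

cache = BytesBoundedCache(max_bytes=100)
cache.put("index-1", b"idx", 60)
cache.put("index-2", b"idx", 60)  # 120 > 100: "index-1" is evicted
print(sorted(cache._entries))     # only "index-2" remains
```

With a count-based cap, two 1.3MB index files cost the same as two tiny ones; weighing by bytes is what keeps the total footprint bounded regardless of how large each entry is.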
[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...
Github user eyalfa commented on a diff in the pull request: https://github.com/apache/spark/pull/18855#discussion_r132978316 --- Diff: project/SparkBuild.scala --- @@ -790,7 +790,7 @@ object TestSettings { javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark")) .map { case (k,v) => s"-D$k=$v" }.toSeq, javaOptions in Test += "-ea", -javaOptions in Test ++= "-Xmx3g -Xss4096k" +javaOptions in Test ++= "-Xmx6g -Xss4096k" --- End diff -- @cloud-fan, let's wait a few hours and see what the other guys CCed on this (the last ones to edit the build) have to say about it. If they are also worried or do not comment, I'll revert this. I must say I'm reluctant to revert these tests, as I personally believe that the lack of such tests contributed to Spark's 2GB issues, including this one.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895 LGTM too.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Merged build finished. Test PASSed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80633/ Test PASSed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80633/testReport)** for PR 18895 at commit [`d07d49a`](https://github.com/apache/spark/commit/d07d49aa9dbff1a87a947da1309612a355aaeac2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18913: [SPARK-21563][CORE] Fix race condition when serializing ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18913 LGTM, merging to master/2.2!
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18895 LGTM
[GitHub] spark pull request #18913: [SPARK-21563][CORE] Fix race condition when seria...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18913
[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...
Github user mike0sv commented on the issue: https://github.com/apache/spark/pull/18488 @srowen @HyukjinKwon it seems like it's all ok now
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80633/testReport)** for PR 18895 at commit [`d07d49a`](https://github.com/apache/spark/commit/d07d49aa9dbff1a87a947da1309612a355aaeac2).
[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...
Github user byakuinss commented on a diff in the pull request: https://github.com/apache/spark/pull/18895#discussion_r132968529 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1403,6 +1403,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null|  null|null|
 +----+------+----+
+>>> df4.na.replace('Alice').show()
++----+------+----+
+| age|height|name|
++----+------+----+
+|  10|    80|null|
+|   5|  null| Bob|
+|null|  null| Tom|
+|null|  null|null|
++----+------+----+
--- End diff -- Thanks for your reminding! I'll remove them.
[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...
Github user byakuinss commented on a diff in the pull request: https://github.com/apache/spark/pull/18895#discussion_r132968408 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1837,8 +1847,8 @@ def fill(self, value, subset=None): fill.__doc__ = DataFrame.fillna.__doc__ -def replace(self, to_replace, value, subset=None): -return self.df.replace(to_replace, value, subset) +def replace(self, to_replace, value=None, subset=None): +return self.df.replace(to_replace=to_replace, value=value, subset=subset) --- End diff -- Got it, I'll change them back.
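The pattern in the diff above — give `value` a default of `None` in the wrapper and forward arguments by keyword — can be sketched with hypothetical stand-ins for `DataFrame.replace` and its `na` wrapper. Forwarding by keyword keeps the wrapper in sync with the wrapped method's own defaults:

```python
class DataFrame:
    def replace(self, to_replace, value=None, subset=None):
        # Simplified stand-in: a None value means "replace matches
        # with null" rather than being a missing required argument.
        return (to_replace, value, subset)

class NaFunctions:
    """Stand-in for pyspark's DataFrameNaFunctions wrapper."""

    def __init__(self, df):
        self.df = df

    def replace(self, to_replace, value=None, subset=None):
        # Forward by keyword; before the change, `value` was required
        # here even though the underlying method could default it.
        return self.df.replace(to_replace=to_replace, value=value, subset=subset)

# The wrapper can now be called with just `to_replace`, matching
# the df4.na.replace('Alice') doctest in the diff.
print(NaFunctions(DataFrame()).replace("Alice"))
```

Without the default on the wrapper, `df.na.replace('Alice')` would raise a `TypeError` even though the underlying `DataFrame.replace` accepts the one-argument form.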
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80632/testReport)** for PR 18895 at commit [`abdef40`](https://github.com/apache/spark/commit/abdef40adc187f1a7b8b5e4db7601b517f893741). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80632/ Test FAILed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Merged build finished. Test FAILed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80632/testReport)** for PR 18895 at commit [`abdef40`](https://github.com/apache/spark/commit/abdef40adc187f1a7b8b5e4db7601b517f893741).
[GitHub] spark issue #18939: [SPARK-21724][SQL][DOC] Adds since information in the do...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18939 **[Test build #80631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80631/testReport)** for PR 18939 at commit [`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c).
[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18939 retest this please
[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18855#discussion_r132964212 --- Diff: project/SparkBuild.scala --- @@ -790,7 +790,7 @@ object TestSettings { javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark")) .map { case (k,v) => s"-D$k=$v" }.toSeq, javaOptions in Test += "-ea", -javaOptions in Test ++= "-Xmx3g -Xss4096k" +javaOptions in Test ++= "-Xmx6g -Xss4096k" --- End diff -- I'm a little worried about this change. Since the change to `BlockManagerSuite` is not very related to this PR, can we revert and revisit it in a follow-up PR? Then we can unblock this PR.
[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18907 **[Test build #80630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80630/testReport)** for PR 18907 at commit [`8f4bc08`](https://github.com/apache/spark/commit/8f4bc087df88cdb8c0308c6607d944f7bdf37019).
[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18700#discussion_r132961933 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1096,8 +1099,42 @@ class SessionCatalog( * This performs reflection to decide what type of [[Expression]] to return in the builder. */ protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = { --- End diff -- this will be overwritten by `HiveSessionCatalog`; does it mean we cannot register a Spark UDAF if Hive support is enabled?
[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18700#discussion_r132961262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1096,8 +1099,42 @@ class SessionCatalog( * This performs reflection to decide what type of [[Expression]] to return in the builder. */ protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = { -// TODO: at least support UDAFs here -throw new UnsupportedOperationException("Use sqlContext.udf.register(...) instead.") +makeFunctionBuilder(name, Utils.classForName(functionClassName)) + } + + /** + * Construct a [[FunctionBuilder]] based on the provided class that represents a function. + */ + private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = { +// When we instantiate ScalaUDAF class, we may throw exception if the input +// expressions don't satisfy the UDAF, such as type mismatch, input number +// mismatch, etc. Here we catch the exception and throw AnalysisException instead. +(children: Seq[Expression]) => { + try { +val clsForUDAF = + Utils.classForName("org.apache.spark.sql.expressions.UserDefinedAggregateFunction") --- End diff -- shall we move the UDAF interface to catalyst?
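The reflective pattern in the diff above — look up a class by its fully qualified name, instantiate it with the function's arguments, and surface construction failures as an analysis error — can be sketched in Python with `importlib` (hypothetical names; Spark does this in Scala with `Utils.classForName` and wraps instantiation failures in `AnalysisException`):

```python
import importlib

class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException."""

def make_function_builder(name: str, class_name: str):
    """Return a builder that instantiates `class_name` with its children,
    wrapping any construction failure (arity/type mismatch, etc.)."""
    module_name, _, cls_name = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)

    def builder(children):
        try:
            return cls(children)
        except Exception as e:
            raise AnalysisError(f"Invalid arguments for function {name}: {e}")
    return builder

# Demonstrate with a stdlib class resolved by its qualified name.
builder = make_function_builder("counter", "collections.Counter")
counts = builder(["a", "b", "a"])
print(counts["a"])  # the built object is a real Counter instance
```

The point of resolving the class lazily inside the builder is that the catalog can register the function name without eagerly loading (or even having on the classpath) the implementing class until the function is actually invoked.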
[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18939 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80626/ Test FAILed.
[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18939 Merged build finished. Test FAILed.
[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18939 **[Test build #80626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80626/testReport)** for PR 18939 at commit [`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18918 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80628/ Test PASSed.
[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18918 Merged build finished. Test PASSed.
[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18918 **[Test build #80628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80628/testReport)** for PR 18918 at commit [`bf81c45`](https://github.com/apache/spark/commit/bf81c45469e8554fc76eec0c97e2b5fc7f397f3f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80625/ Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Merged build finished. Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80625/testReport)** for PR 18920 at commit [`d58ffaa`](https://github.com/apache/spark/commit/d58ffaa434337ae19f4b1f59524c84943ff7934f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18934 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80627/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18934 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18934 **[Test build #80627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80627/testReport)** for PR 18934 at commit [`13defbb`](https://github.com/apache/spark/commit/13defbbd26a2ec4806c1fc94b890f6f43068d411). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #80629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80629/testReport)** for PR 18468 at commit [`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 retest this please
[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80624/
[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18938 Merged build finished. Test FAILed.
[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18938 **[Test build #80624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80624/testReport)** for PR 18938 at commit [`cf68f69`](https://github.com/apache/spark/commit/cf68f6960180817530ef3755edfb0b426cb6cb77).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/18902#discussion_r132939361

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---

    @@ -133,23 +134,29 @@ class Imputer @Since("2.2.0") (@Since("2.2.0") override val uid: String)
       override def fit(dataset: Dataset[_]): ImputerModel = {
         transformSchema(dataset.schema, logging = true)
         val spark = dataset.sparkSession
    -    import spark.implicits._
    -    val surrogates = $(inputCols).map { inputCol =>
    -      val ic = col(inputCol)
    -      val filtered = dataset.select(ic.cast(DoubleType))
    -        .filter(ic.isNotNull && ic =!= $(missingValue) && !ic.isNaN)
    -      if(filtered.take(1).length == 0) {
    -        throw new SparkException(s"surrogate cannot be computed. " +
    -          s"All the values in $inputCol are Null, Nan or missingValue(${$(missingValue)})")
    -      }
    -      val surrogate = $(strategy) match {
    -        case Imputer.mean => filtered.select(avg(inputCol)).as[Double].first()
    -        case Imputer.median => filtered.stat.approxQuantile(inputCol, Array(0.5), 0.001).head
    -      }
    -      surrogate
    +
    +    val selected = dataset.select($(inputCols).map(col(_).cast("double")): _*).rdd
    +
    +    val summarizer = $(strategy) match {
    +      case Imputer.mean =>
    +        new Imputer.MeanSummarizer($(inputCols).length, $(missingValue))
    +      case Imputer.median =>
    +        new Imputer.MedianSummarizer($(inputCols).length, $(missingValue))
    +    }
    +
    +    val summary = selected.treeAggregate(summarizer)(
    +      seqOp = { case (sum, row) => sum.update(row) },
    +      combOp = { case (sum1, sum2) => sum1.merge(sum2) }
    +    )
    +
    +    val emptyCols = ($(inputCols) zip summary.counts).filter(_._2 == 0).map(_._1)
    +    if(emptyCols.nonEmpty) {

--- End diff --

Style: space between `if` and `(`
[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/18902#discussion_r132939323

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---

    @@ -133,23 +134,29 @@ class Imputer @Since("2.2.0") (@Since("2.2.0") override val uid: String)
       override def fit(dataset: Dataset[_]): ImputerModel = {
         transformSchema(dataset.schema, logging = true)
         val spark = dataset.sparkSession
    -    import spark.implicits._
    -    val surrogates = $(inputCols).map { inputCol =>
    -      val ic = col(inputCol)
    -      val filtered = dataset.select(ic.cast(DoubleType))
    -        .filter(ic.isNotNull && ic =!= $(missingValue) && !ic.isNaN)
    -      if(filtered.take(1).length == 0) {
    -        throw new SparkException(s"surrogate cannot be computed. " +
    -          s"All the values in $inputCol are Null, Nan or missingValue(${$(missingValue)})")
    -      }
    -      val surrogate = $(strategy) match {
    -        case Imputer.mean => filtered.select(avg(inputCol)).as[Double].first()
    -        case Imputer.median => filtered.stat.approxQuantile(inputCol, Array(0.5), 0.001).head
    -      }
    -      surrogate
    +
    +    val selected = dataset.select($(inputCols).map(col(_).cast("double")): _*).rdd
    +
    +    val summarizer = $(strategy) match {
    +      case Imputer.mean =>
    +        new Imputer.MeanSummarizer($(inputCols).length, $(missingValue))
    +      case Imputer.median =>
    +        new Imputer.MedianSummarizer($(inputCols).length, $(missingValue))
    +    }
    +
    +    val summary = selected.treeAggregate(summarizer)(
    +      seqOp = { case (sum, row) => sum.update(row) },
    +      combOp = { case (sum1, sum2) => sum1.merge(sum2) }
    +    )
    +
    +    val emptyCols = ($(inputCols) zip summary.counts).filter(_._2 == 0).map(_._1)

--- End diff --

Style - use dot notation here not infix.
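The one-pass refactor under review replaces a per-column scan with a single `treeAggregate`: `seqOp` folds each row into a summarizer, and `combOp` merges partial summarizers from different partitions. A minimal plain-Python sketch of that contract (the `MeanSummarizer` name mirrors the diff, but this is an illustration of the idea, not Spark's implementation):

```python
import math

class MeanSummarizer:
    """One-pass per-column count/sum, skipping null, NaN and the
    configured missing value -- the seqOp/combOp contract in miniature."""

    def __init__(self, num_cols, missing_value):
        self.missing_value = missing_value
        self.counts = [0] * num_cols
        self.sums = [0.0] * num_cols

    def update(self, row):            # seqOp: fold one row into the summary
        for i, v in enumerate(row):
            if v is None or (isinstance(v, float) and math.isnan(v)) or v == self.missing_value:
                continue
            self.counts[i] += 1
            self.sums[i] += v
        return self

    def merge(self, other):           # combOp: merge two partial summaries
        for i in range(len(self.counts)):
            self.counts[i] += other.counts[i]
            self.sums[i] += other.sums[i]
        return self

    def surrogates(self):             # None where a column had no valid values
        return [s / c if c else None for s, c in zip(self.sums, self.counts)]

# Two "partitions" folded separately, then merged, as treeAggregate would do:
part1 = MeanSummarizer(2, missing_value=-1.0)
for row in [(1.0, 2.0), (3.0, None)]:
    part1.update(row)
part2 = MeanSummarizer(2, missing_value=-1.0)
part2.update((float("nan"), 4.0))
summ = part1.merge(part2)
print(summ.surrogates())  # [2.0, 3.0]
```

Because `merge` is associative, the same per-column counts and sums come out regardless of how rows are partitioned, which is what makes the single distributed pass safe.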
[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18929 Maybe @vanzin can weigh in here, because the real question is whether these constants in the launcher module are meant to be _the_ single definition of them used throughout the code. core depends on launcher and uses these constants a little bit, but not consistently. Most of the other code doesn't seem to use it. That is, there are hundreds more changes like this you could make. Consistency is good. In contrast, there are only about 8 usages of these constants outside the launcher module. Is it simpler to achieve some consistency by removing those usages? Then it seems like a small step backwards to not use them (and yet declare them), but is also much less change. In the end, this is why I don't know if it's worth trying to standardize, because it is also hard to keep it standard anyway.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80625/testReport)** for PR 18920 at commit [`d58ffaa`](https://github.com/apache/spark/commit/d58ffaa434337ae19f4b1f59524c84943ff7934f).
[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18939 **[Test build #80626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80626/testReport)** for PR 18939 at commit [`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c).
[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18918 **[Test build #80628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80628/testReport)** for PR 18918 at commit [`bf81c45`](https://github.com/apache/spark/commit/bf81c45469e8554fc76eec0c97e2b5fc7f397f3f).
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18934 **[Test build #80627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80627/testReport)** for PR 18934 at commit [`13defbb`](https://github.com/apache/spark/commit/13defbbd26a2ec4806c1fc94b890f6f43068d411).
[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18938 **[Test build #80624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80624/testReport)** for PR 18938 at commit [`cf68f69`](https://github.com/apache/spark/commit/cf68f6960180817530ef3755edfb0b426cb6cb77).
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18937 Merged build finished. Test PASSed.
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80623/
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18937 **[Test build #80623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80623/testReport)** for PR 18937 at commit [`28919cc`](https://github.com/apache/spark/commit/28919cc9dee8408612d94e2e03be5e5fbbc076e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18933 Merged build finished. Test PASSed.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80622/
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18933 **[Test build #80622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80622/testReport)** for PR 18933 at commit [`7df7ac9`](https://github.com/apache/spark/commit/7df7ac941da56ee9ae894ada3ae30661fddd4b03).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/18630 @vanzin I didn't forget; I didn't see agreement.
[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18935 Merged build finished. Test FAILed.
[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18935 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80618/
[GitHub] spark pull request #18918: [SPARK-21707][SQL]Improvement a special case for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18918#discussion_r132921051

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

    @@ -522,6 +522,8 @@ object ColumnPruning extends Rule[LogicalPlan] {
       * so remove it.
       */
      private def removeProjectBeforeFilter(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case p1 @ Project(_, _ @ Filter(condition, _ @ Project(_, _: LeafNode)))
    +      if !condition.deterministic => p1

--- End diff --

I don't get it from your explanation. If I understand correctly: when there is a `Project` that selects a subset of the `LeafNode`'s output, removing it with the pattern below means we retrieve all fields. Is that your purpose?
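The `!condition.deterministic` guard in the rule above matters because a non-deterministic predicate can return different answers on re-evaluation, so any rewrite that changes when or how often the filter runs can change the result set. A small Python illustration of that hazard, using a stateful function as a stand-in for a non-deterministic predicate (real randomness cannot be asserted on):

```python
calls = {"n": 0}

def nondet(x):
    """Stand-in for a non-deterministic predicate like rand() < 0.5:
    its answer depends on how many times it has been invoked so far."""
    calls["n"] += 1
    return calls["n"] in {2, 3}

rows = [1, 2, 3, 4]
# Evaluating the very same filter twice over the same rows does not
# commute: each evaluation advances the predicate's hidden state.
run1 = [x for x in rows if nondet(x)]   # predicate calls 1-4
run2 = [x for x in rows if nondet(x)]   # predicate calls 5-8
print(run1, run2)  # [2, 3] [] -- same plan shape, different results
```

This is why an optimizer rewrite that could shift a non-deterministic filter relative to its projections is unsafe, and why the pattern in the diff returns `p1` unchanged in that case.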
[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18935 **[Test build #80618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80618/testReport)** for PR 18935 at commit [`05c1f4d`](https://github.com/apache/spark/commit/05c1f4de4f00639d5f1acf1b9c061e4894d8286d).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class TransportClientFactory implements Closeable`
  * `public class NettyMemoryMetrics implements MetricSet`
[GitHub] spark pull request #18918: [SPARK-21707][SQL]Improvement a special case for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18918#discussion_r132919858

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---

    @@ -360,5 +360,34 @@ class ColumnPruningSuite extends PlanTest {
         comparePlans(optimized2, expected2.analyze)
       }
    +
    +  test("SPARK-21707 the condition of filter is not deterministic that split to two project ")

--- End diff --

Actually I don't get what the test title tries to say. Can you try to rephrase it?
[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18929 @srowen @jerryshao