[GitHub] spark pull request #19256: [SPARK-21338][SQL]implement isCascadingTruncateTa...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19256#discussion_r139302993

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala ---
@@ -41,4 +41,8 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect
   override def getJDBCType(dt: DataType): Option[JdbcType] = {
     dialects.flatMap(_.getJDBCType(dt)).headOption
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = {
+    dialects.flatMap(_.isCascadingTruncateTable).headOption
--- End diff --

Why using the first one?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
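The `flatMap(...).headOption` pattern being questioned means "the first dialect that returns a defined answer wins", ignoring any later dialects that might disagree. A minimal Python sketch of that semantics (the `first_defined` helper and the stub answers are hypothetical, not Spark's API):

```python
def first_defined(answers):
    """Return the first non-None answer, mimicking Scala's
    dialects.flatMap(_.isCascadingTruncateTable).headOption,
    where each answer is None (undecided), True, or False."""
    for a in answers:
        if a is not None:
            return a
    return None

# If the first dialect is undecided and the next two disagree,
# only the second dialect's answer survives: order matters.
print(first_defined([None, False, True]))  # prints False
```

This illustrates the reviewer's concern: with multiple dialects the aggregate answer silently depends on list order.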
[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19256 **[Test build #81848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81848/testReport)** for PR 19256 at commit [`7b9d6fc`](https://github.com/apache/spark/commit/7b9d6fcec61cd72ffb257abf17eb0e9d6c462f9e).
[GitHub] spark issue #16600: [SPARK-19242][SQL] SHOW CREATE TABLE should generate new...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16600 **[Test build #81849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81849/testReport)** for PR 16600 at commit [`dfb25a1`](https://github.com/apache/spark/commit/dfb25a15ee437dd6c09d840efdcaf9d233cb8dd0).
[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19256 ok to test
[GitHub] spark pull request #19253: [SPARK-22037][SQL] Collapse Project if it is the ...
Github user gengliangwang closed the pull request at: https://github.com/apache/spark/pull/19253
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/19253 @gatorsmile @viirya Thanks. I will close this PR.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19218 @fjh100456 We have a priority order among three different inputs. Here, you only consider one of the three. Please also add the extra checks. Hopefully, @dongjoon-hyun can help answer your questions. He just finished the related work in https://github.com/apache/spark/pull/19055
[GitHub] spark pull request #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19218#discussion_r139302763

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -101,6 +101,19 @@ case class InsertIntoHiveTable(
     val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
     val fileSinkConf = new FileSinkDesc(tmpLocation.toString, tableDesc, false)
+
+    tableDesc.getOutputFileFormatClassName match {
--- End diff --

Move the whole logic into `saveAsHiveFile`, which is shared by `InsertIntoHiveDirCommand` and `InsertIntoHiveTable`. Both need this logic.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19218 I see. If you set `spark.sql.hive.convertMetastoreParquet` to false, you will also hit the issue for non-partitioned tables.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r139302558

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -503,69 +504,307 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi
   override def prettyName: String = "find_in_set"
 }
 
+trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+  override def foldable: Boolean = children.forall(_.foldable)
+}
+
+object StringTrim {
+  def apply(str: Expression, trimStr: Expression): StringTrim = StringTrim(str, Some(trimStr))
+  def apply(str: Expression): StringTrim = StringTrim(str, None)
+}
+
 /**
- * A function that trim the spaces from both ends for the specified string.
+ * A function that takes a character string, removes the leading and trailing characters matching with the characters
+ * in the trim string, returns the new string.
+ * If BOTH and trimStr keywords are not specified, it defaults to remove space character from both ends. The trim
+ * function will have one argument, which contains the source string.
+ * If BOTH and trimStr keywords are specified, it trims the characters from both ends, and the trim function will have
+ * two arguments, the first argument contains trimStr, the second argument contains the source string.
+ * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function
+ *          searches for each character in the source string, removes the characters from the source string until it
+ *          encounters the first non-match character.
+ * BOTH: removes any character from both ends of the source string that matches characters in the trim string.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str) - Removes the leading and trailing space characters from `str`.",
+  usage = """
+    _FUNC_(str) - Removes the leading and trailing space characters from `str`.
+    _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str`
+  """,
+  arguments = """
+    Arguments:
+      * str - a string expression
+      * trimString - the trim string
+      * BOTH, FROM - these are keyword to specify for trim string from both ends of the string
+  """,
   examples = """
     Examples:
       > SELECT _FUNC_('SparkSQL ');
        SparkSQL
+      > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS');
+       parkSQ
   """)
-case class StringTrim(child: Expression)
-  extends UnaryExpression with String2StringExpression {
+case class StringTrim(
+    srcStr: Expression,
+    trimStr: Option[Expression] = None)
+  extends String2TrimExpression {
 
-  def convert(v: UTF8String): UTF8String = v.trim()
+  def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr))
--- End diff --

done.
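The multi-character trim semantics described in the doc comment (treat `trimStr` as a set of characters and strip from both ends until the first non-matching character) happens to match Python's built-in `str.strip`, which makes the SQL example easy to sanity-check:

```python
# str.strip(chars) treats `chars` as a set of characters, mirroring
# the TRIM(BOTH trimStr FROM str) semantics discussed in the review.
src = 'SSparkSQLS'
trimmed = src.strip('SL')  # strips leading/trailing 'S' and 'L'
print(trimmed)             # prints parkSQ, matching the SQL example
```

This is only an analogy for checking expected results; the PR itself implements the behavior in Scala on `UTF8String`.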
[GitHub] spark issue #19258: add MockNetCat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19258 Can one of the admins verify this patch?
[GitHub] spark pull request #19258: add MockNetCat
GitHub user bluejoe2008 opened a pull request: https://github.com/apache/spark/pull/19258 add MockNetCat

## What changes were proposed in this pull request?

I added a MockNetCat class, which avoids having to manually launch the 'nc -lk ' command for tests. I also added a MockNetCatTest class, a JUnit 4 test case that tests MockNetCat.

## How was this patch tested?

Run MockNetCatTest to test MockNetCat. This class uses a structured streaming example which reads socket data from localhost:

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bluejoe2008/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19258.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19258

commit f61636da8998c52ddb980584ed8a9e7aee01902c
Author: bluejoe2...@gmail.com
Date: 2017-09-17T05:14:37Z

add MockNetCat
[GitHub] spark pull request #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19249#discussion_r139301428

--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
             # it's already converted by pickler
             return obj
         if self._needSerializeAnyField:
-            values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
+            values = [f.fromInternal(v) if n else v
--- End diff --

Could we run a benchmark with the worst case, when all columns need to be converted? I think here we pay an extra `if` and an extra element in the zip to prevent a function call, basically. This looks okay practically, but I guess we should also identify the downside. Also, let's add a comment here describing what we are doing, and link to this PR so others can refer to the benchmarks.
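The change under review skips `fromInternal` calls for fields that need no conversion, at the cost of an extra flag element in the zip. A self-contained sketch of that idea (the `Field` class and its doubling conversion are simplified stand-ins, not pyspark's actual types):

```python
# Simplified stand-in for StructField: precompute which fields need
# conversion once, then branch per value instead of always calling.
class Field:
    def __init__(self, need):
        self.need = need

    def fromInternal(self, v):
        return v * 2  # placeholder for a real conversion

fields = [Field(True), Field(False), Field(True)]
need_conversion = [f.need for f in fields]  # computed once, not per row

def from_internal_row(obj):
    # Mirrors: [f.fromInternal(v) if n else v for f, v, n in zip(...)]
    return [f.fromInternal(v) if n else v
            for f, v, n in zip(fields, obj, need_conversion)]

print(from_internal_row([1, 2, 3]))  # prints [2, 2, 6]
```

The reviewer's point is the trade-off: when every flag is `True` (the worst case), the extra `if` and zip element are pure overhead, which is why a worst-case benchmark was requested.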
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81847/ Test PASSed.
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19257 Merged build finished. Test PASSed.
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19257 **[Test build #81847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81847/testReport)** for PR 19257 at commit [`6ff4ed0`](https://github.com/apache/spark/commit/6ff4ed089777d5ec9edef8e7e53e847ec9379444).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19253 The removed `Project` might affect the rule `PhysicalOperation` and reduce the chance to prune columns when reading relations.
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19257 **[Test build #81847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81847/testReport)** for PR 19257 at commit [`6ff4ed0`](https://github.com/apache/spark/commit/6ff4ed089777d5ec9edef8e7e53e847ec9379444).
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/19257 [SPARK-22042] [SQL] ReorderJoinPredicates can break when child's partitioning is not decided

## What changes were proposed in this pull request?

See the jira description for the bug: https://issues.apache.org/jira/browse/SPARK-22042

The fix done in this PR: in `EnsureRequirements`, apply `ReorderJoinPredicates` over the input tree before doing its core logic. Since the tree is transformed bottom-up, we can be sure that the children are resolved before doing `ReorderJoinPredicates`. Theoretically this guarantees covering all such cases while keeping the code simple.

My small grudge is for cosmetic reasons. This PR will look weird given that we don't call rules from other rules (not to my knowledge). I could have moved all the logic for `ReorderJoinPredicates` into `EnsureRequirements`, but that would make it a bit crowded. I am happy to discuss if there are better options.

## How was this patch tested?

Added a new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-22042_ReorderJoinPredicates

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19257.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19257

commit 6ff4ed089777d5ec9edef8e7e53e847ec9379444
Author: Tejas Patil
Date: 2017-09-17T01:10:31Z

[SPARK-22042] [SQL] ReorderJoinPredicates can break when child's partitioning is not decided
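The fix above leans on a property of bottom-up tree transformation: by the time a rule sees a node, its children have already been visited and rewritten, so their output partitioning is decided. A toy sketch of that visiting order (hypothetical `Node`/`transform_up` names, not Spark's `TreeNode` API):

```python
# Toy plan node: transform_up recurses into children before applying
# the rule to the node itself, so a parent rule can rely on its
# children already being resolved.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def transform_up(node, rule, visited):
    node.children = [transform_up(c, rule, visited) for c in node.children]
    visited.append(node.name)  # record the visit order
    return rule(node)

order = []
plan = Node('join', [Node('scan_a'), Node('scan_b')])
transform_up(plan, lambda n: n, order)
print(order)  # prints ['scan_a', 'scan_b', 'join']
```

Both scans are visited before the join node, which is the ordering guarantee the PR exploits when running `ReorderJoinPredicates` inside `EnsureRequirements`.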
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 Jenkins test this please
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19255 This PR also needs perf tests and the results in the description.
[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19255#discussion_r139298764

--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
 import re
 import base64
 from array import array
+from functools import lru_cache
--- End diff --

I think we should disable it in Python 2.x if we are going ahead with this.
[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19256 Can one of the admins verify this patch?
[GitHub] spark pull request #19256: [SPARK-21338][SQL]implement isCascadingTruncateTa...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/19256 [SPARK-21338][SQL] Implement isCascadingTruncateTable() method in AggregatedDialect

## What changes were proposed in this pull request?

org.apache.spark.sql.jdbc.JdbcDialect's method

def isCascadingTruncateTable(): Option[Boolean] = None

is not overridden in the org.apache.spark.sql.jdbc.AggregatedDialect class. Because of this, when you add more than one dialect, Spark doesn't truncate the table, since isCascadingTruncateTable always returns the default None for an AggregatedDialect. This PR implements isCascadingTruncateTable in the AggregatedDialect class.

## How was this patch tested?

In JDBCSuite, inside test("Aggregated dialects"), one line is added to test AggregatedDialect.isCascadingTruncateTable.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-21338

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19256.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19256
[GitHub] spark issue #18853: [SPARK-21646][SQL] CommonType for binary comparison
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18853 Thank you for your investigation! I think we need to introduce a type inference conf for it. To avoid impacting the existing Spark users, we should keep the existing behaviors.
[GitHub] spark issue #19246: [SPARK-22025] Speeding up fromInternal for StructField
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19246 Could you mark [PySpark] in the title? cc @ueshin
[GitHub] spark pull request #19253: [SPARK-22037][SQL] Collapse Project if it is the ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19253#discussion_r139297437

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -541,6 +541,25 @@ object CollapseProject extends Rule[LogicalPlan] {
         agg.copy(aggregateExpressions = buildCleanedProjectList(
           p.projectList, agg.aggregateExpressions))
     }
+    case agg @ Aggregate(groupingExpressions, aggregateExpressions, p: Project) =>
+      if (haveCommonNonDeterministicOutput(groupingExpressions, p.projectList) ||
+        haveCommonNonDeterministicOutput(aggregateExpressions, p.projectList)) {
--- End diff --

This condition is not right.
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19253 In whole-stage codegen, removing the useless Project will not bring any perf gain.
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19255 Could you mark [PySpark] in the title? cc @ueshin
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19249 Could you mark [PySpark] in the title? cc @ueshin
[GitHub] spark issue #19234: [SPARK-22010] Change fromInternal method of TimestampTyp...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19234 Could you mark [PySpark] in the title? cc @ueshin
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r139297269

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -503,69 +504,307 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi
   override def prettyName: String = "find_in_set"
 }
 
+trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+  override def foldable: Boolean = children.forall(_.foldable)
+}
+
+object StringTrim {
+  def apply(str: Expression, trimStr: Expression): StringTrim = StringTrim(str, Some(trimStr))
+  def apply(str: Expression): StringTrim = StringTrim(str, None)
+}
+
 /**
- * A function that trim the spaces from both ends for the specified string.
+ * A function that takes a character string, removes the leading and trailing characters matching with the characters
+ * in the trim string, returns the new string.
+ * If BOTH and trimStr keywords are not specified, it defaults to remove space character from both ends. The trim
+ * function will have one argument, which contains the source string.
+ * If BOTH and trimStr keywords are specified, it trims the characters from both ends, and the trim function will have
+ * two arguments, the first argument contains trimStr, the second argument contains the source string.
+ * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function
+ *          searches for each character in the source string, removes the characters from the source string until it
+ *          encounters the first non-match character.
+ * BOTH: removes any character from both ends of the source string that matches characters in the trim string.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str) - Removes the leading and trailing space characters from `str`.",
+  usage = """
+    _FUNC_(str) - Removes the leading and trailing space characters from `str`.
+    _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str`
+  """,
+  arguments = """
+    Arguments:
+      * str - a string expression
+      * trimString - the trim string
+      * BOTH, FROM - these are keyword to specify for trim string from both ends of the string
+  """,
   examples = """
     Examples:
       > SELECT _FUNC_('SparkSQL ');
        SparkSQL
+      > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS');
+       parkSQ
   """)
-case class StringTrim(child: Expression)
-  extends UnaryExpression with String2StringExpression {
+case class StringTrim(
+    srcStr: Expression,
+    trimStr: Option[Expression] = None)
+  extends String2TrimExpression {
 
-  def convert(v: UTF8String): UTF8String = v.trim()
+  def this(trimStr: Expression, srcStr: Expression) = this(srcStr, Option(trimStr))
+
+  def this(srcStr: Expression) = this(srcStr, None)
 
   override def prettyName: String = "trim"
 
+  override def children: Seq[Expression] = if (trimStr.isDefined) {
+    srcStr :: trimStr.get :: Nil
+  } else {
+    srcStr :: Nil
+  }
+
+  override def eval(input: InternalRow): Any = {
+    val srcString = srcStr.eval(input).asInstanceOf[UTF8String]
+    if (srcString == null) {
+      null
+    } else {
+      if (trimStr.isDefined) {
+        return srcString.trim(trimStr.get.eval(input).asInstanceOf[UTF8String])
+      } else {
+        return srcString.trim()
+      }
+    }
+  }
+
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    defineCodeGen(ctx, ev, c => s"($c).trim()")
+    val evals = children.map(_.genCode(ctx))
+    val srcString = evals(0)
+
+    if (evals.length == 1) {
+      ev.copy(evals.map(_.code).mkString("\n") + s"""
--- End diff --

sounds good
[GitHub] spark issue #19254: [MINOR][CORE] Cleanup dead code and duplication in Mem. ...
Github user original-brownbear commented on the issue: https://github.com/apache/spark/pull/19254 The test failure in PySpark appears unrelated; various tests OOMed, e.g.:

```sh
FAILED (errors=2)
[Running ]
ERROR
======================================================================
ERROR: setUpClass (pyspark.streaming.tests.StreamingListenerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/pyspark/streaming/tests.py", line 65, in setUpClass
    cls.sc = SparkContext(appName=class_name, conf=conf)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/pyspark/context.py", line 270, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.OutOfMemoryError: Java heap space
```
[GitHub] spark issue #19254: [MINOR][CORE] Cleanup dead code and duplication in Mem. ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19254 **[Test build #3922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3922/testReport)** for PR 19254 at commit [`84094a0`](https://github.com/apache/spark/commit/84094a0c75727158f96a942d8d80cc3f5721b955).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19255 Merged build finished. Test FAILed.
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19255 **[Test build #81846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81846/testReport)** for PR 19255 at commit [`c903860`](https://github.com/apache/spark/commit/c903860ee8d25afda0f969b582bdbdaa0aa8c9fe).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19255 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81846/ Test FAILed.
[GitHub] spark issue #19255: [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19255 **[Test build #81846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81846/testReport)** for PR 19255 at commit [`c903860`](https://github.com/apache/spark/commit/c903860ee8d25afda0f969b582bdbdaa0aa8c9fe).
[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19255#discussion_r139292791

--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
 import re
 import base64
 from array import array
+from functools import lru_cache
--- End diff --

Any ideas for Python 2.7?
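Since `functools.lru_cache` does not exist in Python 2.7, one common fallback is a small unbounded memoizing decorator. This is only a sketch of that option (the `memoize` and `parse` names are hypothetical, not what the PR ended up doing):

```python
import functools

def memoize(func):
    """Unbounded cache keyed on positional args: a minimal
    Python 2-compatible substitute for functools.lru_cache."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

calls = []

@memoize
def parse(json_string):
    calls.append(json_string)  # track real invocations
    return len(json_string)    # placeholder for real parsing

parse('{"type":"struct"}')
parse('{"type":"struct"}')     # second call is served from the cache
print(len(calls))              # prints 1
```

Unlike `lru_cache`, this version never evicts, so it only suits cases with a bounded set of distinct inputs, such as repeated datatype JSON strings.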
[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/19255 [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json_string

## What changes were proposed in this pull request?

_parse_datatype_json_string is called many times for the same datatypes. By caching its result we can speed up pySpark internals.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark_22029

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19255.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19255

commit c903860ee8d25afda0f969b582bdbdaa0aa8c9fe
Author: Maciej Bryński
Date: 2017-09-16T18:51:49Z

Add lru_cache to _parse_datatype_json_string
[GitHub] spark pull request #19246: [SPARK-22025] Speeding up fromInternal for Struct...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19246#discussion_r139291502 --- Diff: python/pyspark/sql/types.py --- @@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True, metadata=None): self.dataType = dataType self.nullable = nullable self.metadata = metadata or {} +self.needConversion = dataType.needConversion +self.toInternal = dataType.toInternal +self.fromInternal = dataType.fromInternal + +def __getstate__(self): --- End diff -- We need to handle pickling ourselves because these fields hold function values.
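The pickling concern can be illustrated with a small self-contained sketch (class names here are hypothetical stand-ins, not the real `pyspark.sql.types` classes): bound methods cached as instance attributes are not picklable on Python 2, so `__getstate__`/`__setstate__` drop the cached attribute and restore it after unpickling.

```python
import pickle

class LongTypeLike(object):
    """Minimal stand-in for a pyspark DataType with a fromInternal method."""
    def fromInternal(self, v):
        return v

class StructFieldLike(object):
    """Hypothetical sketch: cache the child type's bound method for speed,
    but exclude it from the pickled state."""
    def __init__(self, name, data_type):
        self.name = name
        self.dataType = data_type
        # Cached bound method saves an attribute lookup per record
        self.fromInternal = data_type.fromInternal

    def __getstate__(self):
        # Drop the function-valued attribute; on Python 2 bound methods
        # cannot be pickled at all
        state = dict(self.__dict__)
        del state["fromInternal"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Re-cache the bound method after unpickling
        self.fromInternal = self.dataType.fromInternal
```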
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 I added a benchmark for this code.
[GitHub] spark issue #19254: [MINOR][CORE] Cleanup dead code and duplication in Mem. ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19254 **[Test build #3922 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3922/testReport)** for PR 19254 at commit [`84094a0`](https://github.com/apache/spark/commit/84094a0c75727158f96a942d8d80cc3f5721b955).
[GitHub] spark issue #19234: [SPARK-22010] Change fromInternal method of TimestampTyp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81845/ Test PASSed.
[GitHub] spark issue #19234: [SPARK-22010] Change fromInternal method of TimestampTyp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19234 Merged build finished. Test PASSed.
[GitHub] spark issue #19234: [SPARK-22010] Change fromInternal method of TimestampTyp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19234 **[Test build #81845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81845/testReport)** for PR 19234 at commit [`bed6193`](https://github.com/apache/spark/commit/bed6193ad1d57a0f7873d1a7dccd6257e15a7dab). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81843/ Test PASSed.
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18323 Merged build finished. Test PASSed.
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #81843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81843/testReport)** for PR 18323 at commit [`2e2b2ca`](https://github.com/apache/spark/commit/2e2b2ca39ffb595ec5c26bcec71afa9df8a612c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19234: [SPARK-22010] Change fromInternal method of TimestampTyp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19234 **[Test build #81845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81845/testReport)** for PR 19234 at commit [`bed6193`](https://github.com/apache/spark/commit/bed6193ad1d57a0f7873d1a7dccd6257e15a7dab).
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139290042 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc +else time.localtime(ts // 1000000)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000) --- End diff -- I added some description and support for leap seconds.
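The conversion being discussed can be sketched in a self-contained form. This assumes the internal value is an integer count of microseconds since the epoch, as in PySpark's `TimestampType`; `use_utc` is a stand-in for the `_is_utc` flag in the diff.

```python
import time
import datetime

def from_internal(ts, use_utc=False):
    """Rebuild a datetime from integer fields instead of fromtimestamp,
    which round-trips through float and can lose microsecond precision."""
    tm = time.gmtime(ts // 1000000) if use_utc else time.localtime(ts // 1000000)
    y, m, d, hh, mm, ss = tm[:6]
    # tm_sec may be 60 (or 61) during a leap second; clamp to what
    # datetime.datetime accepts
    ss = min(ss, 59)
    return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000)
```

The integer `// 1000000` / `% 1000000` split keeps the full microsecond component exact, whereas a float epoch value near 2^53 microseconds cannot represent it.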
[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139289721 --- Diff: python/pyspark/serializers.py --- @@ -199,6 +211,33 @@ def __repr__(self): return "ArrowSerializer" +class ArrowPandasSerializer(ArrowSerializer): + +def __init__(self): +super(ArrowPandasSerializer, self).__init__() + +def dumps(self, series): +""" +Make an ArrowRecordBatch from a Pandas Series and serialize +""" +import pyarrow as pa --- End diff -- I am okay with leaving it as is here. I think we should catch and throw it with better messages in all cases later but let's talk about this in another place.
[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139289612 --- Diff: python/pyspark/serializers.py --- @@ -199,6 +211,33 @@ def __repr__(self): return "ArrowSerializer" +class ArrowPandasSerializer(ArrowSerializer): + +def __init__(self): +super(ArrowPandasSerializer, self).__init__() + +def dumps(self, series): +""" +Make an ArrowRecordBatch from a Pandas Series and serialize +""" +import pyarrow as pa --- End diff -- Ah, hm .. let me check the previous discussions and think about this a bit more.
[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139289505 --- Diff: python/pyspark/serializers.py --- @@ -199,6 +211,33 @@ def __repr__(self): return "ArrowSerializer" +class ArrowPandasSerializer(ArrowSerializer): + +def __init__(self): +super(ArrowPandasSerializer, self).__init__() + +def dumps(self, series): +""" +Make an ArrowRecordBatch from a Pandas Series and serialize +""" +import pyarrow as pa --- End diff -- Ah, I see. Thanks for explaining it. Sure, I am okay with leaving it as is here. I think we should catch and throw it with better messages in all cases (probably in the entry points) but probably let's talk about this later in another place.
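The "better messages" idea floated in this exchange could be centralized in a small helper. This is a hedged sketch of one way to do it, not the API the PR settled on; the function name and message text are illustrative.

```python
def require_pyarrow():
    """Import pyarrow lazily, re-raising ImportError with install guidance
    instead of a bare 'No module named pyarrow'."""
    try:
        import pyarrow
        return pyarrow
    except ImportError as e:
        raise ImportError(
            "pyarrow is required for Arrow-based serialization; "
            "install it with 'pip install pyarrow' (original error: %s)" % e)
```

Deferring the import to the first use keeps pyarrow an optional dependency, while every entry point that needs it raises the same actionable message.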
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81844/ Test FAILed.
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19253 Merged build finished. Test FAILed.
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19253 **[Test build #81844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81844/testReport)** for PR 19253 at commit [`525d654`](https://github.com/apache/spark/commit/525d6546b06173feda4c5806176ebb98c825dcaa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81842/ Test PASSed.
[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19252 Merged build finished. Test PASSed.
[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19252 **[Test build #81842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81842/testReport)** for PR 19252 at commit [`ba963b4`](https://github.com/apache/spark/commit/ba963b46cd2917315bc2bd0cf237c7d9f79e9d65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19254: [MINOR][CORE] Cleanup dead code and duplication in Mem. ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19254 Can one of the admins verify this patch?
[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19230 LGTM too.
[GitHub] spark pull request #19254: [MINOR][CORE] Cleanup dead code and duplication i...
GitHub user original-brownbear opened a pull request: https://github.com/apache/spark/pull/19254 [MINOR][CORE] Cleanup dead code and duplication in Mem. Management ## What changes were proposed in this pull request? * Removed the method `org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter#alignToWords`. It became unused as a result of 85b0a157543201895557d66306b38b3ca52f2151 (SPARK-15962) introducing word alignment for unsafe arrays. * Cleaned up duplicate code in memory management and unsafe sorters * The change extracting the exception paths is more than just cosmetics since it def. reduces the size the affected methods compile to ## How was this patch tested? * Build still passes after removing the method, grepping the codebase for `alignToWords` shows no reference to it anywhere either. * Dried up code is covered by existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/original-brownbear/spark cleanup-mem-consumer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19254.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19254 commit 84094a0c75727158f96a942d8d80cc3f5721b955 Author: Armin Date: 2017-09-16T13:34:23Z [MINOR][CORE] Cleanup dead code and duplication in mem. management * Removed the method `org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter#alignToWords`. It became unused as a result of 85b0a157543201895557d66306b38b3ca52f2151 (SPARK-15962) introducing word alignment for unsafe arrays. * Cleaned up duplicate code in memory management and unsafe sorters * The change extracting the exception paths is more than just cosmetics since it def. reduces the size the affected methods compile to * Build still passes after removing the method, grepping the codebase for `alignToWords` shows no reference to it anywhere either. 
* Dried up code is covered by existing tests.
[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19230 LGTM except some minor comments.
[GitHub] spark pull request #19179: [TRIVIAL][SQL] Cleanup Todo for removal of org.ap...
Github user original-brownbear closed the pull request at: https://github.com/apache/spark/pull/19179
[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19230#discussion_r139287957 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala --- @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.vectorized + +import org.scalatest.BeforeAndAfterEach + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.util.ArrayData +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class ColumnVectorSuite extends SparkFunSuite with BeforeAndAfterEach { + + var testVector: WritableColumnVector = _ + + private def allocate(capacity: Int, dt: DataType): WritableColumnVector = { +new OnHeapColumnVector(capacity, dt) + } + + override def afterEach(): Unit = { +testVector.close() + } + + test("boolean") { +testVector = allocate(10, BooleanType) +(0 until 10).foreach { i => + testVector.appendBoolean(i % 2 == 0) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, BooleanType) === (i % 2 == 0)) +} + } + + test("byte") { +testVector = allocate(10, ByteType) +(0 until 10).foreach { i => + testVector.appendByte(i.toByte) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, ByteType) === (i.toByte)) +} + } + + test("short") { +testVector = allocate(10, ShortType) +(0 until 10).foreach { i => + testVector.appendShort(i.toShort) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, ShortType) === (i.toShort)) +} + } + + test("int") { +testVector = allocate(10, IntegerType) +(0 until 10).foreach { i => + testVector.appendInt(i) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, IntegerType) === i) +} + } + + test("long") { +testVector = allocate(10, LongType) +(0 until 10).foreach { i => + testVector.appendLong(i) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, LongType) === i) +} + } + + test("float") { +testVector = allocate(10, FloatType) +(0 until 10).foreach { i => + testVector.appendFloat(i.toFloat) +} + 
+val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, FloatType) === i.toFloat) +} + } + + test("double") { +testVector = allocate(10, DoubleType) +(0 until 10).foreach { i => + testVector.appendDouble(i.toDouble) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, DoubleType) === i.toDouble) +} + } + + test("string") { +testVector = allocate(10, StringType) +(0 until 10).map { i => + val utf8 = s"str$i".getBytes("utf8") + testVector.appendByteArray(utf8, 0, utf8.length) +} + +val array = new ColumnVector.Array(testVector) + +(0 until 10).foreach { i => + assert(array.get(i, StringType) === UTF8String.fromString(s"str$i")) +} + } + + test("binary") { +testVector = allocate(10, BinaryType) +(0 until 10).map { i => + val utf8 = s"str$i".getBytes("utf8") + testVector.appendByteArray(utf8, 0, utf8.length) +} + +val array = new ColumnVector.Array(testVector) + +(
[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19230#discussion_r139287815 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java --- @@ -158,7 +158,7 @@ private static void appendValue(WritableColumnVector dst, DataType t, Object o) dst.getChildColumn(0).appendInt(c.months); dst.getChildColumn(1).appendLong(c.microseconds); } else if (t instanceof DateType) { -dst.appendInt(DateTimeUtils.fromJavaDate((Date)o)); +dst.appendInt((int) DateTimeUtils.fromJavaDate((Date)o)); --- End diff -- is it necessary?
[GitHub] spark issue #19253: [SPARK-22037][SQL] Collapse Project if it is the child o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19253 **[Test build #81844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81844/testReport)** for PR 19253 at commit [`525d654`](https://github.com/apache/spark/commit/525d6546b06173feda4c5806176ebb98c825dcaa).
[GitHub] spark pull request #19253: [SPARK-22037][SQL] Collapse Project if it is the ...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/19253 [SPARK-22037][SQL] Collapse Project if it is the child of Aggregate ## What changes were proposed in this pull request? If Aggregate's child is Project, collapse the Project into the Aggregate. This rule is to make query plans simpler. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark CollapseProjectIntoAggregate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19253.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19253 commit 525d6546b06173feda4c5806176ebb98c825dcaa Author: Wang Gengliang Date: 2017-09-16T13:45:10Z Collapse Project if it is the child of Aggregate
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #81843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81843/testReport)** for PR 18323 at commit [`2e2b2ca`](https://github.com/apache/spark/commit/2e2b2ca39ffb595ec5c26bcec71afa9df8a612c6).
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test PASSed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81841/ Test PASSed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #81841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81841/testReport)** for PR 19218 at commit [`c7ff62c`](https://github.com/apache/spark/commit/c7ff62cc1d4e35f7ffd2f711068a2ed5dc47e406). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19249#discussion_r139286221 --- Diff: python/pyspark/sql/types.py --- @@ -619,7 +621,8 @@ def fromInternal(self, obj): # it's already converted by pickler return obj if self._needSerializeAnyField: -values = [f.fromInternal(v) for f, v in zip(self.fields, obj)] +values = [f.fromInternal(v) if n else v --- End diff -- Let's describe this in more detail and add some numbers (and in your other PRs too).
[GitHub] spark pull request #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19249#discussion_r139286195 --- Diff: python/pyspark/sql/types.py --- @@ -619,7 +621,8 @@ def fromInternal(self, obj): # it's already converted by pickler return obj if self._needSerializeAnyField: -values = [f.fromInternal(v) for f, v in zip(self.fields, obj)] +values = [f.fromInternal(v) if n else v --- End diff -- Ah, I see. This can be recursive and per-record, and we avoid that here by pre-computing. That makes much sense.
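The pre-computation idea discussed here can be sketched as follows (class names are illustrative stand-ins; the actual change lives in `StructType.fromInternal`): the per-field `needConversion` flags are computed once per schema, so the per-record loop only branches on a cached boolean instead of calling `needConversion()` recursively for every value.

```python
class DummyField(object):
    """Minimal stand-in for pyspark.sql.types.StructField."""
    def __init__(self, needs):
        self._needs = needs

    def needConversion(self):
        return self._needs

    def fromInternal(self, v):
        return v * 2  # arbitrary conversion for illustration

class RowConverter(object):
    def __init__(self, fields):
        self.fields = fields
        # Precomputed per-field flags; True where fromInternal must be applied
        self._need = [f.needConversion() for f in fields]
        self._need_any = any(self._need)

    def from_internal(self, obj):
        if not self._need_any:
            # Fast path: no field needs conversion, return the row unchanged
            return obj
        return [f.fromInternal(v) if n else v
                for n, f, v in zip(self._need, self.fields, obj)]
```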
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19249 Okay, let's go ahead then. Let's add some numbers in the PR description.
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 Yep.
[GitHub] spark issue #18685: [SPARK-21439] Support for ABCMeta in PySpark
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/18685 Ping received. I'll try to add tests and resolve the conflict.
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19249 Did it save 6~7% of the total execution time?
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 I was checking this with my production code. This gives me about a 6-7% speed-up and removes 408 million function calls :) I'll try to create a benchmark for this.
[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19252 **[Test build #81842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81842/testReport)** for PR 19252 at commit [`ba963b4`](https://github.com/apache/spark/commit/ba963b46cd2917315bc2bd0cf237c7d9f79e9d65).
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19249 To be honest, it looks so trivial that I won't bother.
[GitHub] spark pull request #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats ...
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/19252 [SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable ## What changes were proposed in this pull request? Tables in the catalog cache are not invalidated once their statistics are updated. As a consequence, existing sessions will use the cached information even though it is not valid anymore. Consider the example below. ``` // step 1 spark.range(100).write.saveAsTable("tab1") // step 2 spark.sql("analyze table tab1 compute statistics") // step 3 spark.sql("explain cost select distinct * from tab1").show(false) // step 4 spark.range(100).write.mode("append").saveAsTable("tab1") // step 5 spark.sql("explain cost select distinct * from tab1").show(false) ``` After step 3, the table will be present in the catalog relation cache. Step 4 will correctly update the metadata inside the catalog but will NOT invalidate the cache. By the way, ``spark.sql("analyze table tab1 compute statistics")`` between step 3 and step 4 would also solve the problem. ## How was this patch tested? Current and additional unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aokolnychyi/spark spark-21969 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19252.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19252 commit ba963b46cd2917315bc2bd0cf237c7d9f79e9d65 Author: aokolnychyi Date: 2017-09-16T11:57:52Z [SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19249 How many function calls does it save and how much does it improve? I'd not bother fixing this.
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139285021 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc +else time.localtime(ts // 1000000)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000) --- End diff -- BTW, the change itself seems fine to me.
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139284915 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc +else time.localtime(ts // 1000000)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000) --- End diff -- Could we have more details in the PR description? I can follow what it fixes, but I believe we should better describe the details for other guys to follow the history in the future. To me, it has always been hard to track such changes and confirm the root cause of a bug.
[GitHub] spark issue #19246: [SPARK-22025] Speeding up fromInternal for StructField
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19246 @dongjoon-hyun I'll do it on Monday.
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139284824 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc +else time.localtime(ts // 1000000)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000) --- End diff -- I think the only difference is this `ss = min(ss, 59)`
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139284793 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc +else time.localtime(ts // 1000000)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000) --- End diff -- I'd explain how the current change is equivalent to `fromtimestamp` in the PR description. https://github.com/python/cpython/blob/018d353c1c8c87767d2335cd884017c2ce12e045/Lib/datetime.py#L1425-L1458
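The `fromtimestamp` equivalence discussed above can be sketched in plain Python. This is an illustrative standalone function, not Spark's actual `fromInternal`; the `use_utc` flag stands in for the `_is_utc` check in the diff, and the `ss = min(ss, 59)` clamp mirrors the leap-second handling maver1ck points out:

```python
import datetime
import time

MICROS_PER_SECOND = 1000000

def from_internal(ts, use_utc=True):
    """Convert an internal microsecond timestamp to a datetime, in the
    style of the diff above (illustrative sketch, not Spark's code)."""
    seconds, micros = divmod(ts, MICROS_PER_SECOND)
    t = time.gmtime(seconds) if use_utc else time.localtime(seconds)
    y, m, d, hh, mm, ss = t[:6]
    # datetime.fromtimestamp clamps leap seconds to 59; mirror that here
    ss = min(ss, 59)
    return datetime.datetime(y, m, d, hh, mm, ss, micros)
```

For ordinary (non-leap-second) timestamps this agrees with `datetime.datetime.utcfromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)` while avoiding the intermediate `replace` call.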
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/19218 @dongjoon-hyun A problem has been encountered. There are two ways to specify the compression format: 1. CREATE TABLE Test(id int) STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY'); 2. set orc.compress=ZLIB; If the table was already given a compression format when it was created, and another compression format is then specified by setting 'orc.compress', the latter takes effect. So either the Spark side should not have a default value, which we could distinguish with 'undefined'; or we discard this change and document that 'spark.sql.parquet.compression.codec' does not take effect for partitioned tables and 'spark.sql.orc.compression.codec' is not valid for Hive tables. Or perhaps you have a better solution.
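The precedence described above (a later `set orc.compress=...` overriding the codec stored in the table's TBLPROPERTIES) can be sketched with a small helper. `resolve_compression_codec`, its arguments, and the default are hypothetical illustrations of the behavior under discussion, not Spark's API:

```python
def resolve_compression_codec(table_props, session_conf, default="snappy"):
    """Hypothetical precedence from the discussion: a session-level
    'set orc.compress=...' wins over the codec recorded in the table's
    TBLPROPERTIES, which in turn wins over the data-source default."""
    codec = session_conf.get("orc.compress") or table_props.get("orc.compress")
    return (codec or default).lower()
```

With an always-present Spark-side default there is no way to tell "the user set this" from "this is the default", which is why the comment suggests an 'undefined' sentinel for the unset case.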
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #81841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81841/testReport)** for PR 19218 at commit [`c7ff62c`](https://github.com/apache/spark/commit/c7ff62cc1d4e35f7ffd2f711068a2ed5dc47e406).
[GitHub] spark issue #19251: [SPARK-22035][SQL]the value of statistical logicalPlan.s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19251 Can one of the admins verify this patch?
[GitHub] spark pull request #19251: [SPARK-22035][SQL]the value of statistical logica...
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/19251 [SPARK-22035][SQL]the value of statistical logicalPlan.stats.sizeInBytes which is not expected ## What changes were proposed in this pull request? Currently we assume a node will have the same number of rows as its child, and the statistic `logicalPlan.stats.sizeInBytes` is calculated by scaling the child's size by the ratio of the current node's row width to the child's row width, both computed from the default sizes of the data types. But there is a problem: the statistics are not very accurate. For example: ``` val N = 1 << 3 val df = spark.range(N).selectExpr("id as k1", s"cast(id % 3 as string) as idString1", s"cast((id + 1) % 5 as string) as idString3") val sizeInBytes = df.logicalPlan.stats.sizeInBytes println("sizeInBytes : " + sizeInBytes) ``` Before the change, sizeInBytes is 224 (8 * 8 * ((8 + 20 + 20 + 8) / (8 + 8))). Debug information in `SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode`: ``` p.child.dataType: LongType defaultSize: 8 p.dataType: LongType defaultSize: 8 p.dataType: StringType defaultSize: 20 p.dataType: StringType defaultSize: 20 childRowSize: 16 outputRowSize: 56 p.child.stats.sizeInBytes : 64 p.stats.sizeInBytes : 224 sizeInBytes: 224 ``` After the change, sizeInBytes is 384 (8 * 8 * ((8 + 20 + 20) / 8)). Debug information in `SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode`: ``` p.child.dataType: LongType defaultSize: 8 p.dataType: LongType defaultSize: 8 p.dataType: StringType defaultSize: 20 p.dataType: StringType defaultSize: 20 childRowSize: 8 outputRowSize: 48 p.child.stats.sizeInBytes : 64 p.stats.sizeInBytes : 384 sizeInBytes: 384 ``` This PR fixes it. ## How was this patch tested? Added new test cases. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/heary-cao/spark sizeInBytes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19251 commit 99d48e5045775853ed7f487c94522e43f98a867a Author: caoxuewen Date: 2017-09-16T10:11:46Z improved statistics logicalPlan.stats.sizeInBytes
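The arithmetic in the PR description can be reproduced with a small sketch of the row-width scaling that `SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode` performs. This is a simplified standalone model of the estimate, not Spark's implementation; the row widths are taken from the debug output quoted above:

```python
def estimate_size_in_bytes(child_size, child_row_size, output_row_size):
    """Assume the node yields the same number of rows as its child and
    rescale the child's size by the ratio of the row widths."""
    return child_size * output_row_size // child_row_size

# Numbers from the debug output: child size 64 bytes in both cases.
before = estimate_size_in_bytes(64, 16, 56)  # childRowSize 16, outputRowSize 56
after = estimate_size_in_bytes(64, 8, 48)    # childRowSize 8, outputRowSize 48
```

With these inputs `before` evaluates to 224 and `after` to 384, matching the sizeInBytes values reported in the description.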
[GitHub] spark issue #18936: [SPARK-21688][ML][MLLIB] make native BLAS the first choi...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18936 BTW I do think this is a promising idea. I'd welcome more info about the performance implications, but if it seems like a net win for most users we should do it.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test FAILed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #81840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81840/testReport)** for PR 19218 at commit [`732266c`](https://github.com/apache/spark/commit/732266cc77eb370442dce0f125a70d634b3ebc6d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81840/ Test FAILed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #81840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81840/testReport)** for PR 19218 at commit [`732266c`](https://github.com/apache/spark/commit/732266cc77eb370442dce0f125a70d634b3ebc6d).
[GitHub] spark pull request #19248: [SPARK-22027] Add missing explanation of default ...
Github user exKAZUu commented on a diff in the pull request: https://github.com/apache/spark/pull/19248#discussion_r139282831 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -44,7 +44,7 @@ private[ml] trait HasRegParam extends Params { private[ml] trait HasMaxIter extends Params { /** - * Param for maximum number of iterations (>= 0). + * Param for maximum number of iterations (>= 0). (default = 20) --- End diff -- I see. Sorry for my misunderstanding. I will close this.