[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20005 **[Test build #85045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85045/testReport)** for PR 20005 at commit [`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20005 Oh, thanks to the CRAN sysadmin team for their quick reply; they fixed the broken page.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20005 retest this please.
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157417517 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```
@@ -163,11 +204,51 @@ class CodegenContext {
    *                 the list of default imports available.
    *                 Also, generic type arguments are accepted but ignored.
    * @param variableName Name of the field.
-   * @param initCode The statement(s) to put into the init() method to initialize this field.
+   * @param initFunc Function includes statement(s) to put into the init() method to initialize
+   *                 this field. The argument is the name of the mutable state variable.
    *                 If left blank, the field will be default-initialized.
+   * @param forceInline whether the declaration and initialization code may be inlined rather than
+   *                    compacted. Please set `true` into forceInline, if you want to access the
+   *                    status fast (e.g. frequently accessed) or if you want to use the original
+   *                    variable name
+   * @param useFreshName If this is false and forceInline is true, the name is not changed
+   * @return the name of the mutable state variable, which is the original name or fresh name if
+   *         the variable is inlined to the outer class, or an array access if the variable is to
+   *         be stored in an array of variables of the same type.
+   *         A variable will be inlined into the outer class when one of the following conditions
+   *         are satisfied:
+   *         1. forceInline is true
+   *         2. its type is primitive type and the total number of the inlined mutable variables
+   *            is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
+   *         3. its type is multi-dimensional array
+   *         A primitive type variable will be inlined into outer class when the total number of
+   *         When a variable is compacted into an array, the max size of the array for compaction
```
--- End diff -- Actually this line `A primitive type variable will be inlined into outer class when the total number of` looks redundant.
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157417320 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```
@@ -163,11 +204,51 @@ class CodegenContext {
    *                 the list of default imports available.
    *                 Also, generic type arguments are accepted but ignored.
    * @param variableName Name of the field.
-   * @param initCode The statement(s) to put into the init() method to initialize this field.
+   * @param initFunc Function includes statement(s) to put into the init() method to initialize
+   *                 this field. The argument is the name of the mutable state variable.
    *                 If left blank, the field will be default-initialized.
+   * @param forceInline whether the declaration and initialization code may be inlined rather than
+   *                    compacted. Please set `true` into forceInline, if you want to access the
+   *                    status fast (e.g. frequently accessed) or if you want to use the original
+   *                    variable name
+   * @param useFreshName If this is false and forceInline is true, the name is not changed
+   * @return the name of the mutable state variable, which is the original name or fresh name if
+   *         the variable is inlined to the outer class, or an array access if the variable is to
+   *         be stored in an array of variables of the same type.
+   *         A variable will be inlined into the outer class when one of the following conditions
+   *         are satisfied:
+   *         1. forceInline is true
+   *         2. its type is primitive type and the total number of the inlined mutable variables
+   *            is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
+   *         3. its type is multi-dimensional array
+   *         A primitive type variable will be inlined into outer class when the total number of
+   *         When a variable is compacted into an array, the max size of the array for compaction
```
--- End diff -- This sentence looks broken? I.e., `...total number of`
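The inlining conditions and the return contract quoted in the Javadoc above can be modeled as a small standalone sketch (Python, with hypothetical names and an illustrative threshold; this is not Spark's actual implementation):

```python
# Hypothetical standalone model of the addMutableState contract discussed above:
# a state is inlined into the outer class when (1) forceInline is true,
# (2) it is a primitive and the inlined-primitive count is under the threshold,
# or (3) it is a multi-dimensional array; otherwise it is compacted into an
# array and the caller receives an array-access expression instead of a name.

OUTER_CLASS_VARIABLES_THRESHOLD = 10  # illustrative value, not Spark's actual constant

PRIMITIVES = {"int", "long", "boolean", "double", "float", "short", "byte", "char"}


class MutableStateModel:
    def __init__(self):
        self.inlined = []    # (java_type, fresh_name) pairs inlined into the outer class
        self.compacted = {}  # java_type -> number of array slots used so far

    def add_mutable_state(self, java_type, name, force_inline=False):
        is_primitive = java_type in PRIMITIVES
        is_multi_dim_array = java_type.endswith("[][]")
        inlined_primitives = sum(1 for t, _ in self.inlined if t in PRIMITIVES)
        if (force_inline
                or (is_primitive and inlined_primitives < OUTER_CLASS_VARIABLES_THRESHOLD)
                or is_multi_dim_array):
            fresh = "%s_%d" % (name, len(self.inlined))
            self.inlined.append((java_type, fresh))
            return fresh
        # compacted: the caller gets an array access instead of a plain variable name
        idx = self.compacted.get(java_type, 0)
        self.compacted[java_type] = idx + 1
        return "mutableStateArray_%s[%d]" % (java_type, idx)
```

Under this model, forced or early primitive states come back as a fresh inlined name, while everything else comes back as an array access into a per-type array, which is the distinction the `@return` documentation is describing.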
[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/19989 We can see that the size of the FileSystem `deleteOnExit` cache keeps increasing. ![mshot](https://user-images.githubusercontent.com/24823338/34095036-d1fcc408-e40a-11e7-9599-2acdd96da2d9.png)
[GitHub] spark pull request #19892: [SPARK-20542][FollowUp][PySpark] Bucketizer suppo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r157416631 --- Diff: python/pyspark/ml/param/__init__.py ---
```
@@ -134,6 +134,20 @@ def toListFloat(value):
                 return [float(v) for v in value]
         raise TypeError("Could not convert %s to list of floats" % value)
 
+    @staticmethod
+    def toListListFloat(value):
+        """
+        Convert a value to list of list of floats, if possible.
+        """
+        if TypeConverters._can_convert_to_list(value):
+            value = TypeConverters.toList(value)
+            if all(map(lambda v: TypeConverters._can_convert_to_list(v), value)):
+                ll = []
+                for v in value:
+                    ll.append([float(i) for i in TypeConverters.toList(v)])
```
--- End diff -- `toListFloat` requires that each list entry be numeric (`TypeConverters._is_numeric`). Should `toListListFloat` have the same requirement?
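The question raised above — whether nested entries should also be validated as numeric — can be illustrated with a standalone sketch (hypothetical helper names; this is not the actual PySpark `TypeConverters` code):

```python
# Standalone sketch of a list-of-list-of-floats converter that, unlike the
# version in the diff under review, also validates that every inner element is
# numeric, mirroring what toListFloat does via _is_numeric. All helper names
# here are hypothetical stand-ins for the PySpark ones.

def _is_numeric(value):
    # booleans are excluded even though bool subclasses int
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def _can_convert_to_list(value):
    return isinstance(value, (list, tuple, range))

def to_list_list_float(value):
    """Convert a value to a list of lists of floats, if possible."""
    if _can_convert_to_list(value):
        value = list(value)
        if all(_can_convert_to_list(v) for v in value):
            rows = [list(v) for v in value]
            # the extra check under discussion: every inner entry must be numeric
            if all(_is_numeric(i) for row in rows for i in row):
                return [[float(i) for i in row] for row in rows]
    raise TypeError("Could not convert %s to list of list of floats" % value)
```

With the inner numeric check, a value such as `[[1, "a"]]` fails fast with a `TypeError` instead of relying on `float("a")` to blow up later.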
[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/19989 As I debugged this, every time I connect to the Thrift server through beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl:open`, and **once in `HiveClientImpl` for the SQL statement `use default`**. SessionManager.java#L151 (or `HiveSessionImpl:close`) only cleans up the first one.
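The lifecycle imbalance described above — two `start` calls per connection but only one matching cleanup — can be modeled with a simple counter (hypothetical sketch, not the Hive code):

```python
# Minimal model of the leak pattern described above: if start() runs twice per
# connection (once for open, once for the "use default" statement) but close
# cleans up only one state, one state accumulates per connection.

class SessionStateTracker:
    def __init__(self):
        self.open_states = 0

    def start(self):
        self.open_states += 1

    def close_one(self):
        if self.open_states > 0:
            self.open_states -= 1


def simulate_connections(n):
    tracker = SessionStateTracker()
    for _ in range(n):
        tracker.start()      # HiveSessionImpl.open
        tracker.start()      # extra start for the "use default" statement
        tracker.close_one()  # close cleans up only the first state
    return tracker.open_states  # states still held after n connections
```

The leftover count grows linearly with the number of connections, which matches the steadily growing `deleteOnExit` cache shown in the screenshot earlier in the thread.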
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157415970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
```
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val patternClass = classOf[Pattern].getName
     val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
-    val pattern = ctx.freshName("pattern")
     if (right.foldable) {
       val rVal = right.eval()
       if (rVal != null) {
         val regexStr =
           StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-        ctx.addMutableState(patternClass, pattern,
-          s"""$pattern = ${patternClass}.compile("$regexStr");""")
+        val pattern = ctx.addMutableState(patternClass, "patternLike",
+          v => s"""$v = ${patternClass}.compile("$regexStr");""", forceInline = true)
```
--- End diff -- my only concern is about point 2. I think it is a dangerous thing to do. What if we generate a lot of frequently used variables? I think it is safer at the moment to consider only 1 and 3 when deciding whether to inline. In the future, with a different codegen method, we might define a threshold over which we generate an array for the given class and otherwise use plain variables, which IMHO would be the best option, but at the moment it is not feasible...
[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20006 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85039/ Test FAILed.
[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20006 Merged build finished. Test FAILed.
[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20006 **[Test build #85039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85039/testReport)** for PR 20006 at commit [`7ee9aec`](https://github.com/apache/spark/commit/7ee9aeccfc5adc6107c33401ef0c5212a65d9577).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85044/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20005 cc @shivaram, @felixcheung and @falaki too FYI.
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157414340 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
```
@@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
    * by `inputCol`. A warning will be printed if both are set.
    */
   private[feature] def isBucketizeMultipleColumns(): Boolean = {
-    if (isSet(inputCols) && isSet(inputCol)) {
-      logWarning("Both `inputCol` and `inputCols` are set, we ignore `inputCols` and this " +
-        "`Bucketizer` only map one column specified by `inputCol`")
-      false
+    if (isSet(inputCols) && isSet(inputCol) || isSet(inputCols) && isSet(outputCol) ||
+      isSet(inputCol) && isSet(outputCols)) {
+      throw new IllegalArgumentException("Both `inputCol` and `inputCols` are set, `Bucketizer` " +
+        "only supports setting either `inputCol` or `inputCols`.")
```
--- End diff -- Thanks. I will add these checks while making the changes requested by @hhbyyh.
[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19993 @hhbyyh thanks for the review. I see that there are ongoing PRs for some classes, so I cannot change them now to create a common place and a common test. Should I wait for those PRs to be merged? Or were you suggesting something different? Thanks.
[GitHub] spark pull request #19892: [SPARK-20542][FollowUp][PySpark] Bucketizer suppo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r157414231 --- Diff: python/pyspark/ml/feature.py ---
```
@@ -315,13 +315,14 @@ class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable)
 
 @inherit_doc
-class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasHandleInvalid,
-                 JavaMLReadable, JavaMLWritable):
+class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasInputCols, HasOutputCols,
+                 HasHandleInvalid, JavaMLReadable, JavaMLWritable):
     """
     Maps a column of continuous features to a column of feature buckets.
```
--- End diff -- Update this doc too?
[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/19933 And **spark.driver.host** can be set in sparkConf, so why can't spark.submit.host be?
```
def makeDriverRef(name: String, conf: SparkConf, rpcEnv: RpcEnv): RpcEndpointRef = {
  val driverHost: String = conf.get("spark.driver.host", "localhost")
  val driverPort: Int = conf.getInt("spark.driver.port", 7077)
```
[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 @hvanhovell here is a summary of tried scenarios: ``` val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b") val window1 = Window.orderBy('a) val window2 = Window.orderBy('a.desc) ``` **Scenario 1: An aggregate on top of a window expression (did not work before, looks OK now)** ``` df.groupBy().agg(max(rank().over(window1))).explain(true) df.groupBy().agg(max(rank().over(window1))).show(false) == Analyzed Logical Plan == max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): int Aggregate [max(_we0#27) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#16] +- Project [a#5, b#6, _we0#27, _we0#27] +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _we0#27], [a#5 ASC NULLS FIRST] +- Project [_1#2 AS a#5, _2#3 AS b#6] +- LocalRelation [_1#2, _2#3] == Optimized Logical Plan == Aggregate [max(_we0#27) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#16] +- Project [_we0#27, _we0#27] +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _we0#27], [a#5 ASC NULLS FIRST] +- LocalRelation [a#5] +-+ |max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))| +-+ |4| +-+ ``` **Scenario 2: An aggregate with grouping expressions on top of a window expression** TODO: Is the result wrong? What is expected? 
``` df.groupBy('a).agg(max(rank().over(window1))).explain(true) df.groupBy('a).agg(max(rank().over(window1))).show(false) == Analyzed Logical Plan == a: int, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): int Aggregate [a#5], [a#5, max(_we0#75) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#63] +- Project [a#5, b#6, _we0#75, _we0#75] +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _we0#75], [a#5 ASC NULLS FIRST] +- Project [_1#2 AS a#5, _2#3 AS b#6] +- LocalRelation [_1#2, _2#3] == Optimized Logical Plan == Aggregate [a#5], [a#5, max(_we0#75) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#63] +- Project [a#5, _we0#75, _we0#75] +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _we0#75], [a#5 ASC NULLS FIRST] +- LocalRelation [a#5] +---+-+ |a |max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))| +---+-+ |1 |1| |2 |3| |5 |4| +---+-+ ``` **Scenario 3: A normal aggregate, an aggregate on top of a window expression, a window expression on top of an aggregate in one query** This is resolved in two steps. 
``` df.groupBy('a).agg(max(rank().over(window1)), sum('b), sum(sum('b)).over(window2)).explain(true) df.groupBy('a).agg(max(rank().over(window1)), sum('b), sum(sum('b)).over(window2)).show(false) == Analyzed Logical Plan == a: int, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): int, sum(b): bigint, sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$()): bigint Project [a#5, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#116, sum(b)#117L, sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$())#118L] +- Project [a#5, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))#116, sum(b)#117L, _w0#137L, sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$())#118L, sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$())#118L] +- Window [sum(_w0#137L) windowspecdefinition(a#5 DESC NULLS LAST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$())#118L], [a#5 DESC NULLS LAST] +- Aggregate [a#5], [a#5, max(_we0#133) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20005 I use R version 3.2.3 and I can reproduce this too.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20005 I see. Nice to get some help. BTW, this also seems related to the specific R version. I wasn't able to reproduce this with R 3.4.0, but I am able to reproduce it with R 3.1.1 (same as Jenkins'). Probably some corner cases were handled in `tools:::.check_package_CRAN_incoming` in the newer version.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19811 LGTM except some minor comments, great job!
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
```
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val patternClass = classOf[Pattern].getName
     val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
-    val pattern = ctx.freshName("pattern")
     if (right.foldable) {
       val rVal = right.eval()
       if (rVal != null) {
         val regexStr =
           StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-        ctx.addMutableState(patternClass, pattern,
-          s"""$pattern = ${patternClass}.compile("$regexStr");""")
+        val pattern = ctx.addMutableState(patternClass, "patternLike",
+          v => s"""$v = ${patternClass}.compile("$regexStr");""", forceInline = true)
```
--- End diff -- yes, if it's not a lot of them...
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20005 Note: I've emailed the CRAN sysadmin team and asked for their help fixing the broken page.
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410521 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
```
@@ -440,9 +435,9 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
       |   }
     """.stripMargin)
 
-    val input = ctx.freshName("input")
     // Right now, Range is only used when there is one upstream.
-    ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
+    val input = ctx.addMutableState("scala.collection.Iterator", "input",
```
--- End diff -- seems it's never used
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410349 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
```
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-    val needToSort = ctx.freshName("needToSort")
-    ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = true;")
+    val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v => s"$v = true;")
 
     // Initialize the class member variables. This includes the instance of the Sorter and
     // the iterator to return sorted rows.
     val thisPlan = ctx.addReferenceObj("plan", this)
-    sorterVariable = ctx.freshName("sorter")
-    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
-      s"$sorterVariable = $thisPlan.createSorter();")
-    val metrics = ctx.freshName("metrics")
-    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-    val sortedIterator = ctx.freshName("sortedIter")
-    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+    sorterVariable = ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+      v => s"$v = $thisPlan.createSorter();", forceInline = true)
+    val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, "metrics",
+      v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", forceInline = true)
+    val sortedIterator = ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+      forceInline = true)
```
--- End diff -- e.g. https://github.com/apache/spark/pull/19811/files#diff-2eb948516b5beaeb746aadac27fbd5b4R613 ?
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
```
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-    val needToSort = ctx.freshName("needToSort")
-    ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = true;")
+    val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v => s"$v = true;")
 
     // Initialize the class member variables. This includes the instance of the Sorter and
     // the iterator to return sorted rows.
     val thisPlan = ctx.addReferenceObj("plan", this)
-    sorterVariable = ctx.freshName("sorter")
-    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
-      s"$sorterVariable = $thisPlan.createSorter();")
-    val metrics = ctx.freshName("metrics")
-    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-    val sortedIterator = ctx.freshName("sortedIter")
-    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+    sorterVariable = ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+      v => s"$v = $thisPlan.createSorter();", forceInline = true)
+    val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, "metrics",
+      v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", forceInline = true)
+    val sortedIterator = ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+      forceInline = true)
```
--- End diff -- one question: are there any other places like this? Do you have a list?
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410164 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
```
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val patternClass = classOf[Pattern].getName
     val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
-    val pattern = ctx.freshName("pattern")
     if (right.foldable) {
       val rVal = right.eval()
      if (rVal != null) {
         val regexStr =
           StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-        ctx.addMutableState(patternClass, pattern,
-          s"""$pattern = ${patternClass}.compile("$regexStr");""")
+        val pattern = ctx.addMutableState(patternClass, "patternLike",
+          v => s"""$v = ${patternClass}.compile("$regexStr");""", forceInline = true)
```
--- End diff -- Now we have three rules for deciding when to inline:
1. We have to use the original name
2. It is frequently used in a hot spot
3. It is [not expected to be frequently generated](https://github.com/apache/spark/pull/19811#discussion_r157110933), proposed by @viirya

We currently have no data for rule 2, so I will try to run a microbenchmark for it. Is it better to add these benchmarks into the benchmark directory?
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157410133 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
```
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-    val needToSort = ctx.freshName("needToSort")
-    ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = true;")
+    val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v => s"$v = true;")
 
     // Initialize the class member variables. This includes the instance of the Sorter and
     // the iterator to return sorted rows.
     val thisPlan = ctx.addReferenceObj("plan", this)
-    sorterVariable = ctx.freshName("sorter")
-    ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, sorterVariable,
-      s"$sorterVariable = $thisPlan.createSorter();")
-    val metrics = ctx.freshName("metrics")
-    ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-      s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-    val sortedIterator = ctx.freshName("sortedIter")
-    ctx.addMutableState("scala.collection.Iterator", sortedIterator, "")
+    sorterVariable = ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+      v => s"$v = $thisPlan.createSorter();", forceInline = true)
+    val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, "metrics",
+      v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", forceInline = true)
+    val sortedIterator = ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+      forceInline = true)
```
--- End diff -- this looks reasonable, as it's very unlikely we have a lot of sort operators in one stage. We have to inline it manually as we don't have the ability to find this out automatically yet.
Same as https://github.com/apache/spark/pull/19811/files#r157408804
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157409537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
```
@@ -217,7 +217,7 @@ case class Stack(children: Seq[Expression]) extends Generator {
     ctx.addMutableState(
       s"$wrapperClass",
       ev.value,
-      s"${ev.value} = $wrapperClass$$.MODULE$$.make($rowData);")
```
--- End diff -- I will do it in another PR.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Merged build finished. Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85043/ Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)** for PR 19884 at commit [`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)** for PR 19884 at commit [`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157408804 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -385,20 +385,43 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { val ctx = new CodegenContext val schema = new StructType().add("a", IntegerType).add("b", StringType) CreateExternalRow(Seq(Literal(1), Literal("x")), schema).genCode(ctx) -assert(ctx.mutableStates.isEmpty) +assert(ctx.inlinedMutableStates.isEmpty) } test("SPARK-22696: InitializeJavaBean should not use global variables") { val ctx = new CodegenContext InitializeJavaBean(Literal.fromObject(new java.util.LinkedList[Int]), Map("add" -> Literal(1))).genCode(ctx) -assert(ctx.mutableStates.isEmpty) +assert(ctx.inlinedMutableStates.isEmpty) } test("SPARK-22716: addReferenceObj should not add mutable states") { val ctx = new CodegenContext val foo = new Object() ctx.addReferenceObj("foo", foo) -assert(ctx.mutableStates.isEmpty) +assert(ctx.inlinedMutableStates.isEmpty) + } + + test("SPARK-18016: define mutable states by using an array") { +val ctx1 = new CodegenContext +for (i <- 1 to CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD + 10) { + ctx1.addMutableState(ctx1.JAVA_INT, "i", v => s"$v = $i;") +} +assert(ctx1.inlinedMutableStates.size == CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD) +// When the number of primitive type mutable states is over the threshold, others are +// allocated into an array --- End diff -- Some notes: It's better if we can collect all mutable states before deciding which one should be inlined. However it's impossible to do with the current string based codegen framework, we need to decide inline or not immediately. We can revisit this in the future when we have an AST based codegen framework. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
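The documented inline rules (forceInline, primitive type while the inlined count is under the threshold, multi-dimensional array) reduce to a per-call predicate, which matches the note above that the decision has to be made immediately for each state rather than after collecting all of them. A hedged sketch — the threshold value here is made up, not Spark's actual constant:

```java
class InlineDecisionSketch {
    // Hypothetical value; the real CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD may differ.
    static final int OUTER_CLASS_VARIABLES_THRESHOLD = 10000;

    // Mirrors the documented rules: inline when forced, when the state is a
    // primitive and the running inlined count is still under the threshold,
    // or when the type is a multi-dimensional array (hard to compact).
    static boolean shouldInline(boolean forceInline, boolean isPrimitive,
                                int inlinedCount, boolean isMultiDimArray) {
        return forceInline
            || (isPrimitive && inlinedCount < OUTER_CLASS_VARIABLES_THRESHOLD)
            || isMultiDimArray;
    }
}
```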
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20005 I've tried to investigate this and I think it is due to a wrongly formatted document at https://cran.r-project.org/src/contrib/PACKAGES.in:

```
Package: alR
X-CRAN-Comment: Archived again on 2017-12-15 as none of the errors (failure to install with clang and on Solaris, buffer overflows) were corrected.
X-CRAN-History: Archived on 2017-12-01 as C++ errors (and segfaults) were not corrected despite reminders. Unarchived on 2017-12-07.
```

There is an extra line break here, and it breaks the function `tools:::.check_package_CRAN_incoming`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
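To illustrate why a stray blank line matters: PACKAGES.in is a DCF-style file in which blank lines separate records, so an extra blank line inside a record splits it into two fragments, and the second fragment no longer has a `Package:` field. This is a toy parser showing the effect, not the actual R `tools` implementation:

```java
import java.util.Arrays;
import java.util.List;

class DcfSketch {
    // Records in a DCF-style file are separated by one or more blank lines.
    static List<String> records(String text) {
        return Arrays.asList(text.trim().split("\n\\s*\n"));
    }
}
```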
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157408425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val patternClass = classOf[Pattern].getName val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex" -val pattern = ctx.freshName("pattern") if (right.foldable) { val rVal = right.eval() if (rVal != null) { val regexStr = StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString())) -ctx.addMutableState(patternClass, pattern, - s"""$pattern = ${patternClass}.compile("$regexStr");""") +val pattern = ctx.addMutableState(patternClass, "patternLike", + v => s"""$v = ${patternClass}.compile("$regexStr");""", forceInline = true) --- End diff -- Do we have a clear rule when a global variable should be inlined for better performance? e.g. a microbenchmark showing noteworthy difference definitely proves we should inline. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
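For intuition on why this particular state is a good inlining candidate: the compiled `Pattern` is read once per input row, so cheap direct field access matters, and because the right operand is foldable the regex can be compiled exactly once up front. A standalone sketch of the compile-once idea (illustrative, not Spark's generated code):

```java
import java.util.regex.Pattern;

class LikeSketch {
    // A foldable right operand means the regex is known up front, so it can be
    // compiled once and stored as a (preferably inlined) mutable state field.
    private final Pattern pattern;

    LikeSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    boolean matches(String row) {
        // Per-row hot path: only a field read and a match, no recompilation.
        return pattern.matcher(row).matches();
    }
}
```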
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157408061 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -217,7 +217,7 @@ case class Stack(children: Seq[Expression]) extends Generator { ctx.addMutableState( s"$wrapperClass", ev.value, - s"${ev.value} = $wrapperClass$$.MODULE$$.make($rowData);") --- End diff -- We can localize the global variable `ev.value` here to save one global variable slot. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
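The suggestion amounts to turning a per-expression global field into a local variable inside the generated method, which is safe whenever the value does not need to survive across calls; each field removed saves one slot. A minimal before/after sketch with hypothetical names:

```java
class LocalizeSketch {
    // Before: the wrapper value lived in a field like `private Object evValue;`,
    // consuming one field slot per Stack expression.
    // After: when the value is only used within a single method invocation,
    // a local variable serves the same purpose with no field slot.
    static String makeWrapper(String rowData) {
        String value = "wrapper(" + rowData + ")"; // local, no global state
        return value;
    }
}
```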
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r157407681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +204,51 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If this is false and forceInline is true, the name is not changed --- End diff -- more accurate: `If this is false and mutable state ends up inlining in the outer class, the name is not changed` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/19933 > driver host is actually probably already in the logs. What I care about is the submit host, not the driver host. I have built Spark from trunk and tested in yarn-cluster mode, and there is no submit host in the AM log (even with debug enabled). So the platform team still has difficulty knowing where an application was submitted. They even have to check the YARN audit log for a create operation upon some .sparkStage/files to learn the submit host. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85042/testReport)** for PR 19981 at commit [`9f5d21f`](https://github.com/apache/spark/commit/9f5d21f1aa71b1e2ee0a820a1430f453ecf66f0f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85041/testReport)** for PR 19981 at commit [`5737705`](https://github.com/apache/spark/commit/57377055a3db6f12f67c55524917e4da1b49df94). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85041/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85041/testReport)** for PR 19981 at commit [`5737705`](https://github.com/apache/spark/commit/57377055a3db6f12f67c55524917e4da1b49df94). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85040/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85040/testReport)** for PR 19981 at commit [`2e66722`](https://github.com/apache/spark/commit/2e667222690cad5dcd839b220ae9f938203813db). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85040/testReport)** for PR 19981 at commit [`2e66722`](https://github.com/apache/spark/commit/2e667222690cad5dcd839b220ae9f938203813db). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19981 @vanzin We can add `AppStatusPlugin` back if you can convince other people that the SQL listener and UI tag should be set up during `SparkContext` initialization (I'm kind of convinced after some more thought: for future SQL listener usage it's more important to catch all events, including non-SQL ones), and figure out how to keep `SharedState.statusStore`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85038/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85038/testReport)** for PR 20004 at commit [`652dec8`](https://github.com/apache/spark/commit/652dec84bc1ee57e3cf0cfb553b821cbe54cdef3). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85037/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85037/testReport)** for PR 20004 at commit [`2285807`](https://github.com/apache/spark/commit/22858071cd25acf7732c1ca4d984cced4d403b90). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85036/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20004 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85036/testReport)** for PR 20004 at commit [`57be3f1`](https://github.com/apache/spark/commit/57be3f1ba654ed2ea42c06d9d9325f06222196ef). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19811 ping @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20006 **[Test build #85039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85039/testReport)** for PR 20006 at commit [`7ee9aec`](https://github.com/apache/spark/commit/7ee9aeccfc5adc6107c33401ef0c5212a65d9577). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20006 [SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division ## What changes were proposed in this pull request? Test coverage for `WidenSetOperationTypes`, `BooleanEquality`, `StackCoercion` and `Division`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722). ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22821 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20006.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20006 commit 7ee9aeccfc5adc6107c33401ef0c5212a65d9577 Author: Yuming Wang Date: 2017-12-18T04:43:55Z Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19884 @wesm I was able to install pyarrow 0.8.0 to my local environment via conda. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19981#discussion_r157397655 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala --- @@ -489,16 +501,17 @@ private case class MyPlan(sc: SparkContext, expectedValue: Long) extends LeafExe } -class SQLListenerMemoryLeakSuite extends SparkFunSuite { +class SQLAppStatusListenerMemoryLeakSuite extends SparkFunSuite { - // TODO: this feature is not yet available in SQLAppStatusStore. - ignore("no memory leak") { + test("no memory leak") { quietly { val conf = new SparkConf() .setMaster("local") .setAppName("test") +.set(LIVE_ENTITY_UPDATE_PERIOD, 0L) // Update the UI data immediately .set(config.MAX_TASK_FAILURES, 1) // Don't retry the tasks to run this test quickly -.set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly +// TODO: this feature is not yet available in SQLAppStatusStore. +// .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly --- End diff -- ok let me revert this part --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19864 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85035/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19864 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19864 **[Test build #85035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85035/testReport)** for PR 19864 at commit [`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157396794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") ( } bcFeaturesStd.destroy(blocking = false) - if (state == null) { + if (state == null || state.searchFailed) { val msg = s"${optimizer.getClass.getName} failed." -logError(msg) +logWarning(msg) throw new SparkException(msg) --- End diff -- The message that line search failed is already logged by breeze. https://github.com/scalanlp/breeze/blob/4611fe3f4466ac5b1885e9d6288ba27fc9d205ec/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala#L78 I think logging it again at this place would be a bit verbose. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85031/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85031/testReport)** for PR 19977 at commit [`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FunctionArgumentConversion(conf: SQLConf) extends TypeCoercionRule ` * `case class Concat(children: Seq[Expression], isBinaryMode: Boolean = false)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19813 > Another solution is to not make this contract. By a quick look this seems hard to do, because at the time of doing this, the code(method body) is already generated and we don't know how to replace statement like a + 1 with the generated parameter name, inside the method body. We may need to do this fix earlier in the codegen procedure.

@cloud-fan Can you say more about this limitation? Can't we just replace all occurrences of `a + 1` with a parameter name like `varA` in the method body, if we are going to use it as the parameter name in the split method? E.g., from a method body like:

```java
void splitMethod1(int a + 1, int b, boolean c) {
  ...
  int d = a + 1 + b;
  if (a + 1 > 10 && c) {
    ...
  }
}
```

to

```java
void splitMethod1(int varA, int b, boolean c) {
  ...
  int d = varA + b;
  if (varA > 10 && c) {
    ...
  }
}
```

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
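As a toy check of this idea — and of cloud-fan's caveat — plain textual substitution does turn the first body into the second, but on a string-based codegen framework it is fragile, because the expression's code text can occur as a substring of a different expression (e.g. inside `aa + 1`), corrupting the output. That is presumably part of why an AST-based fix would be safer. All names here are hypothetical:

```java
class SubstituteSketch {
    // Naive replacement of an expression's code text with a parameter name.
    // Works on the happy path, but has no notion of token boundaries, so a
    // substring collision silently produces broken code.
    static String substitute(String methodBody, String exprCode, String paramName) {
        return methodBody.replace(exprCode, paramName);
    }
}
```

For example, `substitute("int d = aa + 1;", "a + 1", "varA")` yields the mangled `int d = avarA;`.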
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157393913 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String * by `inputCol`. A warning will be printed if both are set. */ private[feature] def isBucketizeMultipleColumns(): Boolean = { -if (isSet(inputCols) && isSet(inputCol)) { - logWarning("Both `inputCol` and `inputCols` are set, we ignore `inputCols` and this " + -"`Bucketizer` only map one column specified by `inputCol`") - false +if (isSet(inputCols) && isSet(inputCol) || isSet(inputCols) && isSet(outputCol) || + isSet(inputCol) && isSet(outputCols)) { + throw new IllegalArgumentException("Both `inputCol` and `inputCols` are set, `Bucketizer` " + +"only supports setting either `inputCol` or `inputCols`.") --- End diff -- It would be better to add one more check here: in single-column mode, only the `splits` param should be set; in multi-column mode, only the `splitsArray` param should be set. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
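A small sketch of the extra check being suggested, with hypothetical method names rather than the actual `Bucketizer` API: single-column mode should reject `splitsArray`, and multi-column mode should reject `splits`.

```java
class BucketizerParamCheckSketch {
    // Hypothetical validation mirroring the review suggestion; the real
    // Bucketizer exposes these as ML params, not booleans.
    static void validateSplitsParams(boolean multiColumn,
                                     boolean splitsSet, boolean splitsArraySet) {
        if (multiColumn && splitsSet) {
            throw new IllegalArgumentException(
                "`splits` can only be set for a single-column Bucketizer");
        }
        if (!multiColumn && splitsArraySet) {
            throw new IllegalArgumentException(
                "`splitsArray` can only be set for a multi-column Bucketizer");
        }
    }
}
```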
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157393049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") ( } bcFeaturesStd.destroy(blocking = false) - if (state == null) { + if (state == null || state.searchFailed) { val msg = s"${optimizer.getClass.getName} failed." -logError(msg) +logWarning(msg) throw new SparkException(msg) --- End diff -- But the msg in the exception should include the root cause, "line search failed." The current msg only includes "LBFGS failed." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19994#discussion_r157391859 --- Diff: python/pyspark/ml/regression.py --- @@ -155,6 +183,14 @@ def intercept(self): """ return self._call_java("intercept") +@property +@since("2.3.0") +def scale(self): +""" +The value by which \|y - X'w\| is scaled down when loss is "huber". --- End diff -- add doc: "When using square loss, the value is 1.0." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19994#discussion_r157391801 --- Diff: python/pyspark/ml/tests.py --- @@ -1725,6 +1725,27 @@ def test_offset(self): self.assertTrue(np.isclose(model.intercept, -1.561613, atol=1E-4)) +class LinearRegressionTest(SparkSessionTestCase): + +def test_linear_regression_with_huber_loss(self): + +data_path = "data/mllib/sample_linear_regression_data.txt" +df = self.spark.read.format("libsvm").load(data_path) + +lir = LinearRegression(loss="huber") --- End diff -- The testcase should include `setEpsilon` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85038/testReport)** for PR 20004 at commit [`652dec8`](https://github.com/apache/spark/commit/652dec84bc1ee57e3cf0cfb553b821cbe54cdef3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85037/testReport)** for PR 20004 at commit [`2285807`](https://github.com/apache/spark/commit/22858071cd25acf7732c1ca4d984cced4d403b90). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape
Github user ep1804 commented on a diff in the pull request: https://github.com/apache/spark/pull/20004#discussion_r157391252 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -482,6 +482,36 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("save csv with quote escaping, using escapeQuoteEscaping option") { +withTempPath { path => + // when a string to be quoted ends with the escape character, escapeQuoteEscaping is required + val df1 = Seq( +"""You are "beautiful, +"""Yes, \in the inside\""", // ends with the escape character '\\' +"""AA\\BB""",// 2 escape char +"""AA""BB""",// 2 quote char +"""AA\"BB\""", // 1 escape char and 1 quote char +"""AA\""BB\""", // 1 escape char and 2 quote char +"""AA\\"BB\""" // 2 escape char and 1 quote char + ).toDF() + + // escapeQuotes is true by default + // quote character is '\"' by default + // escape character is '\\' by default + df1.coalesce(1).write +.format("csv") +.option("escapeQuoteEscaping", "\\") --- End diff -- OK. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape
Github user ep1804 commented on a diff in the pull request: https://github.com/apache/spark/pull/20004#discussion_r157391233

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
```
@@ -482,6 +482,36 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
     }
   }

+  test("save csv with quote escaping, using escapeQuoteEscaping option") {
+    withTempPath { path =>
+      // when a string to be quoted ends with the escape character, escapeQuoteEscaping is required
+      val df1 = Seq(
+        """You are "beautiful,
+        """Yes, \in the inside\""",  // ends with the escape character '\\'
+        """AA\\BB""",                // 2 escape char
+        """AA""BB""",                // 2 quote char
+        """AA\"BB\""",               // 1 escape char and 1 quote char
+        """AA\""BB\""",              // 1 escape char and 2 quote char
+        """AA\\"BB\"""               // 2 escape char and 1 quote char
+      ).toDF()
+
+      // escapeQuotes is true by default
+      // quote character is '\"' by default
+      // escape character is '\\' by default
+      df1.coalesce(1).write
+        .format("csv")
+        .option("escapeQuoteEscaping", "\\")
```
--- End diff --

It fails. As follows:
```
[info] == Results ==
[info] !== Correct Answer - 7 ==    == Spark Answer - 7 ==
[info] !struct<_c0:string>          struct
[info]  [AA""BB]                    [AA""BB]
[info] ![AA\""BB"]                  [AA\""BB\]
[info] ![AA\"BB"]                   [AA\"BB\]
[info] ![AA\\"BB"]                  [AA\\"BB\]
[info]  [AA\\BB]                    [AA\\BB]
[info] ![Yes, \in the inside"]      [Yes, \in the inside\]
[info]  [You are "beautiful"]       [You are "beautiful"] (QueryTest.scala:163)
```
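The failures above all involve quoted values that end with the escape character: unless that trailing escape character is itself escaped before the closing quote, a reader interprets `\"` as an escaped quote and never sees the field close. The writer-side semantics can be sketched in a few lines of Python (this is an illustration of the idea only, not Spark's or univocity's actual implementation; the function name and signature are made up for this example):

```python
QUOTE, ESCAPE = '"', '\\'

def write_field(value, escape_escape=None):
    """Quote a CSV field, escaping embedded quote characters.

    If escape_escape is given (analogous to the option discussed in
    this PR), occurrences of the escape character are themselves
    escaped, so a value ending in ESCAPE cannot swallow the closing
    quote on the read side.
    """
    out = []
    for ch in value:
        if ch == QUOTE:
            out.append(ESCAPE + ch)        # escape embedded quotes
        elif ch == ESCAPE and escape_escape is not None:
            out.append(escape_escape + ch)  # escape the escape char itself
        else:
            out.append(ch)
    return QUOTE + "".join(out) + QUOTE

# Without escaping the escape char, a value ending in '\' is written
# as "...\" -- ambiguous, since \" also denotes an escaped quote.
ambiguous = write_field("Yes, in the inside\\")
safe = write_field("Yes, in the inside\\", escape_escape=ESCAPE)
```

In the ambiguous form the field ends with `\"`, which is exactly the escaped-quote sequence; the safe form doubles the backslash so the closing quote is unambiguous.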
[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape
Github user ep1804 commented on a diff in the pull request: https://github.com/apache/spark/pull/20004#discussion_r157391102

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
```
@@ -249,6 +249,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
  * `com.databricks.spark.csv`.
  * `escape` (default `\`): sets the single character used for escaping quotes inside
  * an already quoted value.
+ * `escapeQuoteEscaping` (default `\0`): sets the single character used for escaping
+ * the quote-escape character.
```
--- End diff --

About the naming: univocity and Spark use slightly different option names, as follows:

| univocity | spark |
| :-- | :-- |
| delimiter | sep |
| quote | quote |
| quoteEscape | escape |
| charToEscapeQuoteEscaping | escapeQuoteEscaping |

I tried to follow Spark's way by simply removing `charTo`.
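The naming correspondence in the table above can be captured as a simple lookup, e.g. for translating Spark CSV options to the underlying univocity settings (a sketch only; the `escapeQuoteEscaping` entry is the name proposed in this PR and may change before merge):

```python
# Spark CSV option name -> univocity setting name, per the table above.
SPARK_TO_UNIVOCITY = {
    "sep": "delimiter",
    "quote": "quote",
    "escape": "quoteEscape",
    "escapeQuoteEscaping": "charToEscapeQuoteEscaping",  # proposed in this PR
}

def to_univocity_name(spark_option):
    """Map a Spark CSV option name to its univocity counterpart."""
    return SPARK_TO_UNIVOCITY[spark_option]
```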
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18576 Merged build finished. Test FAILed.
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18576 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85032/ Test FAILed.
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18576 **[Test build #85032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85032/testReport)** for PR 18576 at commit [`84d32ef`](https://github.com/apache/spark/commit/84d32efbc05d614ce8b9f942623b05bec67d3cfc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class FilterExec(condition: Expression, child: SparkPlan, outputAttrs: Seq[Attribute])`
[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20004 **[Test build #85036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85036/testReport)** for PR 20004 at commit [`57be3f1`](https://github.com/apache/spark/commit/57be3f1ba654ed2ea42c06d9d9325f06222196ef).
[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape
Github user ep1804 commented on a diff in the pull request: https://github.com/apache/spark/pull/20004#discussion_r157389005

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
```
@@ -249,6 +249,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
  * `com.databricks.spark.csv`.
  * `escape` (default `\`): sets the single character used for escaping quotes inside
  * an already quoted value.
+ * `escapeQuoteEscaping` (default `\0`): sets the single character used for escaping
```
--- End diff --

OK. I'll do it.
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19864 **[Test build #85035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85035/testReport)** for PR 19864 at commit [`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624).
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19864 Let's hold on. The tests seem to be failing globally.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85034/ Test FAILed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17400 **[Test build #85034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85034/testReport)** for PR 17400 at commit [`e6ce117`](https://github.com/apache/spark/commit/e6ce11716015cac614f6155ef2e9a88b420d3ee6). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class AggregateExec extends UnaryExecNode ` * `case class ProjectExec(`
[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Merged build finished. Test FAILed.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20005 **[Test build #85033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85033/testReport)** for PR 20005 at commit [`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20005 Merged build finished. Test FAILed.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20005 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85033/ Test FAILed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17400 **[Test build #85034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85034/testReport)** for PR 17400 at commit [`e6ce117`](https://github.com/apache/spark/commit/e6ce11716015cac614f6155ef2e9a88b420d3ee6).
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 retest this please
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19864 Merged build finished. Test FAILed.
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19864 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85030/ Test FAILed.
[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19864 **[Test build #85030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85030/testReport)** for PR 19864 at commit [`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20005 **[Test build #85033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85033/testReport)** for PR 20005 at commit [`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0).