[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20005
  
**[Test build #85045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85045/testReport)** for PR 20005 at commit [`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20005
  
Oh, thanks to the CRAN sysadmin team for their quick reply; they fixed the broken page.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20005
  
retest this please.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157417517
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -163,11 +204,51 @@ class CodegenContext {
* the list of default imports available.
* Also, generic type arguments are accepted but ignored.
* @param variableName Name of the field.
-   * @param initCode The statement(s) to put into the init() method to 
initialize this field.
+   * @param initFunc Function includes statement(s) to put into the init() 
method to initialize
+   * this field. The argument is the name of the mutable 
state variable.
* If left blank, the field will be default-initialized.
+   * @param forceInline whether the declaration and initialization code 
may be inlined rather than
+   *compacted. Please set `true` into forceInline, if 
you want to access the
+   *status fast (e.g. frequently accessed) or if you 
want to use the original
+   *variable name
+   * @param useFreshName If this is false and forceInline is true, the 
name is not changed
+   * @return the name of the mutable state variable, which is the original 
name or fresh name if
+   * the variable is inlined to the outer class, or an array 
access if the variable is to
+   * be stored in an array of variables of the same type.
+   * A variable will be inlined into the outer class when one of 
the following conditions
+   * are satisfied:
+   * 1. forceInline is true
+   * 2. its type is primitive type and the total number of the 
inlined mutable variables
+   *is less than 
`CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
+   * 3. its type is multi-dimensional array
+   * A primitive type variable will be inlined into outer class 
when the total number of
+   * When a variable is compacted into an array, the max size of 
the array for compaction
--- End diff --

Actually this line `A primitive type variable will be inlined into outer 
class when the total number of` looks redundant.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157417320
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -163,11 +204,51 @@ class CodegenContext {
* the list of default imports available.
* Also, generic type arguments are accepted but ignored.
* @param variableName Name of the field.
-   * @param initCode The statement(s) to put into the init() method to 
initialize this field.
+   * @param initFunc Function includes statement(s) to put into the init() 
method to initialize
+   * this field. The argument is the name of the mutable 
state variable.
* If left blank, the field will be default-initialized.
+   * @param forceInline whether the declaration and initialization code 
may be inlined rather than
+   *compacted. Please set `true` into forceInline, if 
you want to access the
+   *status fast (e.g. frequently accessed) or if you 
want to use the original
+   *variable name
+   * @param useFreshName If this is false and forceInline is true, the 
name is not changed
+   * @return the name of the mutable state variable, which is the original 
name or fresh name if
+   * the variable is inlined to the outer class, or an array 
access if the variable is to
+   * be stored in an array of variables of the same type.
+   * A variable will be inlined into the outer class when one of 
the following conditions
+   * are satisfied:
+   * 1. forceInline is true
+   * 2. its type is primitive type and the total number of the 
inlined mutable variables
+   *is less than 
`CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
+   * 3. its type is multi-dimensional array
+   * A primitive type variable will be inlined into outer class 
when the total number of
+   * When a variable is compacted into an array, the max size of 
the array for compaction
--- End diff --

The sentence looks broken? I.e., it ends with `...total number of`.


---




[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-17 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/19989
  
We can see that the size of the FileSystem `deleteOnExit` cache keeps increasing.

![mshot](https://user-images.githubusercontent.com/24823338/34095036-d1fcc408-e40a-11e7-9599-2acdd96da2d9.png)
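For context, Hadoop's `FileSystem` keeps an internal set of paths registered for delete-on-exit that is only drained when the filesystem is closed or the JVM exits, so a long-lived server that keeps registering session files grows that set without bound. A minimal Python sketch of the leak pattern and the corresponding fix (all names here are illustrative, not the actual Hadoop API):

```python
class FileSystemSketch:
    """Toy model of a long-lived FileSystem holding a delete-on-exit registry."""

    def __init__(self):
        # Grows for the lifetime of the process unless entries are removed.
        self._delete_on_exit = set()

    def delete_on_exit(self, path):
        # Register a path to be deleted when the filesystem closes.
        self._delete_on_exit.add(path)

    def cancel_delete_on_exit(self, path):
        # The fix: explicitly deregister paths when a session closes,
        # instead of letting them accumulate until process exit.
        self._delete_on_exit.discard(path)


fs = FileSystemSketch()
for session in range(3):
    # Each beeline connection registers its session directory...
    fs.delete_on_exit(f"/tmp/hive/session-{session}")

# ...and nothing removes them, so the registry keeps growing.
print(len(fs._delete_on_exit))  # 3
```

The leak disappears once session teardown calls `cancel_delete_on_exit` (or deletes the path eagerly) for every path the session registered.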


---




[GitHub] spark pull request #19892: [SPARK-20542][FollowUp][PySpark] Bucketizer suppo...

2017-12-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19892#discussion_r157416631
  
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -134,6 +134,20 @@ def toListFloat(value):
 return [float(v) for v in value]
 raise TypeError("Could not convert %s to list of floats" % value)
 
+@staticmethod
+def toListListFloat(value):
+"""
+Convert a value to list of list of floats, if possible.
+"""
+if TypeConverters._can_convert_to_list(value):
+value = TypeConverters.toList(value)
+if all(map(lambda v: TypeConverters._can_convert_to_list(v), 
value)):
+ll = []
+for v in value:
+ll.append([float(i) for i in TypeConverters.toList(v)])
--- End diff --

`toListFloat` requires that each list entry is numeric (`TypeConverters._is_numeric`). Should `toListListFloat` have the same requirement?
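For comparison, a stricter converter that does check every inner element could look like the following standalone sketch (`_is_numeric` here is a stand-in for `TypeConverters._is_numeric`, not the actual pyspark code):

```python
def _is_numeric(v):
    # Stand-in for TypeConverters._is_numeric; excludes bool, which is an int subclass.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def to_list_list_float(value):
    """Convert a value to a list of lists of floats, rejecting non-numeric entries."""
    if not all(isinstance(row, (list, tuple)) for row in value):
        raise TypeError("Could not convert %s to list of list of floats" % value)
    result = []
    for row in value:
        # The check viirya asks about: every inner element must be numeric.
        if not all(_is_numeric(v) for v in row):
            raise TypeError("Could not convert %s to list of list of floats" % value)
        result.append([float(v) for v in row])
    return result


print(to_list_list_float([[1, 2], [3.5]]))  # [[1.0, 2.0], [3.5]]
```

With this version, an input like `[["a"]]` fails fast with a `TypeError` instead of raising deeper inside the `float(...)` conversion.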


---




[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-17 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/19989
  
While debugging, I found that every time I connect to the Thrift Server through beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl:open`, and **once in `HiveClientImpl` for the SQL `use default`**. `SessionManager.java#L151` or `HiveSessionImpl:close` only cleans up the first one.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157415970
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) 
extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val patternClass = classOf[Pattern].getName
 val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + 
".escapeLikeRegex"
-val pattern = ctx.freshName("pattern")
 
 if (right.foldable) {
   val rVal = right.eval()
   if (rVal != null) {
 val regexStr =
   
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-ctx.addMutableState(patternClass, pattern,
-  s"""$pattern = ${patternClass}.compile("$regexStr");""")
+val pattern = ctx.addMutableState(patternClass, "patternLike",
+  v => s"""$v = ${patternClass}.compile("$regexStr");""", 
forceInline = true)
--- End diff --

My only concern is about point 2; I think it is a dangerous thing to do. What if we generate a lot of frequently used variables? I think it is safer at the moment to consider only 1 and 3 when deciding whether to inline or not. In the future, with a different codegen method, we might define a threshold above which we generate an array for the given class and otherwise use plain variables, which IMHO would be the best option, but at the moment it is not feasible...
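The inline-versus-compact decision being discussed can be sketched as a standalone predicate; the threshold value and signature below are illustrative, not the actual `CodeGenerator` implementation:

```python
# Illustrative constant; the real CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD may differ.
OUTER_CLASS_VARIABLES_THRESHOLD = 10000


def should_inline(force_inline, is_primitive, inlined_count, is_multi_dim_array,
                  threshold=OUTER_CLASS_VARIABLES_THRESHOLD):
    """Return True if the mutable state should be a plain field of the outer
    class, False if it should be compacted into an array of same-typed values."""
    if force_inline:
        return True  # rule 1: caller demands a named field (hot path / stable name)
    if is_primitive and inlined_count < threshold:
        return True  # rule 2: primitives are cheap until the threshold is reached
    if is_multi_dim_array:
        return True  # rule 3: multi-dimensional arrays are not compacted
    return False


print(should_inline(False, True, 5, False))  # True: primitive under the threshold
```

mgaido91's concern above corresponds to dropping the `is_primitive` branch (rule 2) and keeping only rules 1 and 3.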


---




[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20006
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85039/
Test FAILed.


---




[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20006
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20006
  
**[Test build #85039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85039/testReport)** for PR 20006 at commit [`7ee9aec`](https://github.com/apache/spark/commit/7ee9aeccfc5adc6107c33401ef0c5212a65d9577).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85044/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20005
  
cc @shivaram, @felixcheung and @falaki too FYI.


---




[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19993#discussion_r157414340
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0") 
(@Since("1.4.0") override val uid: String
* by `inputCol`. A warning will be printed if both are set.
*/
   private[feature] def isBucketizeMultipleColumns(): Boolean = {
-if (isSet(inputCols) && isSet(inputCol)) {
-  logWarning("Both `inputCol` and `inputCols` are set, we ignore 
`inputCols` and this " +
-"`Bucketizer` only map one column specified by `inputCol`")
-  false
+if (isSet(inputCols) && isSet(inputCol) || isSet(inputCols) && 
isSet(outputCol) ||
+  isSet(inputCol) && isSet(outputCols)) {
+  throw new IllegalArgumentException("Both `inputCol` and `inputCols` 
are set, `Bucketizer` " +
+"only supports setting either `inputCol` or `inputCols`.")
--- End diff --

Thanks. I will add these checks while making the changes requested by @hhbyyh.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2017-12-17 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19993
  
@hhbyyh thanks for the review. I see that for some classes there are 
ongoing PRs. Thus I cannot change them now in order to have a common place and 
a common test. Should I wait for those PRs to be merged then? Or were you 
suggesting something different?
Thanks.


---




[GitHub] spark pull request #19892: [SPARK-20542][FollowUp][PySpark] Bucketizer suppo...

2017-12-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19892#discussion_r157414231
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -315,13 +315,14 @@ class BucketedRandomProjectionLSHModel(LSHModel, 
JavaMLReadable, JavaMLWritable)
 
 
 @inherit_doc
-class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, 
HasHandleInvalid,
- JavaMLReadable, JavaMLWritable):
+class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasInputCols, 
HasOutputCols,
+ HasHandleInvalid, JavaMLReadable, JavaMLWritable):
 """
 Maps a column of continuous features to a column of feature buckets.
--- End diff --

Update this doc too?


---




[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...

2017-12-17 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/19933
  
And **spark.driver.host** can be set in SparkConf, so why can't spark.submit.host be?
`def makeDriverRef(name: String, conf: SparkConf, rpcEnv: RpcEnv): 
RpcEndpointRef = {`
`  val driverHost: String = conf.get("spark.driver.host", "localhost")`
`  val driverPort: Int = conf.getInt("spark.driver.port", 7077)`
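The pattern in `makeDriverRef` above is an ordinary defaulted config lookup; a minimal Python analogue (a toy for illustration, not Spark's actual `SparkConf`):

```python
class ConfSketch:
    """Toy key-value config with defaulted lookups, mirroring conf.get/getInt."""

    def __init__(self, settings=None):
        self._settings = dict(settings or {})

    def get(self, key, default=None):
        # Return the configured value, or the caller-supplied default.
        return self._settings.get(key, default)

    def get_int(self, key, default):
        return int(self.get(key, default))


conf = ConfSketch({"spark.driver.host": "10.0.0.5"})
print(conf.get("spark.driver.host", "localhost"))  # 10.0.0.5 (explicitly set)
print(conf.get_int("spark.driver.port", 7077))     # 7077 (falls back to default)
```

Any key that follows this pattern, set or not, resolves cleanly, which is the point being made about `spark.submit.host`.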



---




[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2017-12-17 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/19193
  
@hvanhovell here is a summary of tried scenarios:

```
val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b")
val window1 = Window.orderBy('a)
val window2 = Window.orderBy('a.desc)
```

**Scenario 1: An aggregate on top of a window expression (did not work 
before, looks OK now)**
```
df.groupBy().agg(max(rank().over(window1))).explain(true)
df.groupBy().agg(max(rank().over(window1))).show(false)

== Analyzed Logical Plan ==
max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): int
Aggregate [max(_we0#27) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST 
unspecifiedframe$()))#16]
+- Project [a#5, b#6, _we0#27, _we0#27]
   +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
_we0#27], [a#5 ASC NULLS FIRST]
  +- Project [_1#2 AS a#5, _2#3 AS b#6]
 +- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
Aggregate [max(_we0#27) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST 
unspecifiedframe$()))#16]
+- Project [_we0#27, _we0#27]
   +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
_we0#27], [a#5 ASC NULLS FIRST]
  +- LocalRelation [a#5]

+-----------------------------------------------------------------+
|max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))|
+-----------------------------------------------------------------+
|4                                                                |
+-----------------------------------------------------------------+
```

**Scenario 2: An aggregate with grouping expressions on top of a window 
expression**
TODO:  Is the result wrong? What is expected?
```
df.groupBy('a).agg(max(rank().over(window1))).explain(true)
df.groupBy('a).agg(max(rank().over(window1))).show(false)

== Analyzed Logical Plan ==
a: int, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): 
int
Aggregate [a#5], [a#5, max(_we0#75) AS max(RANK() OVER (ORDER BY a ASC 
NULLS FIRST unspecifiedframe$()))#63]
+- Project [a#5, b#6, _we0#75, _we0#75]
   +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
_we0#75], [a#5 ASC NULLS FIRST]
  +- Project [_1#2 AS a#5, _2#3 AS b#6]
 +- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
Aggregate [a#5], [a#5, max(_we0#75) AS max(RANK() OVER (ORDER BY a ASC 
NULLS FIRST unspecifiedframe$()))#63]
+- Project [a#5, _we0#75, _we0#75]
   +- Window [rank(a#5) windowspecdefinition(a#5 ASC NULLS FIRST, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
_we0#75], [a#5 ASC NULLS FIRST]
  +- LocalRelation [a#5]

+---+-----------------------------------------------------------------+
|a  |max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$()))|
+---+-----------------------------------------------------------------+
|1  |1                                                                |
|2  |3                                                                |
|5  |4                                                                |
+---+-----------------------------------------------------------------+
```
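The Scenario 2 result can be sanity-checked by hand: `rank()` over the global ordering of `a` assigns ranks 1, 1, 3, 4 to the four rows, and the per-group max then matches the table above. A pure-Python recomputation (illustrative, not Spark code):

```python
rows = [(1, 2), (1, 3), (2, 4), (5, 5)]

# rank() semantics: ties share a rank, and the next distinct value's rank
# skips ahead by the number of tied rows before it.
ordered = sorted(rows, key=lambda r: r[0])
ranks = []
for i, (a, _) in enumerate(ordered):
    if i > 0 and a == ordered[i - 1][0]:
        ranks.append(ranks[-1])  # tie: keep the previous rank
    else:
        ranks.append(i + 1)      # new value: rank is the 1-based position

# max(rank) per group of a, as in df.groupBy('a').agg(max(rank().over(window1)))
result = {}
for (a, _), r in zip(ordered, ranks):
    result[a] = max(result.get(a, 0), r)

print(result)  # {1: 1, 2: 3, 5: 4} -- matches the table above
```

So the answer to the TODO is arguably that the result is consistent with `rank()` over the whole partition followed by a per-group max; whether that mixed semantics is what a user *expects* is the open question.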

**Scenario 3: A normal aggregate, an aggregate on top of a window 
expression, a window expression on top of an aggregate in one query**
This is resolved in two steps.
```
df.groupBy('a).agg(max(rank().over(window1)), sum('b), 
sum(sum('b)).over(window2)).explain(true)
df.groupBy('a).agg(max(rank().over(window1)), sum('b), 
sum(sum('b)).over(window2)).show(false)

== Analyzed Logical Plan ==
a: int, max(RANK() OVER (ORDER BY a ASC NULLS FIRST unspecifiedframe$())): 
int, sum(b): bigint, sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST 
unspecifiedframe$()): bigint
Project [a#5, max(RANK() OVER (ORDER BY a ASC NULLS FIRST 
unspecifiedframe$()))#116, sum(b)#117L, sum(sum(b)) OVER (ORDER BY a DESC NULLS 
LAST unspecifiedframe$())#118L]
+- Project [a#5, max(RANK() OVER (ORDER BY a ASC NULLS FIRST 
unspecifiedframe$()))#116, sum(b)#117L, _w0#137L, sum(sum(b)) OVER (ORDER BY a 
DESC NULLS LAST unspecifiedframe$())#118L, sum(sum(b)) OVER (ORDER BY a DESC 
NULLS LAST unspecifiedframe$())#118L]
   +- Window [sum(_w0#137L) windowspecdefinition(a#5 DESC NULLS LAST, 
specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS 
sum(sum(b)) OVER (ORDER BY a DESC NULLS LAST unspecifiedframe$())#118L], [a#5 
DESC NULLS LAST]
  +- Aggregate [a#5], [a#5, max(_we0#133) AS max(RANK() OVER (ORDER BY a ASC NULLS FIRST
```
[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20005
  
I use R version 3.2.3 and I can reproduce this too.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20005
  
I see. Nice to get some help. BTW, this also seems related to the specific R version. I wasn't able to reproduce this with R 3.4.0, but I am able to reproduce it with R 3.1.1 (same as Jenkins').

Probably, some corner cases were handled in `tools:::.check_package_CRAN_incoming` in the newer version.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19811
  
LGTM except some minor comments, great job!


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410773
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) 
extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val patternClass = classOf[Pattern].getName
 val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + 
".escapeLikeRegex"
-val pattern = ctx.freshName("pattern")
 
 if (right.foldable) {
   val rVal = right.eval()
   if (rVal != null) {
 val regexStr =
   
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-ctx.addMutableState(patternClass, pattern,
-  s"""$pattern = ${patternClass}.compile("$regexStr");""")
+val pattern = ctx.addMutableState(patternClass, "patternLike",
+  v => s"""$v = ${patternClass}.compile("$regexStr");""", 
forceInline = true)
--- End diff --

yes, if there are not a lot of them...


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20005
  
Note: I've emailed the CRAN sysadmin team and asked for their help fixing the broken page.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410521
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -440,9 +435,9 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 | }
""".stripMargin)
 
-val input = ctx.freshName("input")
 // Right now, Range is only used when there is one upstream.
-ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
+val input = ctx.addMutableState("scala.collection.Iterator", "input",
--- End diff --

seems it's never used


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410349
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-val needToSort = ctx.freshName("needToSort")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = 
true;")
+val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v 
=> s"$v = true;")
 
 // Initialize the class member variables. This includes the instance 
of the Sorter and
 // the iterator to return sorted rows.
 val thisPlan = ctx.addReferenceObj("plan", this)
-sorterVariable = ctx.freshName("sorter")
-ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, 
sorterVariable,
-  s"$sorterVariable = $thisPlan.createSorter();")
-val metrics = ctx.freshName("metrics")
-ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-  s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-val sortedIterator = ctx.freshName("sortedIter")
-ctx.addMutableState("scala.collection.Iterator", 
sortedIterator, "")
+sorterVariable = 
ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+  v => s"$v = $thisPlan.createSorter();", forceInline = true)
+val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, 
"metrics",
+  v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", 
forceInline = true)
+val sortedIterator = 
ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+  forceInline = true)
--- End diff --

e.g. https://github.com/apache/spark/pull/19811/files#diff-2eb948516b5beaeb746aadac27fbd5b4R613 ?


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410191
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-val needToSort = ctx.freshName("needToSort")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = 
true;")
+val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v 
=> s"$v = true;")
 
 // Initialize the class member variables. This includes the instance 
of the Sorter and
 // the iterator to return sorted rows.
 val thisPlan = ctx.addReferenceObj("plan", this)
-sorterVariable = ctx.freshName("sorter")
-ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, 
sorterVariable,
-  s"$sorterVariable = $thisPlan.createSorter();")
-val metrics = ctx.freshName("metrics")
-ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-  s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-val sortedIterator = ctx.freshName("sortedIter")
-ctx.addMutableState("scala.collection.Iterator", 
sortedIterator, "")
+sorterVariable = 
ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+  v => s"$v = $thisPlan.createSorter();", forceInline = true)
+val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, 
"metrics",
+  v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", 
forceInline = true)
+val sortedIterator = 
ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+  forceInline = true)
--- End diff --

One question: are there any other places like this? Do you have a list?


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410164
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) 
extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val patternClass = classOf[Pattern].getName
 val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + 
".escapeLikeRegex"
-val pattern = ctx.freshName("pattern")
 
 if (right.foldable) {
   val rVal = right.eval()
   if (rVal != null) {
 val regexStr =
   
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-ctx.addMutableState(patternClass, pattern,
-  s"""$pattern = ${patternClass}.compile("$regexStr");""")
+val pattern = ctx.addMutableState(patternClass, "patternLike",
+  v => s"""$v = ${patternClass}.compile("$regexStr");""", 
forceInline = true)
--- End diff --

Now we have three rules for applying inlining:
1. Have to use the original name
2. Frequently used in a hot spot
3. [Not expected to be frequently generated](https://github.com/apache/spark/pull/19811#discussion_r157110933), proposed by @viirya

We have no data for 2 yet, so I will try to run a microbenchmark for it. Would it be better to add these benchmarks to the benchmark directory?



---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157410133
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -133,20 +133,17 @@ case class SortExec(
   override def needStopCheck: Boolean = false
 
   override protected def doProduce(ctx: CodegenContext): String = {
-val needToSort = ctx.freshName("needToSort")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, needToSort, s"$needToSort = 
true;")
+val needToSort = ctx.addMutableState(ctx.JAVA_BOOLEAN, "needToSort", v 
=> s"$v = true;")
 
 // Initialize the class member variables. This includes the instance 
of the Sorter and
 // the iterator to return sorted rows.
 val thisPlan = ctx.addReferenceObj("plan", this)
-sorterVariable = ctx.freshName("sorter")
-ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, 
sorterVariable,
-  s"$sorterVariable = $thisPlan.createSorter();")
-val metrics = ctx.freshName("metrics")
-ctx.addMutableState(classOf[TaskMetrics].getName, metrics,
-  s"$metrics = org.apache.spark.TaskContext.get().taskMetrics();")
-val sortedIterator = ctx.freshName("sortedIter")
-ctx.addMutableState("scala.collection.Iterator", 
sortedIterator, "")
+sorterVariable = 
ctx.addMutableState(classOf[UnsafeExternalRowSorter].getName, "sorter",
+  v => s"$v = $thisPlan.createSorter();", forceInline = true)
+val metrics = ctx.addMutableState(classOf[TaskMetrics].getName, 
"metrics",
+  v => s"$v = org.apache.spark.TaskContext.get().taskMetrics();", 
forceInline = true)
+val sortedIterator = 
ctx.addMutableState("scala.collection.Iterator", "sortedIter",
+  forceInline = true)
--- End diff --

this looks reasonable as it's very unlikely we have a lot of sort operators 
in one stage. We have to inline it manually as we don't have the ability to 
find this out automatically yet. Same as 
https://github.com/apache/spark/pull/19811/files#r157408804


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157409537
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -217,7 +217,7 @@ case class Stack(children: Seq[Expression]) extends 
Generator {
 ctx.addMutableState(
   s"$wrapperClass",
   ev.value,
-  s"${ev.value} = $wrapperClass$$.MODULE$$.make($rowData);")
--- End diff --

I will do it in another PR.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85043/
Test FAILed.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85043 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)**
 for PR 19884 at commit 
[`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85043 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)**
 for PR 19884 at commit 
[`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387).


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157408804
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -385,20 +385,43 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val ctx = new CodegenContext
 val schema = new StructType().add("a", IntegerType).add("b", 
StringType)
 CreateExternalRow(Seq(Literal(1), Literal("x")), schema).genCode(ctx)
-assert(ctx.mutableStates.isEmpty)
+assert(ctx.inlinedMutableStates.isEmpty)
   }
 
   test("SPARK-22696: InitializeJavaBean should not use global variables") {
 val ctx = new CodegenContext
 InitializeJavaBean(Literal.fromObject(new java.util.LinkedList[Int]),
   Map("add" -> Literal(1))).genCode(ctx)
-assert(ctx.mutableStates.isEmpty)
+assert(ctx.inlinedMutableStates.isEmpty)
   }
 
   test("SPARK-22716: addReferenceObj should not add mutable states") {
 val ctx = new CodegenContext
 val foo = new Object()
 ctx.addReferenceObj("foo", foo)
-assert(ctx.mutableStates.isEmpty)
+assert(ctx.inlinedMutableStates.isEmpty)
+  }
+
+  test("SPARK-18016: define mutable states by using an array") {
+val ctx1 = new CodegenContext
+for (i <- 1 to CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD + 10) {
+  ctx1.addMutableState(ctx1.JAVA_INT, "i", v => s"$v = $i;")
+}
+assert(ctx1.inlinedMutableStates.size == 
CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD)
+// When the number of primitive type mutable states is over the 
threshold, others are
+// allocated into an array
--- End diff --

Some notes: it would be better if we could collect all mutable states before 
deciding which ones should be inlined. However, that's impossible to do with the 
current string-based codegen framework; we need to decide whether to inline 
immediately. We can revisit this in the future when we have an AST-based 
codegen framework.
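The constraint described above can be illustrated with a small sketch. The following is a simplified, hypothetical Python model of the idea (Spark's real implementation lives in Scala's `CodegenContext`; the threshold value and names here are illustrative only): with string-based codegen, the returned reference must be usable immediately, so each state is either inlined or compacted into the shared array at the moment it is registered.

```python
# Simplified, hypothetical sketch (not Spark's actual Scala implementation)
# of deciding, at registration time, whether a mutable state becomes its own
# field or is compacted into a shared array to stay under the JVM class file
# constant pool limit.

THRESHOLD = 3  # Spark uses OUTER_CLASS_VARIABLES_THRESHOLD; 3 keeps the demo small

class CodegenContextSketch:
    def __init__(self):
        self.inlined = []        # states declared as individual fields
        self.array_slots = 0     # states compacted into one array field

    def add_mutable_state(self, java_type, name, force_inline=False):
        # With string-based codegen we must hand back a usable reference
        # immediately, so the inline-or-compact decision cannot be deferred.
        if force_inline or len(self.inlined) < THRESHOLD:
            ref = f"{name}_{len(self.inlined)}"
            self.inlined.append((java_type, ref))
            return ref
        slot = self.array_slots
        self.array_slots += 1
        return f"mutableStateArray[{slot}]"

ctx = CodegenContextSketch()
refs = [ctx.add_mutable_state("int", "i") for _ in range(5)]
# the first THRESHOLD states are inlined; the rest land in the array
```

In the real code generator the compacted states become entries of array-typed fields in the generated class, which keeps the number of declared fields, and hence constant pool entries, bounded.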


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20005
  
I've tried to investigate this and I think it is due to a wrongly formatted 
document at https://cran.r-project.org/src/contrib/PACKAGES.in:

```
Package: alR
X-CRAN-Comment: Archived again on 2017-12-15 as none of the errors (failure 
to install with clang and on Solaris, buffer overflows) were corrected.

X-CRAN-History: Archived on 2017-12-01 as C++ errors (and segfaults) were 
not corrected despite reminders.
Unarchived on 2017-12-07.
```

There is an extra blank line here, and it breaks the function 
`tools:::.check_package_CRAN_incoming`.
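For illustration, here is a minimal sketch of why a stray blank line is fatal: PACKAGES.in is in Debian Control File (DCF) format, where blank lines separate records, so an extra blank line splits one package record into two, and the second half has no `Package:` key. This is a hypothetical simplified parser, not the one R's `tools` package actually uses, and the record text is abbreviated.

```python
# Hypothetical minimal DCF parser: records are separated by blank lines,
# and lines starting with whitespace continue the previous field.
def parse_dcf(text):
    records = []
    for chunk in text.strip().split("\n\n"):
        record = {}
        for line in chunk.splitlines():
            if line[:1].isspace():              # continuation line
                record[key] += " " + line.strip()
            else:
                key, _, value = line.partition(":")
                record[key] = value.strip()
        records.append(record)
    return records

broken = """Package: alR
X-CRAN-Comment: Archived again on 2017-12-15.

X-CRAN-History: Archived on 2017-12-01.
 Unarchived on 2017-12-07.
"""
records = parse_dcf(broken)
# The stray blank line yields two records, and the second lacks "Package:"
```

A strict checker that expects every record to carry a `Package:` field will then fail on the orphaned second record, which matches the observed breakage.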


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157408425
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression) 
extends StringRegexExpressi
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val patternClass = classOf[Pattern].getName
 val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + 
".escapeLikeRegex"
-val pattern = ctx.freshName("pattern")
 
 if (right.foldable) {
   val rVal = right.eval()
   if (rVal != null) {
 val regexStr =
   
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
-ctx.addMutableState(patternClass, pattern,
-  s"""$pattern = ${patternClass}.compile("$regexStr");""")
+val pattern = ctx.addMutableState(patternClass, "patternLike",
+  v => s"""$v = ${patternClass}.compile("$regexStr");""", 
forceInline = true)
--- End diff --

Do we have a clear rule for when a global variable should be inlined for better 
performance? E.g., a microbenchmark showing a noteworthy difference would definitely 
prove we should inline.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157408061
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -217,7 +217,7 @@ case class Stack(children: Seq[Expression]) extends 
Generator {
 ctx.addMutableState(
   s"$wrapperClass",
   ev.value,
-  s"${ev.value} = $wrapperClass$$.MODULE$$.make($rowData);")
--- End diff --

We can localize the global variable `ev.value` here to save one global 
variable slot.


---




[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19811#discussion_r157407681
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -163,11 +204,51 @@ class CodegenContext {
* the list of default imports available.
* Also, generic type arguments are accepted but ignored.
* @param variableName Name of the field.
-   * @param initCode The statement(s) to put into the init() method to 
initialize this field.
+   * @param initFunc Function includes statement(s) to put into the init() 
method to initialize
+   * this field. The argument is the name of the mutable 
state variable.
* If left blank, the field will be default-initialized.
+   * @param forceInline whether the declaration and initialization code 
may be inlined rather than
+   *compacted. Please set `true` into forceInline, if 
you want to access the
+   *status fast (e.g. frequently accessed) or if you 
want to use the original
+   *variable name
+   * @param useFreshName If this is false and forceInline is true, the 
name is not changed
--- End diff --

more accurate: `If this is false and the mutable state ends up inlined in the 
outer class, the name is not changed`


---




[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...

2017-12-17 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/19933
  
> driver host is actually probably already in the logs.
What I care about is the submit host, not the driver host. I have built 
Spark from trunk and tested in yarn-cluster mode, and there is no submit host in 
the AM log (even with debug enabled). So it is still difficult for the platform 
team to know where an application was submitted. They even have to check the 
YARN audit log for the create operation on the `.sparkStaging` files to find 
out the submit host.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85042 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85042/testReport)**
 for PR 19981 at commit 
[`9f5d21f`](https://github.com/apache/spark/commit/9f5d21f1aa71b1e2ee0a820a1430f453ecf66f0f).


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85041 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85041/testReport)**
 for PR 19981 at commit 
[`5737705`](https://github.com/apache/spark/commit/57377055a3db6f12f67c55524917e4da1b49df94).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85041/
Test FAILed.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85041 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85041/testReport)**
 for PR 19981 at commit 
[`5737705`](https://github.com/apache/spark/commit/57377055a3db6f12f67c55524917e4da1b49df94).


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85040/
Test FAILed.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85040/testReport)**
 for PR 19981 at commit 
[`2e66722`](https://github.com/apache/spark/commit/2e667222690cad5dcd839b220ae9f938203813db).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85040 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85040/testReport)**
 for PR 19981 at commit 
[`2e66722`](https://github.com/apache/spark/commit/2e667222690cad5dcd839b220ae9f938203813db).


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19981
  
@vanzin We can add `AppStatusPlugin` back if you can convince other people 
that the SQL listener and UI tab should be set up during `SparkContext` 
initialization (I'm kind of convinced after some more thought; for future SQL 
listener usage it's more important to catch all events, including non-SQL ones), 
and figure out how to keep `SharedState.statusStore`.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85038/
Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85038 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85038/testReport)**
 for PR 20004 at commit 
[`652dec8`](https://github.com/apache/spark/commit/652dec84bc1ee57e3cf0cfb553b821cbe54cdef3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85037/
Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85037 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85037/testReport)**
 for PR 20004 at commit 
[`2285807`](https://github.com/apache/spark/commit/22858071cd25acf7732c1ca4d984cced4d403b90).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85036/
Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85036 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85036/testReport)**
 for PR 20004 at commit 
[`57be3f1`](https://github.com/apache/spark/commit/57be3f1ba654ed2ea42c06d9d9325f06222196ef).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19811
  
ping @cloud-fan 


---




[GitHub] spark issue #20006: [SPARK-22821][TEST] Basic tests for WidenSetOperationTyp...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20006
  
**[Test build #85039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85039/testReport)**
 for PR 20006 at commit 
[`7ee9aec`](https://github.com/apache/spark/commit/7ee9aeccfc5adc6107c33401ef0c5212a65d9577).


---




[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...

2017-12-17 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20006

[SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, 
BooleanEquality, StackCoercion and Division

## What changes were proposed in this pull request?

Test coverage for `WidenSetOperationTypes`, `BooleanEquality`, 
`StackCoercion` and `Division`; this is a sub-task of 
[SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22821

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20006


commit 7ee9aeccfc5adc6107c33401ef0c5212a65d9577
Author: Yuming Wang 
Date:   2017-12-18T04:43:55Z

Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and 
Division




---




[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19884
  
@wesm I was able to install pyarrow 0.8.0 to my local environment via 
conda. Thanks!


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r157397655
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
 ---
@@ -489,16 +501,17 @@ private case class MyPlan(sc: SparkContext, 
expectedValue: Long) extends LeafExe
 }
 
 
-class SQLListenerMemoryLeakSuite extends SparkFunSuite {
+class SQLAppStatusListenerMemoryLeakSuite extends SparkFunSuite {
 
-  // TODO: this feature is not yet available in SQLAppStatusStore.
-  ignore("no memory leak") {
+  test("no memory leak") {
 quietly {
   val conf = new SparkConf()
 .setMaster("local")
 .setAppName("test")
+.set(LIVE_ENTITY_UPDATE_PERIOD, 0L) // Update the UI data 
immediately
 .set(config.MAX_TASK_FAILURES, 1) // Don't retry the tasks to run 
this test quickly
-.set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to 
run this test quickly
+// TODO: this feature is not yet available in SQLAppStatusStore.
+// .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 
to run this test quickly
--- End diff --

ok let me revert this part


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19864
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85035/
Test FAILed.


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19864
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19864
  
**[Test build #85035 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85035/testReport)**
 for PR 19864 at commit 
[`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-17 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157396794
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") (
   }
 
   bcFeaturesStd.destroy(blocking = false)
-  if (state == null) {
+  if (state == null || state.searchFailed) {
 val msg = s"${optimizer.getClass.getName} failed."
-logError(msg)
+logWarning(msg)
 throw new SparkException(msg)
--- End diff --

The message that line search failed is already logged by Breeze:

https://github.com/scalanlp/breeze/blob/4611fe3f4466ac5b1885e9d6288ba27fc9d205ec/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala#L78
I think logging it again at this place would be a bit verbose.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85031/
Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85031 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85031/testReport)**
 for PR 19977 at commit 
[`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FunctionArgumentConversion(conf: SQLConf) extends 
TypeCoercionRule `
  * `case class Concat(children: Seq[Expression], isBinaryMode: Boolean = 
false)`


---




[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...

2017-12-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19813
  
> Another solution is to not make this contract. By a quick look this seems 
hard to do, because at the time of doing this, the code(method body) is already 
generated and we don't know how to replace statement like a + 1 with the 
generated parameter name, inside the method body. We may need to do this fix 
earlier in the codegen procedure.

@cloud-fan Can you say more about this limitation? Can't we just replace 
all occurrences of `a + 1` with a parameter name like `varA` in the method body, 
if we are going to use it as the parameter name in the split method?

E.g., from a method body like:

```java
void splitMethod1(int a + 1, int b, boolean c) {
 ...
 int d = a + 1 + b;
 if (a + 1 > 10 && c) {
...
 }
```

to

```java
void splitMethod1(int varA, int b, boolean c) {
 ...
 int d = varA + b;
 if (varA > 10 && c) {
...
 }
```
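The rewrite sketched above can be modeled in a few lines. This is a hypothetical illustration, not Spark's implementation: real generated Java would need token-aware matching rather than plain string replacement, and the `localize` name is invented here.

```python
# Hypothetical sketch of substituting a subexpression's text with the split
# method's parameter name inside an already-generated method body.
def localize(body: str, expr_to_param: dict) -> str:
    # Replace longer expressions first, so an expression that is a prefix
    # of another does not clobber it.
    for expr, param in sorted(expr_to_param.items(), key=lambda kv: -len(kv[0])):
        body = body.replace(expr, param)
    return body

body = """int d = a + 1 + b;
if (a + 1 > 10 && c) { ... }"""

rewritten = localize(body, {"a + 1": "varA"})
# every occurrence of "a + 1" is now the parameter name "varA"
```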





---




[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19993#discussion_r157393913
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0") 
(@Since("1.4.0") override val uid: String
* by `inputCol`. A warning will be printed if both are set.
*/
   private[feature] def isBucketizeMultipleColumns(): Boolean = {
-if (isSet(inputCols) && isSet(inputCol)) {
-  logWarning("Both `inputCol` and `inputCols` are set, we ignore 
`inputCols` and this " +
-"`Bucketizer` only map one column specified by `inputCol`")
-  false
+if (isSet(inputCols) && isSet(inputCol) || isSet(inputCols) && 
isSet(outputCol) ||
+  isSet(inputCol) && isSet(outputCols)) {
+  throw new IllegalArgumentException("Both `inputCol` and `inputCols` 
are set, `Bucketizer` " +
+"only supports setting either `inputCol` or `inputCols`.")
--- End diff --

Here it would be better to add one more check:
if single column, only the `splits` param should be set;
if multiple columns, only the `splitsArray` param should be set.
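As a sketch of what such a check could look like (hypothetical, not Spark's actual `Bucketizer` code; the param names only mirror the discussion):

```python
# Hypothetical validation: a Bucketizer-like transformer should reject any
# mix of single-column params (inputCol/outputCol/splits) and multi-column
# params (inputCols/outputCols/splitsArray).
SINGLE_COL = {"inputCol", "outputCol", "splits"}
MULTI_COL = {"inputCols", "outputCols", "splitsArray"}

def validate_params(set_params):
    """Raise if single-column and multi-column params are mixed."""
    if set_params & SINGLE_COL and set_params & MULTI_COL:
        raise ValueError(
            "Only one of single-column or multi-column params may be set, "
            "got: " + ", ".join(sorted(set_params)))

validate_params({"inputCol", "outputCol", "splits"})   # single-column mode: ok
# validate_params({"inputCol", "splitsArray"})         # mixed: would raise
```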



---




[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157393049
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") (
   }
 
   bcFeaturesStd.destroy(blocking = false)
-  if (state == null) {
+  if (state == null || state.searchFailed) {
 val msg = s"${optimizer.getClass.getName} failed."
-logError(msg)
+logWarning(msg)
 throw new SparkException(msg)
--- End diff --

But the message in the exception should include the root cause, "line search 
failed." The current message only includes "LBFGS failed."


---




[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19994#discussion_r157391859
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -155,6 +183,14 @@ def intercept(self):
 """
 return self._call_java("intercept")
 
+@property
+@since("2.3.0")
+def scale(self):
+"""
+The value by which \|y - X'w\| is scaled down when loss is "huber".
--- End diff --

Add doc: "When the loss is squared error, this value is 1.0."


---




[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19994#discussion_r157391801
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1725,6 +1725,27 @@ def test_offset(self):
 self.assertTrue(np.isclose(model.intercept, -1.561613, atol=1E-4))
 
 
+class LinearRegressionTest(SparkSessionTestCase):
+
+def test_linear_regression_with_huber_loss(self):
+
+data_path = "data/mllib/sample_linear_regression_data.txt"
+df = self.spark.read.format("libsvm").load(data_path)
+
+lir = LinearRegression(loss="huber")
--- End diff --

The test case should include `setEpsilon`.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85038/testReport)**
 for PR 20004 at commit 
[`652dec8`](https://github.com/apache/spark/commit/652dec84bc1ee57e3cf0cfb553b821cbe54cdef3).


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85037 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85037/testReport)**
 for PR 20004 at commit 
[`2285807`](https://github.com/apache/spark/commit/22858071cd25acf7732c1ca4d984cced4d403b90).


---




[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread ep1804
Github user ep1804 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20004#discussion_r157391252
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -482,6 +482,36 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping, using escapeQuoteEscaping option") {
+withTempPath { path =>
+  // when a string to be quoted ends with the escape character, 
escapeQuoteEscaping is required
+  val df1 = Seq(
"""You are "beautiful"""",
+"""Yes, \in the inside\""",  // ends with the escape character '\\'
+"""AA\\BB""",// 2 escape char
+"""AA""BB""",// 2 quote char
+"""AA\"BB\""",   // 1 escape char and 1 quote char
+"""AA\""BB\""",  // 1 escape char and 2 quote char
+"""AA\\"BB\"""   // 2 escape char and 1 quote char
+  ).toDF()
+
+  // escapeQuotes is true by default
+  // quote character is '\"' by default
+  // escape character is '\\' by default
+  df1.coalesce(1).write
+.format("csv")
+.option("escapeQuoteEscaping", "\\")
--- End diff --

OK.


---




[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread ep1804
Github user ep1804 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20004#discussion_r157391233
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -482,6 +482,36 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping, using escapeQuoteEscaping option") {
+withTempPath { path =>
+  // when a string to be quoted ends with the escape character, 
escapeQuoteEscaping is required
+  val df1 = Seq(
"""You are "beautiful"""",
+"""Yes, \in the inside\""",  // ends with the escape character '\\'
+"""AA\\BB""",// 2 escape char
+"""AA""BB""",// 2 quote char
+"""AA\"BB\""",   // 1 escape char and 1 quote char
+"""AA\""BB\""",  // 1 escape char and 2 quote char
+"""AA\\"BB\"""   // 2 escape char and 1 quote char
+  ).toDF()
+
+  // escapeQuotes is true by default
+  // quote character is '\"' by default
+  // escape character is '\\' by default
+  df1.coalesce(1).write
+.format("csv")
+.option("escapeQuoteEscaping", "\\")
--- End diff --

It fails, as follows:

```
[info]   == Results ==
[info]   !== Correct Answer - 7 ==   == Spark Answer - 7 ==
[info]   !struct<_c0:string> struct<value:string>
[info][AA""BB]   [AA""BB]
[info]   ![AA\""BB"] [AA\""BB\]
[info]   ![AA\"BB"]  [AA\"BB\]
[info]   ![AA\\"BB"] [AA\\"BB\]
[info][AA\\BB]   [AA\\BB]
[info]   ![Yes, \in the inside"] [Yes, \in the inside\]
[info][You are "beautiful"]  [You are "beautiful"] 
(QueryTest.scala:163)
```
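The failing rows are consistent with a trailing escape character swallowing the closing quote when it is not itself escaped. A toy sketch of quoting with an explicit escape-of-the-escape character (my own illustration, not uniVocity's writer):

```java
public class CsvQuote {
    // Toy quoting logic: first protect the escape character itself with
    // escapeQuoteEscaping, then escape quotes. Without the first step, a
    // field ending in the escape character leaves the closing quote escaped.
    static String quoteField(String field, char quote, char escape,
                             char escapeQuoteEscaping) {
        String escaped = field
            .replace(String.valueOf(escape), "" + escapeQuoteEscaping + escape)
            .replace(String.valueOf(quote), "" + escape + quote);
        return quote + escaped + quote;
    }

    public static void main(String[] args) {
        System.out.println(quoteField("AA\\BB", '"', '\\', '\\'));
        // "AA\\BB"
        System.out.println(quoteField("ends with \\", '"', '\\', '\\'));
        // "ends with \\"
    }
}
```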


---




[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread ep1804
Github user ep1804 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20004#discussion_r157391102
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -249,6 +249,8 @@ final class DataStreamReader private[sql](sparkSession: 
SparkSession) extends Lo
* `com.databricks.spark.csv`.
* `escape` (default `\`): sets the single character used for 
escaping quotes inside
* an already quoted value.
+   * `escapeQuoteEscaping` (default `\0`): sets the single character 
used for escaping
+   * the quote-escape character.
--- End diff --

About the naming: uniVocity and Spark use slightly different option names:

| uniVocity                 | Spark               |
| :------------------------ | :------------------ |
| delimiter                 | sep                 |
| quote                     | quote               |
| quoteEscape               | escape              |
| charToEscapeQuoteEscaping | escapeQuoteEscaping |

I tried to follow Spark's way by simply removing `charTo`.


---




[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18576
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18576
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85032/
Test FAILed.


---




[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18576
  
**[Test build #85032 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85032/testReport)**
 for PR 18576 at commit 
[`84d32ef`](https://github.com/apache/spark/commit/84d32efbc05d614ce8b9f942623b05bec67d3cfc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class FilterExec(condition: Expression, child: SparkPlan, 
outputAttrs: Seq[Attribute])`


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85036 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85036/testReport)**
 for PR 20004 at commit 
[`57be3f1`](https://github.com/apache/spark/commit/57be3f1ba654ed2ea42c06d9d9325f06222196ef).


---




[GitHub] spark pull request #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-17 Thread ep1804
Github user ep1804 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20004#discussion_r157389005
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -249,6 +249,8 @@ final class DataStreamReader private[sql](sparkSession: 
SparkSession) extends Lo
* `com.databricks.spark.csv`.
* `escape` (default `\`): sets the single character used for 
escaping quotes inside
* an already quoted value.
+   * `escapeQuoteEscaping` (default `\0`): sets the single character 
used for escaping
--- End diff --

OK. I'll do it.


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19864
  
**[Test build #85035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85035/testReport)**
 for PR 19864 at commit 
[`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624).


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19864
  
Let's hold on. It seems to be failing globally.


---




[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85034/
Test FAILed.


---




[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17400
  
**[Test build #85034 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85034/testReport)**
 for PR 17400 at commit 
[`e6ce117`](https://github.com/apache/spark/commit/e6ce11716015cac614f6155ef2e9a88b420d3ee6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class AggregateExec extends UnaryExecNode `
  * `case class ProjectExec(`


---




[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20005
  
**[Test build #85033 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85033/testReport)**
 for PR 20005 at commit 
[`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20005
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20005
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85033/
Test FAILed.


---




[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17400
  
**[Test build #85034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85034/testReport)**
 for PR 17400 at commit 
[`e6ce117`](https://github.com/apache/spark/commit/e6ce11716015cac614f6155ef2e9a88b420d3ee6).


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread CodingCat
Github user CodingCat commented on the issue:

https://github.com/apache/spark/pull/19864
  
retest this please


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19864
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19864
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85030/
Test FAILed.


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19864
  
**[Test build #85030 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85030/testReport)**
 for PR 19864 at commit 
[`b2d829b`](https://github.com/apache/spark/commit/b2d829b2ff30d03af681931260a78da579915624).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20005
  
**[Test build #85033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85033/testReport)**
 for PR 20005 at commit 
[`6646eb5`](https://github.com/apache/spark/commit/6646eb56431fd5cd54c0c6342fca69428a2d51e0).


---



