[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19082
  
Basically, cutting is about deciding the boundaries of the `blocking loop`.

@kiszk and @rednaxelafx can explain what I said above better than I can. This is
related to how the JVM works and how whole-stage codegen works.


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
> The regression caused by spark.sql.codegen.hugeMethodLimit shows the 
potential regression caused by horizontal cuts, although 
spark.sql.codegen.hugeMethodLimit does nothing.

`hugeMethodLimit` just affects whether whole-stage codegen is enabled. The
regression is possibly caused by the fact that we completely disable whole-stage
codegen when we detect what may be one huge method. The regression from disabling
whole-stage codegen is larger than the effect of the huge method itself.

I don't see how you link it to the so-called "horizontal cuts" or to splitting
the generated code.




---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
Btw, I'd like to know what the horizontal/vertical cuts you mentioned are. Can you
give a simple example?


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19082
  
The regression caused by `spark.sql.codegen.hugeMethodLimit` shows the 
potential regression caused by horizontal cuts, although 
`spark.sql.codegen.hugeMethodLimit` does nothing. 


---

-



[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/18664
  
I'm sorry for the delay.
I agree with @HyukjinKwon's suggestion to keep the behavior of current 
`toPandas` without Arrow for now.


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
I don't think `spark.sql.codegen.hugeMethodLimit` is the same kind of thing
as #18931 or this PR.

`hugeMethodLimit` doesn't do anything to affect how the generated code is
split.


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19082
  
The current `spark.sql.codegen.hugeMethodLimit` shows an extreme case.

Just imagine we have two nodes and we want to do a horizontal/ring cut.
Basically, in this scenario, horizontal/ring cutting comes down to whether we do
whole-stage codegen or not.
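
For concreteness, a minimal sketch of what that on/off decision amounts to (this uses the standard `spark.sql.codegen.wholeStage` conf; the session setup is only for illustration):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Fuse adjacent physical nodes into one generated loop (the "no cut" case).
spark.conf.set("spark.sql.codegen.wholeStage", "true")

// Horizontal/ring cut between the two nodes: fall back to per-operator,
// iterator-based execution instead of a single fused loop.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```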


---

-



[GitHub] spark issue #19251: [SPARK-22035][SQL]the value of statistical logicalPlan.s...

2017-10-08 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/19251
  
Leave a comment


---

-



[GitHub] spark pull request #19251: [SPARK-22035][SQL]the value of statistical logica...

2017-10-08 Thread heary-cao
Github user heary-cao closed the pull request at:

https://github.com/apache/spark/pull/19251


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
@gatorsmile hmm, I don't know how you got to that conclusion. Is
`spark.sql.codegen.hugeMethodLimit` related to codegen cuts at all? I think it is
just a threshold used to determine whether to enable whole-stage codegen or not.

Can you also explain the horizontal/vertical cuts?

Btw, we didn't see any performance regression based on the benchmark numbers.


---

-



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

2017-10-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19444#discussion_r143386555
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -405,6 +405,11 @@ object CatalogTypes {
* Specifications of a table partition. Mapping column name to column 
value.
*/
   type TablePartitionSpec = Map[String, String]
+
+  /**
+   * Initialize an empty spec.
+   */
+  lazy val emptyTablePartitionSpec: TablePartitionSpec = Map.empty[String, 
String]
--- End diff --

`Map.empty` is already an object, so I think we can just inline it.
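
A tiny sketch of why inlining is free here (plain Scala, just for reference): `Map.empty` hands back a shared singleton, so nothing is allocated per call.
```scala
object EmptyMapCheck extends App {
  // Map.empty reuses the same immutable empty-map instance regardless of the
  // type parameters, so inlining Map.empty[String, String] allocates nothing.
  val a: Map[String, String] = Map.empty
  val b: Map[Int, Int] = Map.empty
  println(a eq b)  // true: both refer to the shared empty map object
}
```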


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19082
  
The latest regression (introduced by `spark.sql.codegen.hugeMethodLimit`)
clearly shows that the ring/onion/horizontal cut
(https://github.com/apache/spark/pull/18931) could introduce a performance
regression.

Vertical cuts like this PR are more promising. However, they still do not
resolve all the performance regressions introduced by
`spark.sql.codegen.hugeMethodLimit`.


---

-



[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143385878
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

So far, I do not have the bandwidth to fix it. If anybody is interested in
this, please feel free to start on it. It requires a design doc first.


---

-



[GitHub] spark pull request #19251: [SPARK-22035][SQL]the value of statistical logica...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19251#discussion_r143385665
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
 ---
@@ -32,12 +31,14 @@ object SizeInBytesOnlyStatsPlanVisitor extends 
LogicalPlanVisitor[Statistics] {
* same as the output row number, and compute sizes based on the column 
types.
*/
   private def visitUnaryNode(p: UnaryNode): Statistics = {
-// There should be some overhead in Row object, the size should not be 
zero when there is
-// no columns, this help to prevent divide-by-zero error.
-val childRowSize = p.child.output.map(_.dataType.defaultSize).sum + 8
-val outputRowSize = p.output.map(_.dataType.defaultSize).sum + 8
--- End diff --

yes
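
For reference, a sketch of how I read the estimate (assuming the visitor's usual size-only formula): the `+ 8` models per-row overhead and also keeps the divisor below non-zero, which is what the removed divide-by-zero comment referred to.
```scala
// Sketch of the size-only estimate for a unary node `p` (names from the diff above):
// per-row sizes include a fixed 8-byte overhead on top of the column defaults.
val childRowSize  = p.child.output.map(_.dataType.defaultSize).sum + 8
val outputRowSize = p.output.map(_.dataType.defaultSize).sum + 8

// Assume the same row count as the child and rescale its size by the row-width ratio.
val sizeInBytes = p.child.stats.sizeInBytes * outputRowSize / childRowSize
```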


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18270
  
@cenyuhai Could you also address this comment: 
https://github.com/apache/spark/pull/18270/files#r136121931?


---

-



[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19077#discussion_r143380706
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -116,9 +116,10 @@ private [sql] object GenArrayData {
s"final ArrayData $arrayDataName = new 
$genericArrayClass($arrayName);",
arrayDataName)
 } else {
+  val numBytes = elementType.defaultSize * numElements
   val unsafeArraySizeInBytes =
 UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
-        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
+        ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes).toInt
--- End diff --

Minor: why don't we inline this instead of creating a new variable?
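
Something like the following, I guess (a fragment of the same method, reusing `elementType` and `numElements` from the surrounding GenArrayData code and keeping the `.toInt` from the patch):
```scala
// Inlined alternative: the multiplication is still evaluated exactly once,
// so behavior matches the version with the numBytes temporary.
val unsafeArraySizeInBytes =
  UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements).toInt
```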


---

-



[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19287
  
**[Test build #82546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82546/testReport)**
 for PR 19287 at commit 
[`1c8c849`](https://github.com/apache/spark/commit/1c8c84937e85302f2ac48bcbdbdb5507c9b445e4).


---

-



[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-10-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19287
  
Jenkins, retest this please.


---

-



[GitHub] spark pull request #19360: [SPARK-22139][CORE]Remove the variable which is n...

2017-10-08 Thread guoxiaolongzte
Github user guoxiaolongzte closed the pull request at:

https://github.com/apache/spark/pull/19360


---

-



[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19447
  
I feel it is a bit annoying to add a parameter for each Constant Pool issue, and
we had better look for solutions where fewer parameters (e.g., other metrics, as
@kiszk suggested) can mostly solve the issue. I think splitting classes is not a
silver bullet; if we can't compile the generated code, we should fail over to
interpreted mode (this is filed in SPARK-21320).
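
As a rough sketch of the fail-over idea (the names below are illustrative, not Spark's actual API):
```scala
object CodegenFallbackSketch {
  // Hypothetical helpers, named only for illustration.
  def compileAndRunGenerated(source: String): Unit = ???
  def runInterpreted(): Unit = ???

  def execute(generatedSource: String): Unit =
    try {
      // Try the compiled, generated path first.
      compileAndRunGenerated(generatedSource)
    } catch {
      // If the generated code cannot be compiled (e.g. constant pool or 64KB
      // method limits), fail over to the interpreted path instead of erroring out.
      case _: Exception => runInterpreted()
    }
}
```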


---

-



[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-08 Thread discipleforteen
Github user discipleforteen commented on the issue:

https://github.com/apache/spark/pull/19218
  
LGTM


---

-



[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2017-10-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19364
  
cc: @gatorsmile @cloud-fan 


---

-



[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r143377794
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -79,6 +79,9 @@ private[spark] object JettyUtils extends Logging {
 val allowFramingFrom = conf.getOption("spark.ui.allowFramingFrom")
 val xFrameOptionsValue =
   allowFramingFrom.map(uri => s"ALLOW-FROM 
$uri").getOrElse("SAMEORIGIN")
+val xXssProtectionValue = conf.getOption("spark.ui.xXssProtection")
--- End diff --

Please use `ConfigEntry` for newly added configurations; you can refer to
`org.apache.spark.internal.config`.
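
For example, a hedged sketch of what such an entry could look like (the entry name, doc text, and default below are placeholders, not final values):
```scala
package org.apache.spark.internal.config

private[spark] object UISecurityConfigSketch {
  // Illustrative ConfigEntry definition following the ConfigBuilder pattern
  // used elsewhere in org.apache.spark.internal.config.
  val UI_X_XSS_PROTECTION = ConfigBuilder("spark.ui.xXssProtection")
    .doc("Value to send in the X-XSS-Protection HTTP response header.")
    .stringConf
    .createWithDefault("1; mode=block")
}
```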


---

-



[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r143377976
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -89,6 +92,9 @@ private[spark] object JettyUtils extends Logging {
 val result = servletParams.responder(request)
 response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
 response.setHeader("X-Frame-Options", xFrameOptionsValue)
+        xXssProtectionValue.foreach(response.setHeader("X-XSS-Protection", _))
+        xContentTypeOptionsValue.foreach(response.setHeader("X-Content-Type-Options", _))
+        strictTransportSecurityValue.foreach(response.setHeader("Strict-Transport-Security", _))
--- End diff --

The changes here will also affect HTTP requests; is that OK?

Also, is it enough to only change the GET request?


---

-



[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r143377740
  
--- Diff: conf/spark-defaults.conf.template ---
@@ -25,3 +25,10 @@
 # spark.serializer 
org.apache.spark.serializer.KryoSerializer
 # spark.driver.memory  5g
 # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
+
+# spark.ui.allowFramingFrom https://www.example.com/
--- End diff --

Agree with @srowen: we should remove these configurations from the template,
since they're not common configurations. Also, add them to
`docs/configuration.md`.


---

-



[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2017-10-08 Thread liutang123
Github user liutang123 commented on the issue:

https://github.com/apache/spark/pull/19364
  
@maropu Any other suggestions and can this PR be merged?


---

-



[GitHub] spark pull request #19251: [SPARK-22035][SQL]the value of statistical logica...

2017-10-08 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19251#discussion_r143377744
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
 ---
@@ -32,12 +31,14 @@ object SizeInBytesOnlyStatsPlanVisitor extends 
LogicalPlanVisitor[Statistics] {
* same as the output row number, and compute sizes based on the column 
types.
*/
   private def visitUnaryNode(p: UnaryNode): Statistics = {
-// There should be some overhead in Row object, the size should not be 
zero when there is
-// no columns, this help to prevent divide-by-zero error.
-val childRowSize = p.child.output.map(_.dataType.defaultSize).sum + 8
-val outputRowSize = p.output.map(_.dataType.defaultSize).sum + 8
--- End diff --

Does 8 represent the overhead of each row of data?


---

-



[GitHub] spark issue #19360: [SPARK-22139][CORE]Remove the variable which is never us...

2017-10-08 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/19360
  
@HyukjinKwon I am not going to pursue the problem in the PR you are following,
so I will close this PR.


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
@gatorsmile 


---

-



[GitHub] spark issue #11205: [SPARK-11334][Core] Handle maximum task failure situatio...

2017-10-08 Thread rustagi
Github user rustagi commented on the issue:

https://github.com/apache/spark/pull/11205
  
Sorry, I haven't been able to confirm this patch because I have not seen the issue
in production for quite some time.
It was much more persistent with 2.0 than with 2.1.
Not sure of the cause.


---

-



[GitHub] spark issue #11205: [SPARK-11334][Core] Handle maximum task failure situatio...

2017-10-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/11205
  
I guess the issue still exists; let me verify it again, and if it does I will
bring the PR up to date with the latest. Thanks!


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
@maropu Thanks. Then it looks like there isn't any significant regression brought
by this or #18931. We need to be careful, but these numbers give more confidence.


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
ok, done (welcome any re-run requests);
```
OpenJDK 64-Bit Server VM 1.8.0_141-b16 on Linux 4.9.38-16.35.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
TPCDS Snappy:            Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative

q94(master)                  25107 / 25322          0.2         6591.8        1.0X

OpenJDK 64-Bit Server VM 1.8.0_141-b16 on Linux 4.9.38-16.35.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
TPCDS Snappy:            Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative

q94(pr18931)                 25099 / 25383          0.2         6589.7        1.0X

OpenJDK 64-Bit Server VM 1.8.0_141-b16 on Linux 4.9.38-16.35.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
TPCDS Snappy:            Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative

q94(pr19082)                 25140 / 25361          0.2         6600.4        1.0X

OpenJDK 64-Bit Server VM 1.8.0_141-b16 on Linux 4.9.38-16.35.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
TPCDS Snappy:            Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative

q94(pr18931+pr19082)         25211 / 25532          0.2         6618.9        1.0X
```


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
just a sec, I'll re-run `q94` (sometimes, numbers fluctuate).


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
Thanks @maropu.

After counting the accurate bytecode size, there seems to be a bottleneck in the
generated aggregation code, so this can improve q66 a lot.

Overall, the numbers look great, except that there is a significant regression
in q94 when applying this change.




---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19061
  
Merged build finished. Test PASSed.


---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82545/
Test PASSed.


---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19061
  
**[Test build #82545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82545/testReport)**
 for PR 19061 at commit 
[`ff4b8e4`](https://github.com/apache/spark/commit/ff4b8e49355e1866a9f0f337cb0c7673e13fdcaf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark pull request #18817: [SPARK-21612] Allow unicode strings in __getitem_...

2017-10-08 Thread rik-coenders
Github user rik-coenders closed the pull request at:

https://github.com/apache/spark/pull/18817


---

-



[GitHub] spark issue #18817: [SPARK-21612] Allow unicode strings in __getitem__ of St...

2017-10-08 Thread rik-coenders
Github user rik-coenders commented on the issue:

https://github.com/apache/spark/pull/18817
  
Unfortunately I do not have time to work on this issue at the moment, so I 
will close this PR for now.


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Merged build finished. Test PASSed.


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82544/
Test PASSed.


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82544/testReport)**
 for PR 18460 at commit 
[`67a037c`](https://github.com/apache/spark/commit/67a037c053e596432f83dc0e2383bad895e9ce21).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19061
  
**[Test build #82545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82545/testReport)**
 for PR 19061 at commit 
[`ff4b8e4`](https://github.com/apache/spark/commit/ff4b8e49355e1866a9f0f337cb0c7673e13fdcaf).


---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19061
  
Retest this please.


---

-



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19061
  
Hi, @vanzin and @jerryshao .
Could you review this again when you have a chance? Thank you!


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82544 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82544/testReport)**
 for PR 18460 at commit 
[`67a037c`](https://github.com/apache/spark/commit/67a037c053e596432f83dc0e2383bad895e9ce21).


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18460
  
Retest this please.


---

-



[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18460
  
When you have a chance, could you review this please, @gatorsmile ?


---

-



[GitHub] spark issue #19456: [SPARK] [Scheduler] Configurable default scheduling mode

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19456
  
Can one of the admins verify this patch?


---

-



[GitHub] spark pull request #19456: [SPARK] [Scheduler] Configurable default scheduli...

2017-10-08 Thread blyncsy-david-lewis
GitHub user blyncsy-david-lewis opened a pull request:

https://github.com/apache/spark/pull/19456

[SPARK] [Scheduler] Configurable default scheduling mode

Pulling default values for scheduling mode from spark conf.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/blyncsy-david-lewis/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19456.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19456


commit f55ced3899cb80e23617dcacc3c548a88873f4c0
Author: David Lewis 
Date:   2017-10-08T18:25:19Z

allowing configuration of default scheduling mode and properties




---

-



[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19443
  
I could consider going ahead if the small fix made everything in
`functions.py` consistent, but I guess it does not. I am less sure because,
IIUC, we are not even clear on what to do on the Scala side for this, although
I think we should rather deprecate it if I have to choose one side for now.

Let's close this for now. I guess this one is not worth spending much time on.




---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18270
  
Merged build finished. Test PASSed.


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82543/
Test PASSed.


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18270
  
**[Test build #82543 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82543/testReport)**
 for PR 18270 at commit 
[`eac37f0`](https://github.com/apache/spark/commit/eac37f01e405e5ae48c424a41fada951b32ff912).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark issue #18747: [SPARK-20822][SQL] Generate code to directly get value f...

2017-10-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18747
  
@cloud-fan could you please review this one first among my PRs?


---

-



[GitHub] spark issue #19455: Branch 2.0

2017-10-08 Thread deeppark
Github user deeppark commented on the issue:

https://github.com/apache/spark/pull/19455
  
Hi All,

 Apologies I did it by mistake. I'll try to close it.


Regards,
Deepak




---

-



[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-10-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19082#discussion_r143359416
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ---
@@ -797,26 +904,44 @@ case class HashAggregateExec(
 
 
 def updateRowInFastHashMap(isVectorized: Boolean): Option[String] = {
-  ctx.INPUT_ROW = fastRowBuffer
+  // We need to copy the aggregation row buffer to a local row first 
because each aggregate
+  // function directly updates the buffer when it finishes.
+  val localRowBuffer = ctx.freshName("localFastRowBuffer")
+  val initLocalRowBuffer = s"InternalRow $localRowBuffer = 
$fastRowBuffer.copy();"
--- End diff --

I just passed the local variable as each function argument;
```
/* 329 */ // do aggregate
/* 330 */ // copy aggregation row buffer to the local
/* 331 */ InternalRow agg_localFastRowBuffer = 
agg_fastAggBuffer.copy();
/* 332 */ // common sub-expressions
/* 333 */ boolean agg_isNull27 = false;
/* 334 */ long agg_value30 = -1L;
/* 335 */ if (!false) {
/* 336 */   agg_value30 = (long) inputadapter_value;
/* 337 */ }
/* 338 */ // process aggregate functions to update aggregation 
buffer
/* 339 */ agg_doAggregateVal_add2(inputadapter_value, agg_value30, 
agg_fastAggBuffer, agg_localFastRowBuffer, agg_isNull27);
/* 340 */ agg_doAggregateVal_add3(inputadapter_value, agg_value30, 
agg_fastAggBuffer, agg_localFastRowBuffer, agg_isNull27);
/* 341 */ agg_doAggregateVal_if1(inputadapter_value, agg_value30, 
agg_fastAggBuffer, agg_localFastRowBuffer, agg_isNull27);
/* 342 */
```
Since each split function directly updates an input row, we need to copy it 
to the local so that all the split functions can reference the old state.
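
As a standalone illustration of the problem the copy avoids (plain Scala, not the generated code): if several functions mutate a shared buffer, a later function would otherwise read an already-updated slot instead of the original state.
```scala
object BufferSnapshotSketch extends App {
  // Shared "aggregation buffer" and a snapshot taken before any updates.
  val buffer   = Array(10L, 20L)
  val snapshot = buffer.clone()

  // Each "split function" writes one slot but reads the pre-update state.
  def updateSlot0(buf: Array[Long], old: Array[Long]): Unit = buf(0) = old(0) + 1
  def updateSlot1(buf: Array[Long], old: Array[Long]): Unit = buf(1) = old(0) + old(1)

  updateSlot0(buffer, snapshot)
  updateSlot1(buffer, snapshot)  // sees old(0) == 10, not the freshly written 11

  println(buffer.mkString(", "))  // 11, 30
}
```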


---

-



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
I checked the three patterns on `q66`;
```
                             q66
master                     15960
master + pr18931           14226
master + pr19082 + pr18931  1960
```
You can see all the results
[here](https://docs.google.com/spreadsheets/d/1RCLxauxLPR64znllFu55fgMo1YXAY7ihZy929XHZa5s/edit?usp=sharing).


---

-



[GitHub] spark issue #19380: [SPARK-22157] [SQL] The uniux_timestamp method handles t...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19380
  
I'd close this for now. Optionally, we can raise this case and discuss it on the
mailing list if it is important.


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18270
  
**[Test build #82543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82543/testReport)**
 for PR 18270 at commit 
[`eac37f0`](https://github.com/apache/spark/commit/eac37f01e405e5ae48c424a41fada951b32ff912).


---

-



[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-08 Thread jsnowacki
Github user jsnowacki commented on the issue:

https://github.com/apache/spark/pull/19443
  
This PR fixes only the functions created using `_create_function`, which, from
what I found, were the only ones affected by the issue. The rest of the functions
either have different assumptions or eventually do the same thing, but more
explicitly. I didn't check all of the functions in `functions.py`, nor does the
test do that, as the list of functions to check would have to be compiled manually.
Nonetheless, from what I've seen, all the explicitly defined functions get the
column object via `_to_java_column`, which checks the type of the argument and
casts string column names to `Column` objects.


---

-



[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r143354349
  
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+remhttp://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Path to Python script finding SPARK_HOME
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
+
+rem Default to standard python interpreter unless told otherwise
+set PYTHON_RUNNER=python
--- End diff --

It should be easy for reviewers to double check too I guess.


---

-



[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r143354306
  
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+remhttp://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Path to Python script finding SPARK_HOME
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
+
+rem Default to standard python interpreter unless told otherwise
+set PYTHON_RUNNER=python
--- End diff --

Would it be possible to resemble and follow the order of the code in
https://github.com/apache/spark/blob/master/bin/find-spark-home as closely as
we can, so that we can fix them up together in the future?


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19454
  
Merged build finished. Test FAILed.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19454
  
**[Test build #82542 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82542/testReport)**
 for PR 19454 at commit 
[`261e45a`](https://github.com/apache/spark/commit/261e45a9a2298df2d4d1f9adc1ca1ced22e90b60).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19454
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82542/
Test FAILed.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19454
  
**[Test build #82542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82542/testReport)**
 for PR 19454 at commit 
[`261e45a`](https://github.com/apache/spark/commit/261e45a9a2298df2d4d1f9adc1ca1ced22e90b60).


---

-



[GitHub] spark pull request #19369: [SPARK-22147][CORE] Removed redundant allocations...

2017-10-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19369


---

-



[GitHub] spark issue #19369: [SPARK-22147][CORE] Removed redundant allocations from B...

2017-10-08 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19369
  
Merged to master


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19454
  
This is missing from Python and Java. It also doesn't bother to implement this
more efficiently than flatMap(identity). I am not sure this is worthwhile?
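
For reference, a minimal sketch of what the proposal reduces to with the existing API (the local session is just for illustration):
```scala
import org.apache.spark.sql.SparkSession

object FlattenSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("flatten-sketch").getOrCreate()

  // Today this is already a one-liner: flatMap(identity) flattens RDD[Seq[Int]] to RDD[Int].
  val nested = spark.sparkContext.parallelize(Seq(Seq(1, 2), Seq(3, 4)))
  val flat   = nested.flatMap(identity)

  println(flat.collect().mkString(", "))  // 1, 2, 3, 4
  spark.stop()
}
```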


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19454
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82541/
Test FAILed.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19454
  
**[Test build #82541 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82541/testReport)**
 for PR 19454 at commit 
[`075e7ef`](https://github.com/apache/spark/commit/075e7ef3f27af91c5190d039770cf15b08a66c81).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19454
  
Merged build finished. Test FAILed.


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19454
  
**[Test build #82541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82541/testReport)**
 for PR 19454 at commit 
[`075e7ef`](https://github.com/apache/spark/commit/075e7ef3f27af91c5190d039770cf15b08a66c81).


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19454
  
ok to test


---

-



[GitHub] spark issue #19455: Branch 2.0

2017-10-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19455
  
@deeppark could you please close this if it is a PR that you did not intend to
open?


---

-



[GitHub] spark issue #19455: Branch 2.0

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19455
  
Can one of the admins verify this patch?


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82540/
Test PASSed.


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18270
  
Merged build finished. Test PASSed.


---

-



[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143351534
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

Good to hear that.
@tejasapatil could you please let us know when it is available as a PR?


---

-



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18270
  
**[Test build #82540 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82540/testReport)**
 for PR 18270 at commit 
[`1202bfa`](https://github.com/apache/spark/commit/1202bfa125775646d3e0872f38b5ce1f8a455ee7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-



[GitHub] spark pull request #19455: Branch 2.0

2017-10-08 Thread deeppark
GitHub user deeppark opened a pull request:

https://github.com/apache/spark/pull/19455

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19455


commit 5ec3e6680a091883369c002ae599d6b03f38c863
Author: Ergin Seyfe 
Date:   2016-10-11T19:51:08Z

[SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue 
in BlockStatusesAccumulator

## What changes were proposed in this pull request?
Replaced `BlockStatusesAccumulator` with `CollectionAccumulator` which is 
thread safe and few more cleanups.

## How was this patch tested?
Tested in master branch and cherry-picked.

Author: Ergin Seyfe 

Closes #15425 from seyfe/race_cond_jsonprotocal_branch-2.0.

commit e68e95e947045704d3e6a36bb31e104a99d3adcc
Author: Alexander Pivovarov 
Date:   2016-10-12T05:31:21Z

Fix hadoop.version in building-spark.md

Couple of mvn build examples use `-Dhadoop.version=VERSION` instead of 
actual version number

Author: Alexander Pivovarov 

Closes #15440 from apivovarov/patch-1.

(cherry picked from commit 299eb04ba05038c7dbb3ecf74a35d4bbfa456643)
Signed-off-by: Reynold Xin 

commit f3d82b53c42a971deedc04de6950b9228e5262ea
Author: Kousuke Saruta 
Date:   2016-10-12T05:36:57Z

[SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is 
incorrect.

## What changes were proposed in this pull request?

In `programming-guide.md`, the url which links to `AccumulatorV2` says 
`api/scala/index.html#org.apache.spark.AccumulatorV2` but 
`api/scala/index.html#org.apache.spark.util.AccumulatorV2` is correct.

## How was this patch tested?
manual test.

Author: Kousuke Saruta 

Closes #15439 from sarutak/SPARK-17880.

(cherry picked from commit b512f04f8e546843d5a3f35dcc6b675b5f4f5bc0)
Signed-off-by: Reynold Xin 

commit f12b74c02eec9e201fec8a16dac1f8e549c1b4f0
Author: cody koeninger 
Date:   2016-10-12T07:40:47Z

[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is 
bad

## What changes were proposed in this pull request?

Documentation fix to make it clear that reusing group id for different 
streams is super duper bad, just like it is with the underlying Kafka consumer.

## How was this patch tested?

I built jekyll doc and made sure it looked ok.

Author: cody koeninger 

Closes #15442 from koeninger/SPARK-17853.

(cherry picked from commit c264ef9b1918256a5018c7a42a1a2b42308ea3f7)
Signed-off-by: Reynold Xin 

commit 4dcbde48de6c46e2fd8ccfec732b8ff5c24f97a4
Author: Bryan Cutler 
Date:   2016-10-11T06:29:52Z

[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13

## What changes were proposed in this pull request?
Upgraded to a newer version of Pyrolite which supports serialization of a 
BinaryType StructField for PySpark.SQL

## How was this patch tested?
Added a unit test which fails with a raised ValueError when using the 
previous version of Pyrolite 4.9 and Python3

Author: Bryan Cutler 

Closes #15386 from BryanCutler/pyrolite-upgrade-SPARK-17808.

(cherry picked from commit 658c7147f5bf637f36e8c66b9207d94b1e7c74c5)
Signed-off-by: Sean Owen 

commit 5451541d1113aa75bab80914ca51a913f6ba4753
Author: prigarg 
Date:   2016-10-12T17:14:45Z

[SPARK-17884][SQL] To resolve Null pointer exception when casting from 
empty string to interval type.

## What changes were proposed in this pull request?
This change adds a check in castToInterval method of Cast expression , such 
that if converted value is null , then isNull variable should be set to true.

Earlier, the expression Cast(Literal(), CalendarIntervalType) was throwing 
NullPointerException because of the above mentioned reason.

## How was 

[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten fu...

2017-10-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19454#discussion_r143351442
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2543,6 +2543,11 @@ class Dataset[T] private[sql](
 mapPartitions(_.flatMap(func))
 
   /**
+* Returns a new Dataset by flattening a traversable collection into a collection itself.
+*/
--- End diff --

Could you please add `@since 2.3.0`?
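
For example (the exact version tag depends on the release this lands in):
```scala
  /**
   * Returns a new Dataset by flattening a traversable collection into a collection itself.
   *
   * @since 2.3.0
   */
```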


---

-



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19454
  
Could you please add test cases?


---

-



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82539/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19438
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19438
  
**[Test build #82539 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)**
 for PR 19438 at commit 
[`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855 ][SQL] Added flatten functions...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19454
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19454: Added flatten functions for RDD and Dataset

2017-10-08 Thread sohum2002
GitHub user sohum2002 opened a pull request:

https://github.com/apache/spark/pull/19454

Added flatten functions for RDD and Dataset

## What changes were proposed in this pull request?
This PR adds a _flatten_ function in two places: the RDD and Dataset 
classes. It resolves the following issues: SPARK-22152 and SPARK-18855.
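
Not part of the patch, just a sketch of what such a `flatten` amounts to with 
the existing `flatMap` API, assuming a `SparkSession` named `spark` and its 
`SparkContext` as `sc`:

```scala
import spark.implicits._

// RDD[Seq[Int]] -> RDD[Int]
val nestedRdd = sc.parallelize(Seq(Seq(1, 2), Seq(3), Seq.empty[Int]))
val flatRdd = nestedRdd.flatMap(identity)   // 1, 2, 3

// Dataset[Seq[Int]] -> Dataset[Int]
val nestedDs = Seq(Seq(1, 2), Seq(3)).toDS()
val flatDs = nestedDs.flatMap(xs => xs)     // 1, 2, 3
```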

Author: Sohum Sachdev 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohum2002/spark SPARK-18855_SPARK-18855

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19454


commit 075e7ef3f27af91c5190d039770cf15b08a66c81
Author: Sachathamakul, Patrachai (Agoda) 
Date:   2017-10-08T10:24:44Z

Added flatten functions for RDD and Dataset




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19389
  
ping?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r143349235
  
--- Diff: conf/spark-defaults.conf.template ---
@@ -25,3 +25,10 @@
 # spark.serializer 
org.apache.spark.serializer.KryoSerializer
 # spark.driver.memory  5g
 # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
+
+# spark.ui.allowFramingFrom https://www.example.com/
--- End diff --

Hm, my last thought on this is that these are undocumented options, and 
maybe rightly so as they're kind of advanced options. But if so they don't 
really have a place in the default props template; what do they mean to a user?

On second glance, I think they should actually be removed here, but it 
wouldn't be a big deal to document them in the configuration doc, as they're 
arguably non-internal, important functionality for some users.
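
If it helps to picture usage: a sketch of supplying the option like any other 
`spark.*` key via `SparkConf` (the value is just the template's example URL, and 
treating `SparkConf` as the way to set it is an assumption here):

```scala
import org.apache.spark.SparkConf

// Allow the Spark UI to be framed only by the named origin (sketch only).
val conf = new SparkConf()
  .setAppName("secured-ui-example")
  .set("spark.ui.allowFramingFrom", "https://www.example.com/")
```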


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18270
  
**[Test build #82540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82540/testReport)**
 for PR 18270 at commit 
[`1202bfa`](https://github.com/apache/spark/commit/1202bfa125775646d3e0872f38b5ce1f8a455ee7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19438
  
**[Test build #82539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)**
 for PR 19438 at commit 
[`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19438
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19438#discussion_r143348208
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
 })
 
 test_that("approxQuantile() on a DataFrame", {
-  l <- lapply(c(0:99), function(i) { list(i, 99 - i) })
+  l <- lapply(c(1:100), function(i) { list(i, 101 - i) })
--- End diff --

For data 0-99, before this PR the 0.5 percentile is 50; after this PR it is 49. 
Both 49 and 50 are correct answers for the 0.5 percentile of 0-99.
So we can fix the test either by changing the data to 1-100, or by changing the 
expected percentile to 49 if the data stays 0-99.
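
A Scala sketch of the same check (counterpart of the R test above; assumes a 
`SparkSession` named `spark`, and the relativeError of 0.0 is only an 
illustrative choice):

```scala
import spark.implicits._

val df = (0 to 99).toDF("value")
// approxQuantile returns one value per requested probability; here only the median.
val Array(median) = df.stat.approxQuantile("value", Array(0.5), 0.0)
println(median)  // before this PR: 50.0, after: 49.0; both are valid for data 0-99
```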


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19438
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19438
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82538/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19438
  
**[Test build #82538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82538/testReport)**
 for PR 19438 at commit 
[`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19438#discussion_r143347310
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
 })
 
 test_that("approxQuantile() on a DataFrame", {
-  l <- lapply(c(0:99), function(i) { list(i, 99 - i) })
+  l <- lapply(c(1:100), function(i) { list(i, 101 - i) })
--- End diff --

could you elaborate on how this fixes the test?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


