[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...

2017-11-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19767
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19728
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83983/
Test PASSed.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19728
  
**[Test build #83983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83983/testReport)** for PR 19728 at commit [`b9ca4ff`](https://github.com/apache/spark/commit/b9ca4ff8cefadcf02bd6c03bad53429da042f700).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83982/
Test PASSed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #83982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83982/testReport)** for PR 19082 at commit [`4c2816b`](https://github.com/apache/spark/commit/4c2816bd6513dc3ab2568584ecc8da6d1547fbf6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r151830994
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -105,6 +105,41 @@ abstract class Expression extends TreeNode[Expression] {
   val isNull = ctx.freshName("isNull")
   val value = ctx.freshName("value")
   val ve = doGenCode(ctx, ExprCode("", isNull, value))
+
+  // TODO: support whole stage codegen too
+  if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) {
+    val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
+      val globalIsNull = ctx.freshName("globalIsNull")
+      ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;")
+      val localIsNull = ve.isNull
+      ve.isNull = globalIsNull
+      s"$globalIsNull = $localIsNull;"
+    } else {
+      ""
+    }
+
+    val setValue = {
+      val globalValue = ctx.freshName("globalValue")
+      ctx.addMutableState(
+        ctx.javaType(dataType), globalValue, s"$globalValue = ${ctx.defaultValue(dataType)};")
+      val localValue = ve.value
+      ve.value = globalValue
+      s"$globalValue = $localValue;"
+    }
+
+    val funcName = ctx.freshName(nodeName)
+    val funcFullName = ctx.addNewFunction(funcName,
+      s"""
+         |private void $funcName(InternalRow ${ctx.INPUT_ROW}) {
+         |  ${ve.code.trim}
+         |  $setValue
--- End diff --

Would it be a bad idea to prepare a utility class to store the (value, isNull) pair for these splitting cases? I feel class fields are valuable resources.
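A minimal sketch of such a utility class, assuming a hypothetical name `NullableValue` (Spark's codegen has no such class, and per-primitive-type variants would be needed in practice):

```java
// A holder for the (value, isNull) pair produced by a split-out expression.
// One mutable field of this type replaces two separate class fields.
public class NullableValue {
    public long value;      // long variant; other primitive types would need their own holder
    public boolean isNull;

    public void set(long v) {
        value = v;
        isNull = false;
    }

    public void setNull() {
        isNull = true;
    }
}
```

A split-out method would then assign into one shared holder instead of a `globalValue`/`globalIsNull` field pair, halving the number of class fields each split consumes.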


---




[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-11-17 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r151830301
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala ---
@@ -87,4 +95,109 @@ class ParquetInteroperabilitySuite extends ParquetCompatibilityTest with SharedS
       Row(Seq(2, 3
     }
   }
+
+  test("parquet timestamp conversion") {
+    // Make a table with one parquet file written by impala, and one parquet file written by spark.
+    // We should only adjust the timestamps in the impala file, and only if the conf is set
+    val impalaFile = "test-data/impala_timestamp.parq"
+
+    // here are the timestamps in the impala file, as they were saved by impala
+    val impalaFileData =
+      Seq(
+        "2001-01-01 01:01:01",
+        "2002-02-02 02:02:02",
+        "2003-03-03 03:03:03"
+      ).map(java.sql.Timestamp.valueOf)
+    val impalaPath = Thread.currentThread().getContextClassLoader.getResource(impalaFile)
+      .toURI.getPath
+    withTempPath { tableDir =>
+      val ts = Seq(
+        "2004-04-04 04:04:04",
+        "2005-05-05 05:05:05",
+        "2006-06-06 06:06:06"
+      ).map { s => java.sql.Timestamp.valueOf(s) }
+      import testImplicits._
+      // match the column names of the file from impala
+      val df = spark.createDataset(ts).toDF().repartition(1).withColumnRenamed("value", "ts")
+      df.write.parquet(tableDir.getAbsolutePath)
+      FileUtils.copyFile(new File(impalaPath), new File(tableDir, "part-1.parq"))
+
+      Seq(false, true).foreach { int96TimestampConversion =>
+        Seq(false, true).foreach { vectorized =>
+          withSQLConf(
+              (SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key,
+                SQLConf.ParquetOutputTimestampType.INT96.toString),
+              (SQLConf.PARQUET_INT96_TIMESTAMP_CONVERSION.key, int96TimestampConversion.toString()),
+              (SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, vectorized.toString())
+          ) {
+            val readBack = spark.read.parquet(tableDir.getAbsolutePath).collect()
+            assert(readBack.size === 6)
+            // if we apply the conversion, we'll get the "right" values, as saved by impala in the
+            // original file.  Otherwise, they're off by the local timezone offset, set to
+            // America/Los_Angeles in tests
+            val impalaExpectations = if (int96TimestampConversion) {
+              impalaFileData
+            } else {
+              impalaFileData.map { ts =>
+                DateTimeUtils.toJavaTimestamp(DateTimeUtils.convertTz(
+                  DateTimeUtils.fromJavaTimestamp(ts),
+                  DateTimeUtils.TimeZoneUTC,
+                  DateTimeUtils.getTimeZone(conf.sessionLocalTimeZone)))
+              }
+            }
+            val fullExpectations = (ts ++ impalaExpectations).map(_.toString).sorted.toArray
+            val actual = readBack.map(_.getTimestamp(0).toString).sorted
+            withClue(s"applyConversion = $int96TimestampConversion; vectorized = $vectorized") {
+              assert(fullExpectations === actual)

+              // Now test that the behavior is still correct even with a filter which could get
+              // pushed down into parquet.  We don't need extra handling for pushed down
+              // predicates because (a) in ParquetFilters, we ignore TimestampType and (b) parquet
--- End diff --

How about adding an assertion or comment to ParquetFilters that it would be unsafe to add timestamp support?


---




[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-11-17 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r151830046
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -355,17 +362,33 @@ class ParquetFileFormat
       fileSplit.getLocations,
       null)

+  val sharedConf = broadcastedHadoopConf.value.value
+  // PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96 timestamps
+  // *only* if the file was created by something other than "parquet-mr", so check the actual
+  // writer here for this file.  We have to do this per-file, as each file in the table may
+  // have different writers.
+  def isCreatedByParquetMr(): Boolean = {
+    val footer = ParquetFileReader.readFooter(sharedConf, fileSplit.getPath, SKIP_ROW_GROUPS)
--- End diff --

Does it make more sense to have VectorizedParquetRecordReader do this in initialize() rather than here? There doesn't seem to be a way to share footer metadata across record readers, which would be helpful to avoid calling readFooter() multiple times.


---




[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-11-17 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r151830623
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ---
@@ -430,9 +439,11 @@ private void readBinaryBatch(int rowId, int num, WritableColumnVector column) {
     } else if (column.dataType() == DataTypes.TimestampType) {
       for (int i = 0; i < num; i++) {
         if (defColumn.readInteger() == maxDefLevel) {
-          column.putLong(rowId + i,
-              // Read 12 bytes for INT96
-              ParquetRowConverter.binaryToSQLTimestamp(data.readBinary(12)));
+          // Read 12 bytes for INT96
+          long rawTime = ParquetRowConverter.binaryToSQLTimestamp(data.readBinary(12));
+          long adjTime =
+              convertTz == null ? rawTime : DateTimeUtils.convertTz(rawTime, convertTz, UTC);
--- End diff --

It might be worth hoisting the conditional here as high as possible:

    if (convertTz == null) {
      for (int i = 0; i < num; i++) { /// etc
    } else {
      for (int i = 0; i < num; i++) { /// etc
    }

In this case the code duplication might be worth avoiding a branch on every read.
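As a rough, self-contained illustration of the hoisting being suggested (`readBatch` and `convert` are hypothetical stand-ins for the reader loop and `DateTimeUtils.convertTz`, not Spark's actual API):

```java
public class HoistedBranchSketch {
    // Stand-in for the timezone adjustment; a fixed one-hour shift for illustration.
    static long convert(long rawTime) {
        return rawTime + 3600L;
    }

    // The "is conversion needed?" test is decided once, outside the per-row
    // loop, at the cost of duplicating the loop body.
    static long[] readBatch(long[] rawTimes, boolean applyConversion) {
        long[] out = new long[rawTimes.length];
        if (!applyConversion) {
            for (int i = 0; i < rawTimes.length; i++) {
                out[i] = rawTimes[i];
            }
        } else {
            for (int i = 0; i < rawTimes.length; i++) {
                out[i] = convert(rawTimes[i]);
            }
        }
        return out;
    }
}
```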


---




[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-11-17 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r151830543
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ---
@@ -298,7 +304,10 @@ private void decodeDictionaryIds(
     // TODO: Convert dictionary of Binaries to dictionary of Longs
     if (!column.isNullAt(i)) {
       Binary v = dictionary.decodeToBinary(dictionaryIds.getDictId(i));
-      column.putLong(i, ParquetRowConverter.binaryToSQLTimestamp(v));
+      long rawTime = ParquetRowConverter.binaryToSQLTimestamp(v);
+      long adjTime =
--- End diff --

Is it practical to consider caching the decoded and converted dictionary values?
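A minimal sketch of the caching idea, with illustrative names (`decodeBatch` and `convert` are stand-ins, not Spark's API): each distinct dictionary id is converted at most once, however many rows reference it.

```java
import java.util.HashMap;
import java.util.Map;

public class DictConversionCache {
    // Stand-in for the timezone adjustment; a fixed one-hour shift for illustration.
    static long convert(long rawTime) {
        return rawTime + 3600L;
    }

    static long[] decodeBatch(int[] dictIds, long[] dictionary) {
        Map<Integer, Long> cache = new HashMap<>();
        long[] out = new long[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            // computeIfAbsent converts each dictionary id at most once
            out[i] = cache.computeIfAbsent(dictIds[i], id -> convert(dictionary[id]));
        }
        return out;
    }
}
```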


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19778
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83980/
Test PASSed.


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19778
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19778
  
**[Test build #83980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83980/testReport)** for PR 19778 at commit [`feecef0`](https://github.com/apache/spark/commit/feecef0995d55a90c3cac8c4fc5c3680319b3def).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19778
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83979/
Test PASSed.


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19778
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19778
  
**[Test build #83979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83979/testReport)** for PR 19778 at commit [`06b103e`](https://github.com/apache/spark/commit/06b103e556756d567ae45626a9a62b8ffb777e36).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19777
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83978/
Test PASSed.


---




[GitHub] spark issue #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19777
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19777
  
**[Test build #83978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83978/testReport)** for PR 19777 at commit [`1714c88`](https://github.com/apache/spark/commit/1714c883d193f96e43c19e72a96c15c23d5fc193).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r151828392
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -140,6 +140,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val COLUMN_VECTOR_OFFHEAP_ENABLED =
+    buildConf("spark.sql.columnVector.offheap.enable")
+      .internal()
+      .doc("When true, use OffHeapColumnVector in ColumnarBatch.")
+      .booleanConf
+      .createWithDefault(true)
--- End diff --

Sorry, this was test code in my local environment.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19728
  
**[Test build #83983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83983/testReport)** for PR 19728 at commit [`b9ca4ff`](https://github.com/apache/spark/commit/b9ca4ff8cefadcf02bd6c03bad53429da042f700).


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19728
  
Sure, done: #19777 for `concat_ws`, #19778 for `elt`.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #83982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83982/testReport)** for PR 19082 at commit [`4c2816b`](https://github.com/apache/spark/commit/4c2816bd6513dc3ab2568584ecc8da6d1547fbf6).


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83981/
Test FAILed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #83981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83981/testReport)** for PR 19082 at commit [`28b47e5`](https://github.com/apache/spark/commit/28b47e52bb262edaf73ad2aa44a2db68eb3a9847).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #83981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83981/testReport)** for PR 19082 at commit [`28b47e5`](https://github.com/apache/spark/commit/28b47e52bb262edaf73ad2aa44a2db68eb3a9847).


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19778
  
**[Test build #83980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83980/testReport)** for PR 19778 at commit [`feecef0`](https://github.com/apache/spark/commit/feecef0995d55a90c3cac8c4fc5c3680319b3def).


---




[GitHub] spark issue #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19778
  
**[Test build #83979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83979/testReport)** for PR 19778 at commit [`06b103e`](https://github.com/apache/spark/commit/06b103e556756d567ae45626a9a62b8ffb777e36).


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-17 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19774#discussion_r151826928
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -689,6 +689,11 @@ case class DescribeColumnCommand(
       buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
       buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
       buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
+      buffer ++= cs.flatMap(_.histogram.map { hist =>
+        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
+        Seq(header) ++ hist.bins.map(bin =>
+          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
--- End diff --

Add a comment that the empty column is for indentation and better readability?


---




[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19778

[SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem with elt

## What changes were proposed in this pull request?

This PR changes `elt` code generation to place the generated code for each argument expression into a separate method when the combined size could be large.
This PR resolves the case of `elt` with many arguments.
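The splitting strategy can be sketched in miniature as follows; `splitIntoMethods`, the method names, and the length threshold are all illustrative, not Spark's actual codegen API. Per-argument code fragments are grouped into helper methods so that no single generated method grows past the JVM's 64KB bytecode limit:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCodegenSketch {
    // Threshold on generated-source length, used here as a crude proxy for bytecode size.
    static final int MAX_CHUNK = 1024;

    static List<String> splitIntoMethods(List<String> fragments) {
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int id = 0;
        for (String frag : fragments) {
            // Flush the current method before it would exceed the threshold.
            if (body.length() > 0 && body.length() + frag.length() > MAX_CHUNK) {
                methods.add("private void evalArg" + (id++) + "() {\n" + body + "}");
                body = new StringBuilder();
            }
            body.append(frag).append('\n');
        }
        if (body.length() > 0) {
            methods.add("private void evalArg" + id + "() {\n" + body + "}");
        }
        return methods;
    }
}
```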

## How was this patch tested?

Added new test cases into `StringExpressionsSuite`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19778


commit 06b103e556756d567ae45626a9a62b8ffb777e36
Author: Kazuaki Ishizaki 
Date:   2017-11-18T03:07:31Z

initial commit




---




[GitHub] spark issue #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19777
  
**[Test build #83978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83978/testReport)** for PR 19777 at commit [`1714c88`](https://github.com/apache/spark/commit/1714c883d193f96e43c19e72a96c15c23d5fc193).


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-17 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19774
  
seems good to me. ping @gatorsmile @cloud-fan @HyukjinKwon 


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19777

[SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem with concat_ws

## What changes were proposed in this pull request?

This PR changes `concat_ws` code generation to place the generated code for each argument expression into a separate method when the combined size could be large.
This PR resolves the case of `concat_ws` with many arguments.

## How was this patch tested?

Added new test cases into `StringExpressionsSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22549

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19777


commit 1714c883d193f96e43c19e72a96c15c23d5fc193
Author: Kazuaki Ishizaki 
Date:   2017-11-18T02:40:00Z

initial commit




---




[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83977/
Test PASSed.


---




[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19760
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19760
  
**[Test build #83977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83977/testReport)** for PR 19760 at commit [`8afc56a`](https://github.com/apache/spark/commit/8afc56a825fdb4bce8aac683348bbd4223a29acc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19776
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-17 Thread jliwork
GitHub user jliwork opened a pull request:

https://github.com/apache/spark/pull/19776

[SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDB…

…C data source

## What changes were proposed in this pull request?

Let's say I have a nested AND expression shown below, and p2 cannot be pushed down:

(p1 AND p2) OR p3

In the current Spark code, during data source filter translation, (p1 AND p2) is returned as p1 only and p2 is simply lost. This issue occurs with the JDBC data source and is similar to SPARK-12218 for Parquet. When we have AND nested below another expression, we should either push down both legs or nothing. Note that 1) the current Spark code always splits a conjunctive predicate before it determines whether a predicate can be pushed down or not; 2) the same translation method is also called by Data Source V2.
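The push-both-legs-or-nothing rule can be illustrated with a toy translator (a hedged sketch: `Pred`, `translate`, and the SQL-string output are hypothetical, not Spark's `DataSourceStrategy`):

```java
import java.util.Optional;

public class PushdownSketch {
    // A tiny predicate tree; a leaf marked unsupported cannot be pushed down.
    interface Pred {}
    record Leaf(String sql, boolean supported) implements Pred {}
    record And(Pred l, Pred r) implements Pred {}
    record Or(Pred l, Pred r) implements Pred {}

    static Optional<String> translate(Pred p) {
        if (p instanceof Leaf leaf) {
            return leaf.supported() ? Optional.of(leaf.sql()) : Optional.empty();
        }
        if (p instanceof And and) {
            Optional<String> l = translate(and.l());
            Optional<String> r = translate(and.r());
            // Returning just `l` when `r` is empty is exactly the bug described:
            // under an OR, a silently dropped leg widens the pushed predicate.
            if (l.isPresent() && r.isPresent()) {
                return Optional.of("(" + l.get() + " AND " + r.get() + ")");
            }
            return Optional.empty();
        }
        Or or = (Or) p;
        Optional<String> l = translate(or.l());
        Optional<String> r = translate(or.r());
        if (l.isPresent() && r.isPresent()) {
            return Optional.of("(" + l.get() + " OR " + r.get() + ")");
        }
        return Optional.empty();
    }
}
```

Here `(p1 AND p2) OR p3` with an untranslatable p2 yields no pushed-down filter at all, leaving evaluation to Spark, instead of the incorrect `p1 OR p3`.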

## How was this patch tested?

Added new unit test cases to JDBCSuite


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jliwork/spark spark-22548

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19776


commit 58de88c21210d469b8ef14b1f23764c31ca5651e
Author: Jia Li 
Date:   2017-11-18T01:15:01Z

[SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data 
source




---




[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-11-17 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19390#discussion_r151821865
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -228,24 +254,15 @@ trait MesosSchedulerUtils extends Logging {
 (attr.getName, attr.getText.getValue.split(',').toSet)
   }
 
-
-  /** Build a Mesos resource protobuf object */
-  protected def createResource(resourceName: String, quantity: Double): 
Protos.Resource = {
-Resource.newBuilder()
-  .setName(resourceName)
-  .setType(Value.Type.SCALAR)
-  .setScalar(Value.Scalar.newBuilder().setValue(quantity).build())
-  .build()
-  }
-
   /**
* Converts the attributes from the resource offer into a Map of name to 
Attribute Value
* The attribute values are the mesos attribute types and they are
*
* @param offerAttributes the attributes offered
* @return
*/
-  protected def toAttributeMap(offerAttributes: JList[Attribute]): 
Map[String, GeneratedMessage] = {
+  protected def toAttributeMap(offerAttributes: JList[Attribute])
+: Map[String, GeneratedMessageV3] = {
--- End diff --

Convention is to double-indent everything that is part of the method 
signature, so that it's easy to visually separate from the method body.
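A minimal illustration of that convention (the method and its body are made up, not from the patch): the wrapped parameter list gets four spaces, while the body keeps the usual two, so the two are visually distinct.

```scala
// Double-indented (4-space) continuation of the signature,
// versus the 2-space-indented method body below it.
def buildSummary(
    name: String,
    values: Seq[Int]): String = {
  // body is indented two spaces, clearly separate from the parameters
  s"$name: ${values.sum}"
}
```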


---




[GitHub] spark pull request #19772: [SPARK-22538][ML] SQLTransformer should not unper...

2017-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19772#discussion_r151815120
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala ---
@@ -70,7 +70,8 @@ class SQLTransformer @Since("1.6.0") (@Since("1.6.0") 
override val uid: String)
 dataset.createOrReplaceTempView(tableName)
 val realStatement = $(statement).replace(tableIdentifier, tableName)
 val result = dataset.sparkSession.sql(realStatement)
-dataset.sparkSession.catalog.dropTempView(tableName)
--- End diff --

Hmm, but the behavior is explicitly defined in `Catalog`, which is a 
public API, so we may not be able to change it easily.


---




[GitHub] spark pull request #19771: [SPARK-22544][SS]FileStreamSource should use its ...

2017-11-17 Thread zsxwing
Github user zsxwing closed the pull request at:

https://github.com/apache/spark/pull/19771


---




[GitHub] spark issue #19771: [SPARK-22544][SS]FileStreamSource should use its own had...

2017-11-17 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/19771
  
LGTM


---




[GitHub] spark issue #19771: [SPARK-22544][SS]FileStreamSource should use its own had...

2017-11-17 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19771
  
Thanks! Merging to master and 2.2.


---




[GitHub] spark pull request #19772: [SPARK-22538][ML] SQLTransformer should not unper...

2017-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19772#discussion_r151813095
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala ---
@@ -70,7 +70,8 @@ class SQLTransformer @Since("1.6.0") (@Since("1.6.0") 
override val uid: String)
 dataset.createOrReplaceTempView(tableName)
 val realStatement = $(statement).replace(tableIdentifier, tableName)
 val result = dataset.sparkSession.sql(realStatement)
-dataset.sparkSession.catalog.dropTempView(tableName)
--- End diff --

Yes. I also think it doesn't make a lot of sense.


---




[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19760
  
**[Test build #83977 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83977/testReport)**
 for PR 19760 at commit 
[`8afc56a`](https://github.com/apache/spark/commit/8afc56a825fdb4bce8aac683348bbd4223a29acc).


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19388
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19388
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83976/
Test FAILed.


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19388
  
**[Test build #83976 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83976/testReport)**
 for PR 19388 at commit 
[`500c73c`](https://github.com/apache/spark/commit/500c73cc96290efe0194e371ab84e0cda863347d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StageInfo extends Serializable `


---




[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r151801786
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -472,17 +474,36 @@ private[spark] class MapOutputTrackerMaster(
 shuffleStatuses.get(shuffleId).map(_.findMissingPartitions())
   }
 
+  /**
+   * Try to equally divide Range(0, num) to divisor slices
+   */
+  def equallyDivide(num: Int, divisor: Int): Iterator[Seq[Int]] = {
+assert(divisor > 0, "Divisor should be positive")
+val (each, remain) = (num / divisor, num % divisor)
+val (smaller, bigger) = (0 until num).splitAt((divisor-remain) * each)
+if (each != 0) {
+  smaller.grouped(each) ++ bigger.grouped(each + 1)
+} else {
+  bigger.grouped(each + 1)
+}
+  }
+
   /**
* Return statistics about all of the outputs for a given shuffle.
*/
   def getStatistics(dep: ShuffleDependency[_, _, _]): MapOutputStatistics 
= {
 shuffleStatuses(dep.shuffleId).withMapStatuses { statuses =>
   val totalSizes = new Array[Long](dep.partitioner.numPartitions)
-  for (s <- statuses) {
-for (i <- 0 until totalSizes.length) {
-  totalSizes(i) += s.getSizeForBlock(i)
+  val parallelism = conf.getInt("spark.adaptive.map.statistics.cores", 
8)
+
+  val mapStatusSubmitTasks = equallyDivide(totalSizes.length, 
parallelism).map {
--- End diff --

Doing this is not cheap. I would add a config and only run this in multiple 
threads when `#mapper * #shuffle_partitions` is large.
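For reference, the `equallyDivide` helper quoted in the diff above can be exercised standalone; this sketch reproduces it outside Spark to show how it partitions the reduce IDs into near-equal slices (e.g. splitting 10 IDs across 3 slices yields sizes 3, 3, 4):

```scala
/** Try to divide Range(0, num) into `divisor` near-equal slices
  * (copied from the quoted patch for standalone experimentation). */
def equallyDivide(num: Int, divisor: Int): Iterator[Seq[Int]] = {
  assert(divisor > 0, "Divisor should be positive")
  val (each, remain) = (num / divisor, num % divisor)
  // First (divisor - remain) slices get `each` elements, the rest get `each + 1`.
  val (smaller, bigger) = (0 until num).splitAt((divisor - remain) * each)
  if (each != 0) {
    smaller.grouped(each) ++ bigger.grouped(each + 1)
  } else {
    bigger.grouped(each + 1)
  }
}
```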


---




[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r151801570
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -473,16 +477,41 @@ private[spark] class MapOutputTrackerMaster(
   }
 
   /**
+   * Try to equally divide Range(0, num) to divisor slices
+   */
+  def equallyDivide(num: Int, divisor: Int): Iterator[Seq[Int]] = {
+assert(divisor > 0, "Divisor should be positive")
+val (each, remain) = (num / divisor, num % divisor)
+val (smaller, bigger) = (0 until num).splitAt((divisor-remain) * each)
+if (each != 0) {
+  smaller.grouped(each) ++ bigger.grouped(each + 1)
+} else {
+  bigger.grouped(each + 1)
+}
+  }
+
+  /**
* Return statistics about all of the outputs for a given shuffle.
*/
   def getStatistics(dep: ShuffleDependency[_, _, _]): MapOutputStatistics 
= {
 shuffleStatuses(dep.shuffleId).withMapStatuses { statuses =>
   val totalSizes = new Array[Long](dep.partitioner.numPartitions)
-  for (s <- statuses) {
-for (i <- 0 until totalSizes.length) {
-  totalSizes(i) += s.getSizeForBlock(i)
-}
+  val mapStatusSubmitTasks = ArrayBuffer[Future[_]]()
+  var taskSlices = parallelism
+
+  equallyDivide(totalSizes.length, taskSlices).foreach {
+reduceIds =>
+  mapStatusSubmitTasks += threadPoolMapStats.submit(
+new Runnable {
+  override def run(): Unit = {
+for (s <- statuses; i <- reduceIds) {
+  totalSizes(i) += s.getSizeForBlock(i)
+}
+  }
+}
+  )
   }
+  mapStatusSubmitTasks.foreach(_.get())
--- End diff --

Don't use `scala.concurrent.ExecutionContext.Implicits.global`. You need to 
create a thread pool.
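A minimal sketch of that suggestion — a dedicated fixed-size pool instead of the shared `global` context. The pool size and the work submitted here are illustrative only, not Spark's actual configuration:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Dedicated pool: its lifecycle is owned here, and its threads are not
// shared with unrelated work the way Implicits.global's threads are.
val pool = Executors.newFixedThreadPool(8)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

// Submit some independent tasks and wait for all of them.
val futures = (0 until 4).map(i => Future { i * i })
val results = futures.map(f => Await.result(f, 10.seconds))

pool.shutdown()
```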


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-11-17 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19388
  
High level question: do you need to create the commit id based on the rdd + 
stage ids, or could it be something else (like a monotonically increasing 
value, or a uuid)? That would probably simplify the code.
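The monotonically increasing alternative mentioned above could be as small as a process-wide counter (the `CommitIds` name is hypothetical, not from the PR): uniqueness no longer depends on encoding rdd and stage ids.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical commit-id generator: an atomic counter gives each caller
// a unique, monotonically increasing id without any structural encoding.
object CommitIds {
  private val counter = new AtomicLong(0L)
  def next(): Long = counter.getAndIncrement()
}
```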


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19751
  
Not retesting since there will probably be feedback and the failure seems 
unrelated, so better just wait.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83973/
Test PASSed.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83973 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83973/testReport)**
 for PR 17436 at commit 
[`ac2a7da`](https://github.com/apache/spark/commit/ac2a7da48790e632ba681d56e573f5cee764c464).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19751
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19751
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83975/
Test FAILed.


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19751
  
**[Test build #83975 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83975/testReport)**
 for PR 19751 at commit 
[`e09a376`](https://github.com/apache/spark/commit/e09a376edc6707e564e9f89a202c5347f37ad738).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83974/
Test PASSed.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19728
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19728
  
**[Test build #83974 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83974/testReport)**
 for PR 19728 at commit 
[`22019b1`](https://github.com/apache/spark/commit/22019b13b4fa2e8a20d4507d156a34f91e80a07d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19388
  
**[Test build #83976 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83976/testReport)**
 for PR 19388 at commit 
[`500c73c`](https://github.com/apache/spark/commit/500c73cc96290efe0194e371ab84e0cda863347d).


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17436
  
LGTM except a few minor comments; please update the PR title and 
description, thanks!


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r151765263
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadBenchmark.scala
 ---
@@ -260,6 +261,7 @@ object ParquetReadBenchmark {
   def stringWithNullsScanBenchmark(values: Int, fractionOfNulls: Double): 
Unit = {
 withTempPath { dir =>
   withTempTable("t1", "tempTable") {
+val enableOffHeapColumnVector = 
spark.sqlContext.conf.offHeapColumnVectorEnabled
--- End diff --

nit: spark.sessionState.conf.offHeapColumnVectorEnabled


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r151764870
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
 ---
@@ -62,7 +69,11 @@ case class InMemoryTableScanExec(
 
   private def createAndDecompressColumn(cachedColumnarBatch: CachedBatch): 
ColumnarBatch = {
 val rowCount = cachedColumnarBatch.numRows
-val columnVectors = OnHeapColumnVector.allocateColumns(rowCount, 
columnarBatchSchema)
+val columnVectors = if (!conf.offHeapColumnVectorEnabled) {
--- End diff --

only enable it when `TaskContext.get != null`?


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r151764498
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
 ---
@@ -101,9 +101,13 @@
   private boolean returnColumnarBatch;
 
   /**
-   * The default config on whether columnarBatch should be offheap.
+   * The config on whether columnarBatch should be offheap.
--- End diff --

nit: the memory mode of the columnarBatch


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r151764317
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -140,6 +140,13 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val COLUMN_VECTOR_OFFHEAP_ENABLED =
+buildConf("spark.sql.columnVector.offheap.enable")
+  .internal()
+  .doc("When true, use OffHeapColumnVector in ColumnarBatch.")
+  .booleanConf
+  .createWithDefault(true)
--- End diff --

hey let's not change the existing behavior.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19728
  
Can you split it into 3 PRs? The approaches for these 3 expressions are 
quite different.


---




[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...

2017-11-17 Thread erikerlandson
Github user erikerlandson commented on the issue:

https://github.com/apache/spark/pull/19775
  
@matyix thanks for re-submitting!


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19751
  
**[Test build #83975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83975/testReport)**
 for PR 19751 at commit 
[`e09a376`](https://github.com/apache/spark/commit/e09a376edc6707e564e9f89a202c5347f37ad738).


---




[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-11-17 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19751
  
retest this please


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83972/
Test PASSed.


---




[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

2017-11-17 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19760
  
> my future plan is to move config related stuff to a new maven module

Sounds good. I'll try to update this before leaving on vacation, but may 
not be able to...


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19774
  
**[Test build #83972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83972/testReport)**
 for PR 19774 at commit 
[`9bfa80c`](https://github.com/apache/spark/commit/9bfa80cca04a3b00e0fc2b02beb45c56f2058a34).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-11-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19257
  
Thank you for the decision, @cloud-fan . It's great to see the progress on 
this!


---




[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19769
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19769
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83971/
Test PASSed.


---




[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19769
  
**[Test build #83971 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83971/testReport)**
 for PR 19769 at commit 
[`9bb4cf0`](https://github.com/apache/spark/commit/9bb4cf0514dddc005b90ddb17a22d3b05be929e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19728
  
**[Test build #83974 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83974/testReport)**
 for PR 19728 at commit 
[`22019b1`](https://github.com/apache/spark/commit/22019b13b4fa2e8a20d4507d156a34f91e80a07d).


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83973/testReport)**
 for PR 17436 at commit 
[`ac2a7da`](https://github.com/apache/spark/commit/ac2a7da48790e632ba681d56e573f5cee764c464).


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83970/
Test PASSed.


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19730
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19730
  
**[Test build #83970 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83970/testReport)**
 for PR 19730 at commit 
[`83fef40`](https://github.com/apache/spark/commit/83fef403b92a96a13421901d161a0df5e6a6d7b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19725: [DO NOT REVIEW][SPARK-22042] [SQL] Insert shuffle...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19725#discussion_r151741374
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/InjectPlaceholderExchange.scala
 ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.exchange
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.plans.physical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.internal.SQLConf
+
+case class InjectPlaceholderExchange(conf: SQLConf) extends 
Rule[SparkPlan] {
+  private def defaultNumPreShufflePartitions: Int = 
conf.numShufflePartitions
+
+  /**
+   * Given a required distribution, returns a partitioning that satisfies 
that distribution.
+   * @param requiredDistribution The distribution that is required by the 
operator
+   * @param numPartitions Used when the distribution doesn't require a 
specific number of partitions
+   */
+  private def createPartitioning(requiredDistribution: Distribution,
+ numPartitions: Int): Partitioning = {
+requiredDistribution match {
+  case AllTuples => SinglePartition
+  case ClusteredDistribution(clustering, desiredPartitions) =>
+HashPartitioning(clustering, 
desiredPartitions.getOrElse(numPartitions))
+  case OrderedDistribution(ordering) => RangePartitioning(ordering, 
numPartitions)
+  case dist => sys.error(s"Do not know how to satisfy distribution 
$dist")
+}
+  }
+
+  def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
+case operator @ ShuffleExchangeExec(partitioning, child, _) =>
+  child.children match {
+case ShuffleExchangeExec(childPartitioning, baseChild, _)::Nil =>
--- End diff --

Is the lack of whitespace around `::` intended?


---




[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-17 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/19717
  
@vanzin @jiangxb1987 @mridulm PTAL. Thanks!


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19774#discussion_r151741011
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -689,6 +689,11 @@ case class DescribeColumnCommand(
       buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
       buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
       buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
+      buffer ++= cs.flatMap(_.histogram.map { hist =>
--- End diff --

Thanks for your feedback. IMHO I am not sure that it would be easier to read, but if someone else agrees with you, I can change it. Thanks.


---




[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151741098
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression])
     if (children.forall(_.dataType == StringType)) {
       // All children are strings. In that case we can construct a fixed size array.
       val evals = children.map(_.genCode(ctx))
-
-      val inputs = evals.map { eval =>
-        s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-      }.mkString(", ")
-
-      ev.copy(evals.map(_.code).mkString("\n") + s"""
-        UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+      val separator = evals.head
+      val strings = evals.tail
+      val argNums = strings.length
+      val args = ctx.freshName("args")
+
+      val inputs = strings.zipWithIndex.map { case (eval, index) =>
+        if (eval.isNull != "true") {
+          s"""
+             ${eval.code}
+             if (!${eval.isNull}) {
+               $args[$index] = ${eval.value};
+             }
+           """
+        } else {
+          ""
+        }
+      }
+      val codes = s"${separator.code}\n" +
+        (if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+           ctx.splitExpressions(inputs, "valueConcatWs",
+             ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil)
+         } else {
+           inputs.mkString("\n")
+         })
+      ev.copy(s"""
+        UTF8String[] $args = new UTF8String[$argNums];
+        $codes
+        UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args);
--- End diff --

nit:
```
val code = if (ctx.INPUT_ROW != null && ctx.currentVars == null)
...
ev.copy(code = s"""
  UTF8String[] $args = new UTF8String[$argNums];
  ${separator.code}
  ...
""")
```


---
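
A toy sketch of the codegen string-building pattern discussed above: one Java snippet is built per argument with `zipWithIndex`, then the snippets are joined (and, in Spark, optionally split into helper methods to stay under the 64KB limit). `Eval` here is a made-up stand-in for Spark's `ExprCode`, not the real class.

```scala
object CodegenShapeDemo {
  // Minimal stand-in for ExprCode: a null flag variable name and a value
  // variable name, as produced by genCode in the diff above.
  case class Eval(isNull: String, value: String)

  // Build one Java snippet per argument, each guarding the assignment into
  // the shared `args` array behind that argument's null flag.
  def buildInputs(args: String, strings: Seq[Eval]): Seq[String] =
    strings.zipWithIndex.map { case (eval, index) =>
      s"if (!${eval.isNull}) { $args[$index] = ${eval.value}; }"
    }
}
```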




[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151740628
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression])
     if (children.forall(_.dataType == StringType)) {
       // All children are strings. In that case we can construct a fixed size array.
       val evals = children.map(_.genCode(ctx))
-
-      val inputs = evals.map { eval =>
-        s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-      }.mkString(", ")
-
-      ev.copy(evals.map(_.code).mkString("\n") + s"""
-        UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+      val separator = evals.head
+      val strings = evals.tail
+      val argNums = strings.length
--- End diff --

nit: `numArgs`


---




[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151739773
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
           s"'${newColumn.name}' with type '${newColumn.dataType}'")
     }

+    val changeSchema = originColumn.dataType != newColumn.dataType
--- End diff --

What do you think about renaming the val to `typeChanged`?


---




[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151740225
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1459,6 +1459,11 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     // Ensure that change column will preserve other metadata fields.
     sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is col1'")
     assert(getMetadata("col1").getString("key") == "value")
+
+    // Ensure that change column type take effect
--- End diff --

s/change/changing + s/take/takes


---




[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151739604
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
           s"'${newColumn.name}' with type '${newColumn.dataType}'")
     }

+    val changeSchema = originColumn.dataType != newColumn.dataType
     val newSchema = table.schema.fields.map { field =>
       if (field.name == originColumn.name) {
-        // Create a new column from the origin column with the new comment.
-        addComment(field, newColumn.getComment)
+        var newField = field
--- End diff --

I'd recommend getting rid of this `var` and re-writing the code as follows:

```
val newField = newColumn.getComment.map(...).getOrElse(field)
```


---
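
The reviewer's suggestion can be illustrated outside of Spark: a `var` that is conditionally reassigned from an `Option` collapses into a single `val` via `map`/`getOrElse`. `Field` and its `comment` member are made-up stand-ins for the actual `StructField`/comment types in the diff.

```scala
object NoVarDemo {
  // Illustrative stand-in for a schema field with an optional comment.
  case class Field(name: String, comment: Option[String])

  // `var` version, as in the PR: conditionally reassign.
  def withCommentVar(field: Field, newComment: Option[String]): Field = {
    var newField = field
    if (newComment.isDefined) {
      newField = field.copy(comment = newComment)
    }
    newField
  }

  // `val` version the reviewer recommends: map/getOrElse over the Option
  // produces the same result with no mutation.
  def withCommentVal(field: Field, newComment: Option[String]): Field =
    newComment.map(c => field.copy(comment = Some(c))).getOrElse(field)
}
```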




[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151738992
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val evals = children.map(_.genCode(ctx))
-    val inputs = evals.map { eval =>
-      s"${eval.isNull} ? null : ${eval.value}"
-    }.mkString(", ")
-    ev.copy(evals.map(_.code).mkString("\n") + s"""
-      boolean ${ev.isNull} = false;
-      UTF8String ${ev.value} = UTF8String.concat($inputs);
-      if (${ev.value} == null) {
-        ${ev.isNull} = true;
-      }
+    val numArgs = evals.length
--- End diff --

nit: we can inline it


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19774#discussion_r151737674
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -689,6 +689,11 @@ case class DescribeColumnCommand(
       buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
       buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
       buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
+      buffer ++= cs.flatMap(_.histogram.map { hist =>
--- End diff --

I'm pretty sure that a for-comprehension would make the code easier to read.

```scala
for {
  c <- cs
  hist <- c.histogram
  ...
} yield ...
```

Let me know if you need help with that.


---
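
The equivalence behind this suggestion is worth spelling out: a for-comprehension over `Option`s desugars to exactly the nested `flatMap`/`map` calls in the PR, so the two shapes always produce the same result. `Stats` and its `histogram` field here are simplified stand-ins for the column-statistics types in the diff.

```scala
object ForComprehensionDemo {
  // Simplified stand-in for a column-statistics record with an optional histogram.
  case class Stats(histogram: Option[String])

  // Shape used in the PR: nested flatMap/map over Option-valued fields.
  def viaFlatMap(cs: Option[Stats]): Option[String] =
    cs.flatMap(_.histogram.map(h => s"histogram: $h"))

  // Reviewer's suggested for-comprehension; the compiler rewrites it to the
  // same flatMap/map chain as above.
  def viaFor(cs: Option[Stats]): Option[String] =
    for {
      c <- cs
      hist <- c.histogram
    } yield s"histogram: $hist"
}
```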




[GitHub] spark issue #19747: [Spark-22431][SQL] Ensure that the datatype in the schem...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19747
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83969/
Test PASSed.


---




[GitHub] spark pull request #19772: [SPARK-22538][ML] SQLTransformer should not unper...

2017-11-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19772


---




[GitHub] spark issue #19747: [Spark-22431][SQL] Ensure that the datatype in the schem...

2017-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19747
  
Merged build finished. Test PASSed.


---



