[GitHub] spark pull request #17697: enable configurable reducer count in topByKey()

2017-04-19 Thread yangyangyyy
GitHub user yangyangyyy opened a pull request:

https://github.com/apache/spark/pull/17697

enable configurable reducer count in topByKey()

## What changes were proposed in this pull request?

In the topByKey() call, provide a configurable partition count. Without this, 
topByKey() currently ends up with only 16 reducers even for a 100 GB input.
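
A minimal sketch of what such an overload could look like, modeled on the existing 
`MLPairRDDFunctions.topByKey` (which lets `aggregateByKey` fall back to the default 
parallelism); the `numPartitions` parameter name and the exact wiring are assumptions, 
not necessarily what this patch does:

```scala
// Hypothetical overload inside mllib's MLPairRDDFunctions (K and V carry ClassTag
// bounds there, and BoundedPriorityQueue is Spark-internal).
def topByKey(num: Int, numPartitions: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
  self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord), numPartitions)(
    seqOp = (queue, item) => queue += item,          // keep at most `num` values per key
    combOp = (queue1, queue2) => queue1 ++= queue2   // merge the per-partition queues
  ).mapValues(_.toArray.sorted(ord.reverse))         // min-heap, so reverse for descending order
}
```

A caller could then write something like `rdd.topByKey(10, numPartitions = 200)` instead of 
repartitioning before the call.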

## How was this patch tested?
use existing unit tests


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yangyangyyy/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17697


commit 09d66432787488b3672f22e03c492e558bf45019
Author: Yang Yang 
Date:   2017-04-20T06:49:30Z

enable configurable reducer count in topByKey()







[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17670
  
**[Test build #75974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75974/testReport)** for PR 17670 at commit [`e940c6f`](https://github.com/apache/spark/commit/e940c6f066421cd45712ebe7149a48f7faa394ed).





[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17670
  
better to open another pr to backport to v2.2?





[GitHub] spark pull request #17693: [SPARK-20314][SQL] Inconsistent error handling in...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17693#discussion_r112377932
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -149,7 +149,7 @@ case class GetJsonObject(json: Expression, path: Expression)
 
     if (parsed.isDefined) {
       try {
-        Utils.tryWithResource(jsonFactory.createParser(jsonStr.getBytes)) { parser =>
+        Utils.tryWithResource(jsonFactory.createParser(jsonStr.toString)) { parser =>
--- End diff --

Could you check whether the other exceptions could be thrown by the illegal 
outputs?





[GitHub] spark pull request #17670: [SPARK-20281][SQL] Print the identical Range para...

2017-04-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17670#discussion_r112377856
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -527,7 +527,7 @@ class SparkSession private(
   @Experimental
   @InterfaceStability.Evolving
   def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
-    range(start, end, step, numPartitions = sparkContext.defaultParallelism)
--- End diff --

ya, good to me. I'll revert





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112373593
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
+
assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY)
 === tz)
+  }
+
+  private def testCreateAlterTablesWithTimezone(
+  spark: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+test(s"SPARK-12297: Create and Alter Parquet tables and timezones; 
explicitTz = $explicitTz; " +
+  s"sessionTzOpt = $sessionTzOpt") {
+  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+  withTable(baseTable, s"like_$baseTable", s"select_$baseTable") {
+val localTz = TimeZone.getDefault()
+val localTzId = localTz.getID()
--- End diff --

`localTz` and `localTzId` aren't used.





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112373013
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
--- End diff --

Should explicitly pass `sparkSession` and use it here?
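
Concretely, the suggestion would look something like this (illustrative only, not the committed change):

```scala
// Take the session explicitly instead of closing over the suite-level `spark`.
private def checkHasTz(spark: SparkSession, table: String, tz: Option[String]): Unit = {
  val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
  assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY) === tz)
}
```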





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112366731
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
+
assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY)
 === tz)
+  }
+
+  private def testCreateAlterTablesWithTimezone(
+  spark: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+test(s"SPARK-12297: Create and Alter Parquet tables and timezones; 
explicitTz = $explicitTz; " +
+  s"sessionTzOpt = $sessionTzOpt") {
+  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+  withTable(baseTable, s"like_$baseTable", s"select_$baseTable") {
+val localTz = TimeZone.getDefault()
+val localTzId = localTz.getID()
+// If we ever add a property to set the table timezone by default, 
defaultTz would change
+val defaultTz = None
+// check that created tables have correct TBLPROPERTIES
+val tblProperties = explicitTz.map {
+  tz => raw"""TBLPROPERTIES ($key="$tz")"""
+}.getOrElse("")
+spark.sql(
+  raw"""CREATE TABLE $baseTable (
+|  x int
+| )
+| STORED AS PARQUET
+| $tblProperties
+""".stripMargin)
+val expectedTableTz = explicitTz.orElse(defaultTz)
+checkHasTz(baseTable, expectedTableTz)
+spark.sql(s"CREATE TABLE like_$baseTable LIKE $baseTable")
+checkHasTz(s"like_$baseTable", expectedTableTz)
+spark.sql(
+  raw"""CREATE TABLE select_$baseTable
+| STORED AS PARQUET
+| AS
+| SELECT * from $baseTable
+""".stripMargin)
+checkHasTz(s"select_$baseTable", defaultTz)
+
+// check alter table, setting, unsetting, resetting the property
+spark.sql(
+  raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="America/Los_Angeles")""")
+checkHasTz(baseTable, Some("America/Los_Angeles"))
+spark.sql( raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="UTC")""")
+checkHasTz(baseTable, Some("UTC"))
+spark.sql( raw"""ALTER TABLE $baseTable UNSET TBLPROPERTIES 
($key)""")
+checkHasTz(baseTable, None)
+explicitTz.foreach { tz =>
+  spark.sql( raw"""ALTER TABLE $baseTable SET TBLP

[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112370857
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala ---
@@ -42,6 +52,15 @@ class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with TestHi
""".stripMargin)
   }
 
+  override def afterEach(): Unit = {
--- End diff --

Why do we need this?





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112365163
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -233,6 +224,17 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
 result.copy(output = newOutput)
   }
 
+  private def getStorageTzOptions(relation: CatalogRelation): Map[String, String] = {
+    // We add the table timezone to the relation options, which automatically gets injected
+    // into the hadoopConf for the Parquet Converters
+    val storageTzKey = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+    val storageTz = relation.tableMeta.properties.getOrElse(storageTzKey, "")
+    val sessionTz = sparkSession.sessionState.conf.sessionLocalTimeZone
--- End diff --

`sessionTz` isn't used.





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112372155
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
@@ -673,12 +679,35 @@ private[parquet] object ParquetRowConverter {
     unscaled
   }
 
-  def binaryToSQLTimestamp(binary: Binary): SQLTimestamp = {
+  /**
+   * Converts an int96 to a SQLTimestamp, given both the storage timezone and the local timezone.
+   * The timestamp is really meant to be interpreted as a "floating time", but since we
+   * actually store it as micros since epoch, why we have to apply a conversion when timezones
+   * change.
+   *
+   * @param binary
+   * @param sessionTz the session timezone.  This will be used to determine how to display the
+   *                  time, and compute functions on the timestamp which involve a timezone,
+   *                  eg. extract the hour.
+   * @param storageTz the timezone which was used to store the timestamp.  This should come from
+   *                  the timestamp table property, or else assume its the same as the sessionTz
+   * @return
--- End diff --

Can you also add descriptions for `@param binary` and `@return`?
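
One possible wording, assuming the usual int96 layout the Parquet reader decodes (8 bytes of 
nanoseconds within the day followed by a 4-byte Julian day); the exact phrasing is of course up 
to the author:

```scala
 * @param binary the 12-byte Parquet int96 value: 8 bytes of nanoseconds within the day,
 *               followed by a 4-byte Julian day number
 * @return the timestamp as microseconds since the epoch (SQLTimestamp), shifted from the
 *         storage timezone to the session timezone
```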





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112365371
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -233,6 +224,17 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
 result.copy(output = newOutput)
   }
 
+  private def getStorageTzOptions(relation: CatalogRelation): Map[String, String] = {
+    // We add the table timezone to the relation options, which automatically gets injected
+    // into the hadoopConf for the Parquet Converters
+    val storageTzKey = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+    val storageTz = relation.tableMeta.properties.getOrElse(storageTzKey, "")
+    val sessionTz = sparkSession.sessionState.conf.sessionLocalTimeZone
+    Map(
+      storageTzKey -> storageTz
+    )
--- End diff --

Should it return `Map.empty` if the value isn't included in the table properties?
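
Sketched, that suggestion would look roughly like this (illustrative, not the committed code):

```scala
private def getStorageTzOptions(relation: CatalogRelation): Map[String, String] = {
  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
  // Only propagate the option when the table actually carries the property.
  relation.tableMeta.properties.get(key).map(tz => Map(key -> tz)).getOrElse(Map.empty)
}
```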





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112373782
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
+
assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY)
 === tz)
+  }
+
+  private def testCreateAlterTablesWithTimezone(
+  spark: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+test(s"SPARK-12297: Create and Alter Parquet tables and timezones; 
explicitTz = $explicitTz; " +
+  s"sessionTzOpt = $sessionTzOpt") {
+  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+  withTable(baseTable, s"like_$baseTable", s"select_$baseTable") {
+val localTz = TimeZone.getDefault()
+val localTzId = localTz.getID()
+// If we ever add a property to set the table timezone by default, 
defaultTz would change
+val defaultTz = None
+// check that created tables have correct TBLPROPERTIES
+val tblProperties = explicitTz.map {
+  tz => raw"""TBLPROPERTIES ($key="$tz")"""
+}.getOrElse("")
+spark.sql(
+  raw"""CREATE TABLE $baseTable (
+|  x int
+| )
+| STORED AS PARQUET
+| $tblProperties
+""".stripMargin)
+val expectedTableTz = explicitTz.orElse(defaultTz)
+checkHasTz(baseTable, expectedTableTz)
+spark.sql(s"CREATE TABLE like_$baseTable LIKE $baseTable")
+checkHasTz(s"like_$baseTable", expectedTableTz)
+spark.sql(
+  raw"""CREATE TABLE select_$baseTable
+| STORED AS PARQUET
+| AS
+| SELECT * from $baseTable
+""".stripMargin)
+checkHasTz(s"select_$baseTable", defaultTz)
+
+// check alter table, setting, unsetting, resetting the property
+spark.sql(
+  raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="America/Los_Angeles")""")
+checkHasTz(baseTable, Some("America/Los_Angeles"))
+spark.sql( raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="UTC")""")
--- End diff --

nit: remove extra white space, and two more below this.



[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112368447
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
--- End diff --

Should be `testTimezones.foreach { ...`?
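
That is, reuse the shared sequence instead of repeating the literals, roughly:

```scala
testTimezones.foreach { case (tableName, zone) =>
  if (zone != localTz) {
    testCreateWriteRead(sparkSession, tableName, Some(zone), sessionTzOpt)
  }
}
```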





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112366489
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
+
assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY)
 === tz)
+  }
+
+  private def testCreateAlterTablesWithTimezone(
+  spark: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+test(s"SPARK-12297: Create and Alter Parquet tables and timezones; 
explicitTz = $explicitTz; " +
+  s"sessionTzOpt = $sessionTzOpt") {
+  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+  withTable(baseTable, s"like_$baseTable", s"select_$baseTable") {
+val localTz = TimeZone.getDefault()
+val localTzId = localTz.getID()
+// If we ever add a property to set the table timezone by default, 
defaultTz would change
+val defaultTz = None
+// check that created tables have correct TBLPROPERTIES
+val tblProperties = explicitTz.map {
+  tz => raw"""TBLPROPERTIES ($key="$tz")"""
+}.getOrElse("")
+spark.sql(
+  raw"""CREATE TABLE $baseTable (
+|  x int
+| )
+| STORED AS PARQUET
+| $tblProperties
+""".stripMargin)
+val expectedTableTz = explicitTz.orElse(defaultTz)
+checkHasTz(baseTable, expectedTableTz)
+spark.sql(s"CREATE TABLE like_$baseTable LIKE $baseTable")
+checkHasTz(s"like_$baseTable", expectedTableTz)
+spark.sql(
+  raw"""CREATE TABLE select_$baseTable
+| STORED AS PARQUET
+| AS
+| SELECT * from $baseTable
+""".stripMargin)
+checkHasTz(s"select_$baseTable", defaultTz)
+
+// check alter table, setting, unsetting, resetting the property
+spark.sql(
+  raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="America/Los_Angeles")""")
+checkHasTz(baseTable, Some("America/Los_Angeles"))
+spark.sql( raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="UTC")""")
+checkHasTz(baseTable, Some("UTC"))
+spark.sql( raw"""ALTER TABLE $baseTable UNSET TBLPROPERTIES 
($key)""")
+checkHasTz(baseTable, None)
+explicitTz.foreach { tz =>
+  spark.sql( raw"""ALTER TABLE $baseTable SET TBLP

[GitHub] spark issue #17693: [SPARK-20314][SQL] Inconsistent error handling in JSON p...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17693
  
Could you add a test case in `JsonExpressionsSuite`?
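
A rough sketch of the kind of test that could go there, using the suite's existing 
`checkEvaluation` helper; the input and expected value below are placeholders, not the actual 
case from this PR:

```scala
test("SPARK-20314: get_json_object on malformed input should not throw") {
  // Hypothetical malformed document; the real test should use the input that
  // exposed the inconsistent error handling.
  val badJson = "{ \"a\": 1"
  checkEvaluation(GetJsonObject(Literal(badJson), Literal("$.a")), null)
}
```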





[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17670
  
LGTM except a comment.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112376143
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17670
  
Ok, I am fine to keep the existing way. 





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112376037
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark pull request #17670: [SPARK-20281][SQL] Print the identical Range para...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17670#discussion_r112376065
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -527,7 +527,7 @@ class SparkSession private(
   @Experimental
   @InterfaceStability.Evolving
   def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
-    range(start, end, step, numPartitions = sparkContext.defaultParallelism)
--- End diff --

How about reverting the changes in this file? 





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112375921
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...

2017-04-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17655
  
The change looks good to me too.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112375496
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16781
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75966/
Test PASSed.





[GitHub] spark issue #17696: [SPARK-20401][DOC]In the spark official configuration do...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17696
  
Can one of the admins verify this patch?





[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16781
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16781
  
**[Test build #75966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75966/testReport)** for PR 16781 at commit [`44a8bbb`](https://github.com/apache/spark/commit/44a8bbb17484a61dd984fcb451e3b1be8c539e9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17696: [SPARK-20401][DOC]In the spark official configura...

2017-04-19 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/17696

[SPARK-20401][DOC]In the spark official configuration document, the 
'spark.driver.supervise' configuration parameter specification and default 
values are necessary.

## What changes were proposed in this pull request?
Submit the Spark job via the REST interface, e.g.:
curl -X POST http://10.43.183.120:6066/v1/submissions/create --header 
"Content-Type: application/json;charset=UTF-8" --data '{
"action": "CreateSubmissionRequest", 
"appArgs": [
"myAppArgument"
], 
"appResource": "/home/mr/gxl/test.jar",  
"clientSparkVersion": "2.2.0", 
"environmentVariables": {
"SPARK_ENV_LOADED": "1"
}, 
"mainClass": "cn.zte.HdfsTest", 
"sparkProperties": {
"spark.jars": "/home/mr/gxl/test.jar",  
**"spark.driver.supervise": "true",** 
"spark.app.name": "HdfsTest", 
"spark.eventLog.enabled": "false", 
"spark.submit.deployMode": "cluster", 
"spark.master": "spark://10.43.183.120:6066"
}
}'


**I want to make sure that the driver is automatically restarted if it 
fails with a non-zero exit code, but I cannot find the 'spark.driver.supervise' 
configuration parameter's specification and default value in the official 
Spark configuration document.**
## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-20401

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17696


commit d383efba12c66addb17006dea107bb0421d50bc3
Author: 郭小龙 10207633 
Date:   2017-03-31T13:57:09Z

[SPARK-20177]Document about compression way has some little detail changes.

commit 3059013e9d2aec76def14eb314b6761bea0e7ca0
Author: 郭小龙 10207633 
Date:   2017-04-01T01:38:02Z

[SPARK-20177] event log add a space

commit 555cef88fe09134ac98fd0ad056121c7df2539aa
Author: guoxiaolongzte 
Date:   2017-04-02T00:16:08Z

'/applications/[app-id]/jobs' in rest api,status should be 
[running|succeeded|failed|unknown]

commit 46bb1ad3ddd9fb55b5607ac4f20213a90186cfe9
Author: 郭小龙 10207633 
Date:   2017-04-05T03:16:50Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-20177

commit 0efb0dd9e404229cce638fe3fb0c966276784df7
Author: 郭小龙 10207633 
Date:   2017-04-05T03:47:53Z

[SPARK-20218]'/applications/[app-id]/stages' in REST API,add description.

commit 0e37fdeee28e31fc97436dabd001d3c85c5a7794
Author: 郭小龙 10207633 
Date:   2017-04-05T05:22:54Z

[SPARK-20218] '/applications/[app-id]/stages/[stage-id]' in REST API,remove 
redundant description.

commit 52641bb01e55b48bd9e8579fea217439d14c7dc7
Author: 郭小龙 10207633 
Date:   2017-04-07T06:24:58Z

Merge branch 'SPARK-20218'

commit d3977c9cab0722d279e3fae7aacbd4eb944c22f6
Author: 郭小龙 10207633 
Date:   2017-04-08T07:13:02Z

Merge branch 'master' of https://github.com/apache/spark

commit 137b90e5a85cde7e9b904b3e5ea0bb52518c4716
Author: 郭小龙 10207633 
Date:   2017-04-10T05:13:40Z

Merge branch 'master' of https://github.com/apache/spark

commit 0fe5865b8022aeacdb2d194699b990d8467f7a0a
Author: 郭小龙 10207633 
Date:   2017-04-10T10:25:22Z

Merge branch 'SPARK-20190' of https://github.com/guoxiaolongzte/spark

commit cf6f42ac84466960f2232c025b8faeb5d7378fe1
Author: 郭小龙 10207633 
Date:   2017-04-10T10:26:27Z

Merge branch 'master' of https://github.com/apache/spark

commit 685cd6b6e3799c7be65674b2670159ba725f0b8f
Author: 郭小龙 10207633 
Date:   2017-04-14T01:12:41Z

Merge branch 'master' of https://github.com/apache/spark

commit c716a9231e9ab117d2b03ba67a1c8903d8d9da93
Author: guoxiaolong 
Date:   2017-04-17T06:57:21Z

Merge branch 'master' of https://github.com/apache/spark

commit 679cec36a968fbf995b567ca5f6f8cbd8e32673f
Author: guoxiaolong 
Date:   2017-04-19T07:20:08Z

Merge branch 'master' of https://github.com/apache/spark

commit 3c9387af84a8f39cf8c1ce19e15de99dfcaf0ca5
Author: guoxiaolong 
Date:   2017-04-19T08:15:26Z

Merge branch 'master' of https://github.com/apache/spark

commit cb71f4462a0889cbb0843875b1e4cf14bcb0d020
Author: guoxiaolong 
Date:   2017-04-20T05:52:06Z

Merge branch 'master' of https://github.com/apache/spark

commit 3e74d731a34c2ee112577a4118b6cbf706227120
Author: guoxiaolong 
Date:   2017-04-20T06:26:51Z

[SPARK-20401]In the spark official configuration document, the 
'spark.driver.supervise' configuration parameter specific

[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17342
  
**[Test build #75973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75973/testReport)**
 for PR 17342 at commit 
[`be16d1a`](https://github.com/apache/spark/commit/be16d1a23d30ad1a031aed4a15e6a7ee3dd51d45).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-19 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r112374078
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: 
SparkContext) extends Logging {
 }
 
 object SharedState {
+  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
--- End diff --

Good point. I have updated the PR. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112373858
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112373805
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark issue #17495: [SPARK-20172][Core] Add file permission check when listi...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17495
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75965/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r112372754
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -229,6 +229,32 @@ private[sql] trait SQLTestUtils
   }
 
   /**
+   * Drops database `dbName` after calling `f`.
+   */
+  protected def withDatabase(dbNames: String*)(f: => Unit): Unit = {
--- End diff --

In the future, we will use it more when refactoring the test cases.
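
For reference, such a helper is typically a try/finally that drops the databases (with CASCADE) after the body runs; a minimal sketch, assuming a `spark` SparkSession is in scope (not necessarily the PR's exact body):

```scala
// Minimal sketch: run the test body, then always drop the given databases.
// CASCADE also removes any tables left behind by the test.
protected def withDatabase(dbNames: String*)(f: => Unit): Unit = {
  try f finally {
    dbNames.foreach { name =>
      spark.sql(s"DROP DATABASE IF EXISTS $name CASCADE")
    }
  }
}
```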


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17495: [SPARK-20172][Core] Add file permission check when listi...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17495
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17655
  
Yeah, it sounds like the Avro issues are not caused by us, so we are unable 
to fix them. :(

Fixed the issues you mentioned above. They look right to me, but adding 
test cases for them might not be simple. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17495: [SPARK-20172][Core] Add file permission check when listi...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17495
  
**[Test build #75965 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75965/testReport)**
 for PR 17495 at commit 
[`b36fa75`](https://github.com/apache/spark/commit/b36fa75e4b2414d3df9bf6921081147176e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17655
  
**[Test build #75972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75972/testReport)**
 for PR 17655 at commit 
[`aeeaba5`](https://github.com/apache/spark/commit/aeeaba5112e37b705887a7f7e56291b8654fd500).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17670
  
**[Test build #75971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75971/testReport)**
 for PR 17670 at commit 
[`a65d5c6`](https://github.com/apache/spark/commit/a65d5c629a381ea2af54a2198a250590a5e43437).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17670
  
Looking around the related code, I think we cannot easily set 
`defaultParallelism` inside `catalyst.plans.logical` because there is no 
obvious way to access `SQLContext` there. So, IIUC, we cannot easily fill in 
`numSlices` in `ResolveTableValuedFunctions`. Another approach to share the 
default Range `numSlices` between the SparkContext APIs and SQL is to [not set 
the default value in 
`SparkSession`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L517)
 and instead set the default value in `RangeExec` for both cases. Thoughts? 
@jaceklaskowski @gatorsmile 
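
As a rough illustration of that second option (a simplified stand-in, not the actual `RangeExec` code, which carries more fields), the idea is to keep `numSlices` optional in the plan and only fall back to `defaultParallelism` at execution time:

```scala
// Simplified stand-in for RangeExec: keep numSlices optional in the plan and
// resolve the default only when the physical operator runs.
case class SimpleRangeExec(start: Long, end: Long, step: Long, numSlices: Option[Int]) {
  def effectiveNumSlices(defaultParallelism: Int): Int =
    numSlices.getOrElse(defaultParallelism)
}

// Both SparkContext.range and SQL's range(...) would then share the same fallback:
// SimpleRangeExec(0, 10, 1, None).effectiveNumSlices(sc.defaultParallelism)
```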


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17631: [SPARK-20319][SQL] Already quoted identifiers are gettin...

2017-04-19 Thread umesh9794
Github user umesh9794 commented on the issue:

https://github.com/apache/spark/pull/17631
  
Thanks @squito for the review. I believe having single quotes in a name can 
cause issues on peer systems, e.g. databases (JDBC), SerDe types, etc. So 
stripping them all would be the preferred way to get rid of mismatched and 
incomplete names. Waiting on suggestions from the team, though.
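
For illustration only (a hypothetical helper, not the PR's actual change), stripping would presumably target matching surrounding quotes and leave anything else alone:

```scala
// Hypothetical sketch: remove one layer of matching surrounding quotes from an
// identifier; mismatched or stray single quotes are left untouched.
def stripOuterQuotes(name: String): String = {
  val quoted = name.length >= 2 &&
    ((name.head == '"' && name.last == '"') || (name.head == '`' && name.last == '`'))
  if (quoted) name.substring(1, name.length - 1) else name
}

// stripOuterQuotes("\"foo\"") == "foo"
// stripOuterQuotes("\"foo")   == "\"foo"   (mismatched, unchanged)
```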


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112370906
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112370321
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
--- End diff --

Why even have this function? Just change the signature of 
`ArrowConverters.internalRowIterToArrowBatch` and call that directly. 
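
A rough sketch of what that might look like (hypothetical signature; it assumes the PR's `ArrowPayload`, `ArrowStaticPayload`, and `ArrowConverters.internalRowIterToArrowBatch` stay available):

```scala
// Hypothetical refactor sketch: callers build the payload through a single
// entry point that takes the allocator, instead of an instance wrapper method.
import org.apache.arrow.memory.RootAllocator
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.StructType

object ArrowConvertersSketch {
  def internalRowIterToPayload(
      rowIter: Iterator[InternalRow],
      schema: StructType,
      allocator: RootAllocator): ArrowPayload = {
    // delegates to the PR's existing batch conversion, then wraps the result
    val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, schema, allocator)
    new ArrowStaticPayload(batch)
  }
}
```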


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17684: [SPARK-20341][SQL] Support BigInt's value that does not ...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17684
  
Ok, could you write a comment in the code to explain it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112368956
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
--- End diff --

will there be other payload types that are not ArrowStaticPayload?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75969/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17695
  
**[Test build #75969 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75969/testReport)**
 for PR 17695 at commit 
[`5692124`](https://github.com/apache/spark/commit/56921247c5fff8cb113ce2c07a3ba74a4069ab51).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17695
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112368367
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala ---
@@ -0,0 +1,432 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+import java.nio.ByteBuffer
+import java.nio.channels.{Channels, SeekableByteChannel}
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BaseAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file.{ArrowReader, ArrowWriter}
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.{FloatingPointPrecision, TimeUnit}
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * ArrowReader requires a seekable byte channel.
+ * TODO: This is available in arrow-vector now with ARROW-615, to be 
included in 0.2.1 release
+ */
+private[sql] class ByteArrayReadableSeekableByteChannel(var byteArray: 
Array[Byte])
+  extends SeekableByteChannel {
+  var _position: Long = 0L
+
+  override def isOpen: Boolean = {
+byteArray != null
+  }
+
+  override def close(): Unit = {
+byteArray = null
+  }
+
+  override def read(dst: ByteBuffer): Int = {
+val remainingBuf = byteArray.length - _position
+val length = Math.min(dst.remaining(), remainingBuf).toInt
+dst.put(byteArray, _position.toInt, length)
+_position += length
+length
+  }
+
+  override def position(): Long = _position
+
+  override def position(newPosition: Long): SeekableByteChannel = {
+_position = newPosition.toLong
+this
+  }
+
+  override def size: Long = {
+byteArray.length.toLong
+  }
+
+  override def write(src: ByteBuffer): Int = {
+throw new UnsupportedOperationException("Read Only")
+  }
+
+  override def truncate(size: Long): SeekableByteChannel = {
+throw new UnsupportedOperationException("Read Only")
+  }
+}
+
+/**
+ * Intermediate data structure returned from Arrow conversions
+ */
+private[sql] abstract class ArrowPayload extends Iterator[ArrowRecordBatch]
+
+/**
+ * Build a payload from existing ArrowRecordBatches
+ */
+private[sql] class ArrowStaticPayload(batches: ArrowRecordBatch*) extends 
ArrowPayload {
+  private val iter = batches.iterator
+  override def next(): ArrowRecordBatch = iter.next()
+  override def hasNext: Boolean = iter.hasNext
+}
+
+/**
+ * Class that wraps an Arrow RootAllocator used in conversion
+ */
+private[sql] class ArrowConverters {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+
+  private[sql] def allocator: RootAllocator = _allocator
+
+  /**
+   * Iterate over the rows and convert to an ArrowPayload, using 
RootAllocator from this class
+   */
+  def interalRowIterToPayload(rowIter: Iterator[InternalRow], schema: 
StructType): ArrowPayload = {
+val batch = ArrowConverters.internalRowIterToArrowBatch(rowIter, 
schema, _allocator)
+new ArrowStaticPayload(batch)
+  }
+
+  /**
+   * Read an Array of Arrow Record batches as byte Arrays into an 
ArrowPayload, using
+   * RootAllocator from this class
+   */
+  def readPayloadByteArrays(payloadByteArrays: Array[Array[Byte]]): 
ArrowPayload = {
+val batches = 
scala.collection.mutable.ArrayBuffer.empty[ArrowRecordBatch]
+var i = 0
+while (i < payloadByteArrays.length) {
+  val payloadBytes = payloadByteArrays(i)
+  val in = new ByteArrayReadableSeekable

[GitHub] spark pull request #17670: [SPARK-20281][SQL] Print the identical Range para...

2017-04-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17670#discussion_r112368162
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2606,4 +2607,15 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   case ae: AnalysisException => assert(ae.plan == null && 
ae.getMessage == ae.getSimpleMessage)
 }
   }
+
+  test("SPARK-20281 Print the identical range parameters of SparkContext 
and SQL in EXPLAIN") {
+def explainStr(df: DataFrame): String = {
+  val explain = ExplainCommand(df.queryExecution.logical, extended = 
false)
+  val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
+  
sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
+}
+val scRange = sqlContext.range(10)
+val sqlRange = sqlContext.sql("SELECT * FROM range(10)")
+assert(explainStr(scRange) === explainStr(sqlRange))
+  }
--- End diff --

I'll revert


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17670
  
Okay, I'll fix it soon. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17582
  
**[Test build #75970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75970/testReport)**
 for PR 17582 at commit 
[`4b3781f`](https://github.com/apache/spark/commit/4b3781ff6dce571130538a3f29a7e386f3e3fb9b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17631: [SPARK-20319][SQL] Already quoted identifiers are gettin...

2017-04-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17631
  
This doesn't seem safe in general. Spark treats quotes as a valid part of 
the name, e.g.:

```scala
scala> val df = sc.parallelize(1 to 10).map { x => (x, (x + 1).toLong, (x - 
1).toString)}.toDF("foo", "\"foo\"", "\"foo")
scala> df.select("\"foo")
res15: org.apache.spark.sql.DataFrame = ["foo: string]
```
You could have conflicts if you removed the quotes (though I admit it would 
be pretty weird). And then there are cases like only one quote, mismatched 
quotes, etc.; do you want to strip them all?

I'm not really sure what Spark can do about this, but it's also not my area of 
expertise, so I will defer to others.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17695
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17695
  
**[Test build #75969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75969/testReport)**
 for PR 17695 at commit 
[`5692124`](https://github.com/apache/spark/commit/56921247c5fff8cb113ce2c07a3ba74a4069ab51).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-19 Thread anabranch
Github user anabranch commented on the issue:

https://github.com/apache/spark/pull/17695
  
This should be on hold until the JIRA is resolved; I'd like to hear what 
others say.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17695: [SPARK-20400][DOCS] Remove References to 3rd Part...

2017-04-19 Thread anabranch
GitHub user anabranch opened a pull request:

https://github.com/apache/spark/pull/17695

[SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools

## What changes were proposed in this pull request?

Simple documentation change to remove explicit vendor references.

## How was this patch tested?

NA

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anabranch/spark remove-vendor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17695


commit 56921247c5fff8cb113ce2c07a3ba74a4069ab51
Author: anabranch 
Date:   2017-04-20T05:13:20Z

remove references to vendors




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #75968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75968/testReport)**
 for PR 15125 at commit 
[`9a6fd1f`](https://github.com/apache/spark/commit/9a6fd1ffb4c73e9b12c70ba5b1ea952a89c374b6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75968/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17582
  
**[Test build #75967 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75967/testReport)**
 for PR 17582 at commit 
[`68c9d83`](https://github.com/apache/spark/commit/68c9d83a48751e57988f09a46c8e61a073c7d582).





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #75968 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75968/testReport)**
 for PR 15125 at commit 
[`9a6fd1f`](https://github.com/apache/spark/commit/9a6fd1ffb4c73e9b12c70ba5b1ea952a89c374b6).





[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17670
  
As @jaceklaskowski said, it would be good to fill `numSlices` in the rule 
`ResolveTableValuedFunctions`, if it is None.
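For illustration, a minimal sketch of what such a defaulting rule could look like. The `Range` logical node and its `numSlices` field follow the discussion above; the rule name and the source of the default parallelism are assumptions, not the actual analyzer code.

```scala
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical sketch: fill in numSlices while resolving range(...) so the SQL
// path carries the same parameters as SparkSession.range(...).
object FillRangeNumSlices extends Rule[LogicalPlan] {
  // Stand-in value; a real rule would take this from the session's SparkContext.
  private val defaultParallelism: Int = 8

  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case r: Range if r.numSlices.isEmpty =>
      r.copy(numSlices = Some(defaultParallelism))
  }
}
```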





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75964/
Test PASSed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17680
  
**[Test build #75964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75964/testReport)**
 for PR 17680 at commit 
[`973e9b8`](https://github.com/apache/spark/commit/973e9b86879187e75163f4b30f941cda60e45d5f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17670: [SPARK-20281][SQL] Print the identical Range para...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17670#discussion_r112366125
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2606,4 +2607,15 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   case ae: AnalysisException => assert(ae.plan == null && 
ae.getMessage == ae.getSimpleMessage)
 }
   }
+
+  test("SPARK-20281 Print the identical range parameters of SparkContext 
and SQL in EXPLAIN") {
+def explainStr(df: DataFrame): String = {
+  val explain = ExplainCommand(df.queryExecution.logical, extended = 
false)
+  val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
+  
sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
+}
+val scRange = sqlContext.range(10)
+val sqlRange = sqlContext.sql("SELECT * FROM range(10)")
+assert(explainStr(scRange) === explainStr(sqlRange))
+  }
--- End diff --

I think this test case is not needed. 





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112365872
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1635,21 +1636,49 @@ def toDF(self, *cols):
 return DataFrame(jdf, self.sql_ctx)
 
 @since(1.3)
-def toPandas(self):
-"""Returns the contents of this :class:`DataFrame` as Pandas 
``pandas.DataFrame``.
+def toPandas(self, useArrow=False):
+"""
+Returns the contents of this :class:`DataFrame` as Pandas 
``pandas.DataFrame``.
 
 This is only available if Pandas is installed and available.
 
+:param useArrow: Make use of Apache Arrow for conversion, pyarrow 
must be installed
+and available on the calling Python process (Experimental).
+
 .. note:: This method should only be used if the resulting 
Pandas's DataFrame is expected
 to be small, as all the data is loaded into the driver's 
memory.
 
+.. note:: Using pyarrow is experimental and currently supports the 
following data types:
--- End diff --

remove this after you moved the control to a SQLConf.






[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112365773
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1635,21 +1636,49 @@ def toDF(self, *cols):
 return DataFrame(jdf, self.sql_ctx)
 
 @since(1.3)
-def toPandas(self):
-"""Returns the contents of this :class:`DataFrame` as Pandas 
``pandas.DataFrame``.
+def toPandas(self, useArrow=False):
--- End diff --

rather than having a flag here, I'd add a SQLConf to control this.
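For illustration, a minimal sketch of the kind of `SQLConf` entry this could be; the key name, default value, and builder shape are assumptions rather than the merged configuration:

```scala
import org.apache.spark.sql.internal.SQLConf

// Hypothetical config entry gating the Arrow-based conversion path.
val ARROW_EXECUTION_ENABLED =
  SQLConf.buildConf("spark.sql.execution.arrow.enabled")  // key name assumed for illustration
    .doc("When true, use Apache Arrow when transferring rows to the Python " +
      "process, e.g. in DataFrame.toPandas().")
    .booleanConf
    .createWithDefault(false)
```

The Python side would then consult the session configuration instead of exposing a `useArrow` argument on `toPandas()`.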






[GitHub] spark issue #17682: [SPARK-20385][WEB-UI]'Submitted Time' field, the date fo...

2017-04-19 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/17682
  
Can I use the Jenkins test?





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-04-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r112365501
  
--- Diff: python/pyspark/serializers.py ---
@@ -182,6 +182,23 @@ def loads(self, obj):
 raise NotImplementedError
 
 
+class ArrowSerializer(FramedSerializer):
+"""
+Serializes an Arrow stream.
+"""
+
+def dumps(self, obj):
+raise NotImplementedError
+
+def loads(self, obj):
+from pyarrow import FileReader, BufferReader
--- End diff --

is there a way for us to package that in spark directly?






[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17649
  
Any update? 





[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...

2017-04-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17636
  
LGTM





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler Are you going to update this for arrow 0.3?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15821
  
Please move ArrowConverters.scala somewhere else that's not top level, e.g. 
execution.arrow





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75963/
Test PASSed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17680
  
**[Test build #75963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75963/testReport)**
 for PR 17680 at commit 
[`2c29a6e`](https://github.com/apache/spark/commit/2c29a6e46f13e39f7b98f70ec449713fc788fbd3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...

2017-04-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17623
  
@ericl Ok. Thanks for pointing that. I will try to add the test.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17674





[GitHub] spark issue #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17674
  
merged to master. thanks! one step closer to parity





[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17087
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75961/
Test PASSed.





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-19 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r112362704
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -141,4 +160,326 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
   Row(Seq(Row(1))),
   "ARRAY>")
   }
+
+  val testTimezones = Seq(
+"UTC" -> "UTC",
+"LA" -> "America/Los_Angeles",
+"Berlin" -> "Europe/Berlin"
+  )
+  // Check creating parquet tables with timestamps, writing data into 
them, and reading it back out
+  // under a variety of conditions:
+  // * tables with explicit tz and those without
+  // * altering table properties directly
+  // * variety of timezones, local & non-local
+  val sessionTimezones = testTimezones.map(_._2).map(Some(_)) ++ Seq(None)
+  sessionTimezones.foreach { sessionTzOpt =>
+val sparkSession = spark.newSession()
+sessionTzOpt.foreach { tz => 
sparkSession.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, tz) }
+testCreateWriteRead(sparkSession, "no_tz", None, sessionTzOpt)
+val localTz = TimeZone.getDefault.getID()
+testCreateWriteRead(sparkSession, "local", Some(localTz), sessionTzOpt)
+// check with a variety of timezones.  The unit tests currently are 
configured to always use
+// America/Los_Angeles, but even if they didn't, we'd be sure to cover 
a non-local timezone.
+Seq(
+  "UTC" -> "UTC",
+  "LA" -> "America/Los_Angeles",
+  "Berlin" -> "Europe/Berlin"
+).foreach { case (tableName, zone) =>
+  if (zone != localTz) {
+testCreateWriteRead(sparkSession, tableName, Some(zone), 
sessionTzOpt)
+  }
+}
+  }
+
+  private def testCreateWriteRead(
+  sparkSession: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+testCreateAlterTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testWriteTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+testReadTablesWithTimezone(sparkSession, baseTable, explicitTz, 
sessionTzOpt)
+  }
+
+  private def checkHasTz(table: String, tz: Option[String]): Unit = {
+val tableMetadata = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
+
assert(tableMetadata.properties.get(ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY)
 === tz)
+  }
+
+  private def testCreateAlterTablesWithTimezone(
+  spark: SparkSession,
+  baseTable: String,
+  explicitTz: Option[String],
+  sessionTzOpt: Option[String]): Unit = {
+test(s"SPARK-12297: Create and Alter Parquet tables and timezones; 
explicitTz = $explicitTz; " +
+  s"sessionTzOpt = $sessionTzOpt") {
+  val key = ParquetFileFormat.PARQUET_TIMEZONE_TABLE_PROPERTY
+  withTable(baseTable, s"like_$baseTable", s"select_$baseTable") {
+val localTz = TimeZone.getDefault()
+val localTzId = localTz.getID()
+// If we ever add a property to set the table timezone by default, 
defaultTz would change
+val defaultTz = None
+// check that created tables have correct TBLPROPERTIES
+val tblProperties = explicitTz.map {
+  tz => raw"""TBLPROPERTIES ($key="$tz")"""
+}.getOrElse("")
+spark.sql(
+  raw"""CREATE TABLE $baseTable (
+|  x int
+| )
+| STORED AS PARQUET
+| $tblProperties
+""".stripMargin)
+val expectedTableTz = explicitTz.orElse(defaultTz)
+checkHasTz(baseTable, expectedTableTz)
+spark.sql(s"CREATE TABLE like_$baseTable LIKE $baseTable")
+checkHasTz(s"like_$baseTable", expectedTableTz)
+spark.sql(
+  raw"""CREATE TABLE select_$baseTable
+| STORED AS PARQUET
+| AS
+| SELECT * from $baseTable
+""".stripMargin)
+checkHasTz(s"select_$baseTable", defaultTz)
+
+// check alter table, setting, unsetting, resetting the property
+spark.sql(
+  raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="America/Los_Angeles")""")
+checkHasTz(baseTable, Some("America/Los_Angeles"))
+spark.sql( raw"""ALTER TABLE $baseTable SET TBLPROPERTIES 
($key="UTC")""")
+checkHasTz(baseTable, Some("UTC"))
+spark.sql( raw"""ALTER TABLE $baseTable UNSET TBLPROPERTIES 
($key)""")
+checkHasTz(baseTable, None)
+explicitTz.foreach { tz =>
+  spark.sql( raw"""ALTER TABLE $baseTable SET TBLP

[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17087
  
**[Test build #75961 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75961/testReport)**
 for PR 17087 at commit 
[`c2e6b8c`](https://github.com/apache/spark/commit/c2e6b8cabc820d24e882d3441d1738ab0368cecb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16781
  
@ueshin I've pushed an update which addresses your comments.  I also 
realized that partitioned tables weren't handled correctly!  I fixed that as 
well.





[GitHub] spark pull request #17672: [SPARK-20371][R] Add wrappers for collect_list an...

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17672#discussion_r112362340
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,43 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", 
"posexplode", x@jc)
 column(jc)
   })
+
+#' collect_list
+#'
+#' Creates a list of objects with duplicates.
+#'
+#' @param x Column to compute on
+#'
+#' @rdname collect_list
+#' @name collect_list
+#' @family agg_func
--- End diff --

it has an 's' though: `agg_funcs`. Not to nitpick, but it needs to match exactly.





[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16781
  
**[Test build #75966 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75966/testReport)**
 for PR 16781 at commit 
[`44a8bbb`](https://github.com/apache/spark/commit/44a8bbb17484a61dd984fcb451e3b1be8c539e9f).





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17680
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75962/
Test FAILed.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17680
  
**[Test build #75962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75962/testReport)**
 for PR 17680 at commit 
[`365da42`](https://github.com/apache/spark/commit/365da4218b77004634253415be2b009c86552340).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TestFileFormatWithNestedSchema extends TestFileFormat `





[GitHub] spark issue #17631: [SPARK-20319][SQL] Already quoted identifiers are gettin...

2017-04-19 Thread umesh9794
Github user umesh9794 commented on the issue:

https://github.com/apache/spark/pull/17631
  
@marmbrus @squito @vanzin, may I ask you to review this PR?








[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...

2017-04-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17678
  
Jenkins, test this please.






[GitHub] spark pull request #17582: [SPARK-20239][Core] Improve HistoryServer's ACL m...

2017-04-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/17582#discussion_r112358891
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/ApiRootResource.scala ---
@@ -184,14 +184,27 @@ private[v1] class ApiRootResource extends 
ApiRequestContext {
   @Path("applications/{appId}/logs")
   def getEventLogs(
   @PathParam("appId") appId: String): EventLogDownloadResource = {
-new EventLogDownloadResource(uiRoot, appId, None)
+try {
+  // withSparkUI will throw NotFoundException if attemptId is existed 
for this application.
+  // So we need to try again with attempt id "1".
+  withSparkUI(appId, None) { _ =>
--- End diff --

Yes, that's true. Alternatively, we could fetch the application ACLs when listing
applications, as was done before.

I assume downloads are not requested very frequently, so the workaround should
be acceptable.





[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17636
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...

2017-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75960/
Test PASSed.





[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17636
  
**[Test build #75960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75960/testReport)**
 for PR 17636 at commit 
[`2411f3e`](https://github.com/apache/spark/commit/2411f3ee21d1e5b53793432708e84c094e04e976).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17689: [SPARK-20378][CORE][SQL][SS] StreamSinkProvider s...

2017-04-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17689#discussion_r112358516
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -151,6 +151,7 @@ trait StreamSourceProvider {
 @InterfaceStability.Unstable
 trait StreamSinkProvider {
   def createSink(
+  schema: StructType,
--- End diff --

The interface can't just be changed like this. This breaks backwards 
compatibility ...
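For illustration, one common way to keep source compatibility is to leave the existing method untouched and add an overload that carries the schema, delegating to the old one by default. A minimal sketch; the existing parameter list is assumed from the trait rather than copied from this diff:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.StructType

// Hypothetical compatibility-preserving shape: existing implementations keep
// compiling because the schema-aware overload has a default implementation.
trait StreamSinkProviderSketch {
  def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink

  def createSink(
      schema: StructType,
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink =
    createSink(sqlContext, parameters, partitionColumns, outputMode)
}
```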




