date:20160619

[GitHub] spark issue #10292: SPARK-11882: Custom scheduler support

2016-06-19 Thread cerisier

Github user cerisier commented on the issue:

https://github.com/apache/spark/pull/10292
  
This is pure awesome. Any chance of this being revisited someday?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67638513
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -313,13 +313,32 @@ trait CheckAnalysis extends PredicateHelper {
  |${s.catalogTable.identifier}
""".stripMargin)
 
+  // TODO: We need to consolidate this kind of checks for InsertIntoTable
+  // with the rule of PreWriteCheck defined in extendedCheckRules.
   case InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
 failAnalysis(
   s"""
  |Hive support is required to insert into the following tables:
  |${s.catalogTable.identifier}
""".stripMargin)
 
+  case InsertIntoTable(t, _, _, _, _)
--- End diff --

Why do we move these checks from `PreWriteCheck` to here?



[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67638318
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -43,8 +43,127 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to 
work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends 
Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+if (conf.caseSensitiveAnalysis) {
+  caseSensitiveResolution
+} else {
+  caseInsensitiveResolution
+}
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+sourceAttributes: Seq[Attribute],
+providedPartitions: Map[String, Option[String]],
+targetAttributes: Seq[Attribute],
+targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+assert(providedPartitions.exists(_._2.isDefined))
+
+val staticPartitions = providedPartitions.flatMap {
+  case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+  case (_, None) => Nil
+}
+
+// The sum of the number of static partition columns and columns 
provided in the SELECT
+// clause needs to match the number of columns of the target table.
+if (staticPartitions.size + sourceAttributes.size != 
targetAttributes.size) {
--- End diff --

in `PreprocessTableInsertion`



[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67638211
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -43,8 +43,127 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to 
work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends 
Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+if (conf.caseSensitiveAnalysis) {
+  caseSensitiveResolution
+} else {
+  caseInsensitiveResolution
+}
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+sourceAttributes: Seq[Attribute],
+providedPartitions: Map[String, Option[String]],
+targetAttributes: Seq[Attribute],
+targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+assert(providedPartitions.exists(_._2.isDefined))
+
+val staticPartitions = providedPartitions.flatMap {
+  case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+  case (_, None) => Nil
+}
+
+// The sum of the number of static partition columns and columns 
provided in the SELECT
+// clause needs to match the number of columns of the target table.
+if (staticPartitions.size + sourceAttributes.size != 
targetAttributes.size) {
--- End diff --

Looks like we already have this check somewhere?



[GitHub] spark issue #13769: [SPARK-16030] [SQL] Allow specifying static partitions w...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13769
  
**[Test build #60833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60833/consoleFull)** for PR 13769 at commit [`ba9c04c`](https://github.com/apache/spark/commit/ba9c04cfe46680e5145859b086357f3ed1a76ff1).



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60832/consoleFull)** for PR 13770 at commit [`cd794cd`](https://github.com/apache/spark/commit/cd794cdfc7867e792f3db09504773d450ca6f8a9).



[GitHub] spark issue #13761: [SPARK-12197] [SparkCore] Kryo & Avro - Support Schema R...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13761
  
Don't `Dataset`s and `Encoder`s make this less relevant? What would be the 
use case here?



[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13776
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60829/



[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13776
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13776
  
**[Test build #60829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60829/consoleFull)** for PR 13776 at commit [`6e136ff`](https://github.com/apache/spark/commit/6e136ff97ee47838cd15137f85747d61d2e148b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13770
  
@rxin So far, I think we do not provide such a capability for table-level 
configuration. For `parquet`, the `DataFrameReader`'s option `mergeSchema` has 
a higher priority than the global configuration 
`spark.sql.hive.convertMetastoreParquet.mergeSchema`. 

However, I agree. We definitely should do it in the near future. Thus, let 
me remove this check now. Thanks!
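
The precedence described above (a per-read option overriding the global configuration) can be sketched in plain Scala. This is only an illustration under stated assumptions: `resolveFlag` and its parameters are hypothetical and are not Spark's API, and plain `Map`s stand in for the reader options and the session configuration.

```scala
// Hypothetical helper, not part of Spark: resolve a boolean setting by
// preferring a per-read option over the session-wide configuration.
def resolveFlag(
    readerOptions: Map[String, String], // stand-in for DataFrameReader options
    sessionConf: Map[String, String],   // stand-in for the global configuration
    optionKey: String,
    confKey: String,
    default: Boolean): Boolean =
  readerOptions.get(optionKey)       // 1. the per-read option wins if present
    .orElse(sessionConf.get(confKey)) // 2. otherwise fall back to the global setting
    .map(_.toBoolean)
    .getOrElse(default)               // 3. otherwise use the built-in default
```

With `sessionConf` containing `spark.sql.hive.convertMetastoreParquet.mergeSchema -> "false"`, passing `Map("mergeSchema" -> "true")` as the reader options resolves to `true`, while passing an empty map resolves to `false`.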



[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67637488
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to 
work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends 
Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+if (conf.caseSensitiveAnalysis) {
+  caseSensitiveResolution
+} else {
+  caseInsensitiveResolution
+}
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+sourceAttributes: Seq[Attribute],
+providedPartitions: Map[String, Option[String]],
+targetAttributes: Seq[Attribute],
+targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+assert(providedPartitions.exists(_._2.isDefined))
+
+val staticPartitions = providedPartitions.flatMap {
+  case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+  case (_, None) => Nil
+}
+
+// The sum of the number of static partition columns and columns 
provided in the SELECT
+// clause needs to match the number of columns of the target table.
+if (staticPartitions.size + sourceAttributes.size != 
targetAttributes.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"columns as the target table: target table has 
${targetAttributes.size} " +
+  s"column(s) but the inserted data has ${sourceAttributes.size + 
staticPartitions.size} " +
+  s"column(s), which contain ${staticPartitions.size} partition 
column(s) having " +
+  s"assigned constant values.")
+}
+
+if (providedPartitions.size != targetPartitionSchema.fields.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"partition columns as the target table: target table " +
+  s"has ${targetPartitionSchema.fields.size} partition column(s) 
but the inserted " +
+  s"data has ${providedPartitions.size} partition columns 
specified.")
+}
+
+staticPartitions.foreach {
+  case (partKey, partValue) =>
+if (!targetPartitionSchema.fields.exists(field => 
resolver(field.name, partKey))) {
+  throw new AnalysisException(
+s"$partKey is not a partition column. Partition columns are " +
+  s"${targetPartitionSchema.fields.map(_.name).mkString("[", 
",", "]")}")
+}
+}
+
+val partitionList = targetPartitionSchema.fields.map { field =>
+  val potentialSpecs = staticPartitions.filter {
+case (partKey, partValue) => resolver(field.name, partKey)
+  }
+  if (potentialSpecs.size == 0) {
+None
+  } else if (potentialSpecs.size == 1) {
+val partValue = potentialSpecs.head._2
+Some(Alias(Cast(Literal(partValue), field.dataType), 
"_staticPart")())
+  } else {
+throw new AnalysisException(
+  s"Partition column ${field.name} have multiple values specified, 
" +
+s"${potentialSpecs.mkString("[", ", ", "]")}. Please only 
specify a single value.")
+  }
+}
+
+partitionList.sliding(2).foreach { v =>
--- End diff --

Thanks!



[GitHub] spark pull request #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pat...

2016-06-19 Thread dafrista

Github user dafrista commented on a diff in the pull request:

https://github.com/apache/spark/pull/13676#discussion_r67637381
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -479,8 +340,299 @@ private[hive] trait HiveInspectors {
   }
 
   /**
-   * Builds specific unwrappers ahead of time according to object inspector
+   * Builds unwrappers ahead of time according to object inspector
* types to avoid pattern matching and branching costs per row.
+   *
+   * Strictly follows the following order in unwrapping (constant OI has 
the higher priority):
+   * Constant Null object inspector =>
+   *   return null
+   * Constant object inspector =>
+   *   extract the value from constant object inspector
+   * If object inspector prefers writable =>
+   *   extract writable from `data` and then get the catalyst type from 
the writable
+   * Extract the java object directly from the object inspector
+   *
+   * NOTICE: the complex data type requires recursive unwrapping.
+   *
+   * @param objectInspector the ObjectInspector used to create an 
unwrapper.
+   * @return A function that unwraps data objects.
+   * Use the overloaded HiveStructField version for in-place 
updating of a MutableRow.
+   */
+  def unwrapperFor(objectInspector: ObjectInspector): Any => Any =
+objectInspector match {
+  case coi: ConstantObjectInspector if coi.getWritableConstantValue == 
null =>
+data: Any => null
+  case poi: WritableConstantStringObjectInspector =>
+data: Any =>
+  UTF8String.fromString(poi.getWritableConstantValue.toString)
+  case poi: WritableConstantHiveVarcharObjectInspector =>
+data: Any =>
+  
UTF8String.fromString(poi.getWritableConstantValue.getHiveVarchar.getValue)
+  case poi: WritableConstantHiveCharObjectInspector =>
+data: Any =>
+  
UTF8String.fromString(poi.getWritableConstantValue.getHiveChar.getValue)
+  case poi: WritableConstantHiveDecimalObjectInspector =>
+data: Any =>
+  HiveShim.toCatalystDecimal(
+PrimitiveObjectInspectorFactory.javaHiveDecimalObjectInspector,
+poi.getWritableConstantValue.getHiveDecimal)
+  case poi: WritableConstantTimestampObjectInspector =>
+data: Any => {
+  val t = poi.getWritableConstantValue
+  t.getSeconds * 1000000L + t.getNanos / 1000L
+}
+  case poi: WritableConstantIntObjectInspector =>
+data: Any =>
+  poi.getWritableConstantValue.get()
--- End diff --

You're right, as the contract for a `ConstantObjectInspector` is that its objects "represent constant values and can return them without an evaluation" [[1](https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ConstantObjectInspector.java)]. I will make this change.
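
The reviewer's comment is not quoted here, so the exact change is an assumption; presumably it is to read the constant once while building the unwrapper rather than on every row. A minimal plain-Scala sketch of that difference follows. The `countReads` function and the `"spark"` literal are hypothetical stand-ins and do not use Hive's object inspectors.

```scala
// Returns (reads made by the per-row unwrapper, reads made by the hoisted one)
// after applying both to `rows` inputs.
def countReads(rows: Int): (Int, Int) = {
  var perRowReads = 0
  var hoistedReads = 0

  // Reads the constant inside the returned function: once per row.
  val perRow: Any => Any = (_: Any) => {
    perRowReads += 1
    "spark".toUpperCase
  }

  // Reads the constant while building the function: once in total.
  val hoisted: Any => Any = {
    hoistedReads += 1
    val cached = "spark".toUpperCase
    (_: Any) => cached
  }

  (1 to rows).foreach { row => perRow(row); hoisted(row) }
  (perRowReads, hoistedReads)
}
```

For three rows this yields `(3, 1)`: the per-row variant reads the constant three times, the hoisted variant once.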



[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67637303
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to 
work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends 
Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+if (conf.caseSensitiveAnalysis) {
+  caseSensitiveResolution
+} else {
+  caseInsensitiveResolution
+}
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+sourceAttributes: Seq[Attribute],
+providedPartitions: Map[String, Option[String]],
+targetAttributes: Seq[Attribute],
+targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+assert(providedPartitions.exists(_._2.isDefined))
+
+val staticPartitions = providedPartitions.flatMap {
+  case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+  case (_, None) => Nil
+}
+
+// The sum of the number of static partition columns and columns 
provided in the SELECT
+// clause needs to match the number of columns of the target table.
+if (staticPartitions.size + sourceAttributes.size != 
targetAttributes.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"columns as the target table: target table has 
${targetAttributes.size} " +
+  s"column(s) but the inserted data has ${sourceAttributes.size + 
staticPartitions.size} " +
+  s"column(s), which contain ${staticPartitions.size} partition 
column(s) having " +
+  s"assigned constant values.")
+}
+
+if (providedPartitions.size != targetPartitionSchema.fields.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"partition columns as the target table: target table " +
+  s"has ${targetPartitionSchema.fields.size} partition column(s) 
but the inserted " +
+  s"data has ${providedPartitions.size} partition columns 
specified.")
+}
+
+staticPartitions.foreach {
+  case (partKey, partValue) =>
+if (!targetPartitionSchema.fields.exists(field => 
resolver(field.name, partKey))) {
+  throw new AnalysisException(
+s"$partKey is not a partition column. Partition columns are " +
+  s"${targetPartitionSchema.fields.map(_.name).mkString("[", 
",", "]")}")
+}
+}
+
+val partitionList = targetPartitionSchema.fields.map { field =>
+  val potentialSpecs = staticPartitions.filter {
+case (partKey, partValue) => resolver(field.name, partKey)
+  }
+  if (potentialSpecs.size == 0) {
+None
+  } else if (potentialSpecs.size == 1) {
+val partValue = potentialSpecs.head._2
+Some(Alias(Cast(Literal(partValue), field.dataType), 
"_staticPart")())
+  } else {
+throw new AnalysisException(
+  s"Partition column ${field.name} have multiple values specified, 
" +
+s"${potentialSpecs.mkString("[", ", ", "]")}. Please only 
specify a single value.")
+  }
+}
+
+partitionList.sliding(2).foreach { v =>
--- End diff --

We can use the following check instead:

```scala
partitionList.dropWhile(_.isDefined).collectFirst {
  case Some(_) =>
throw new AnalysisException("...")
}
```
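
A self-contained sketch of how this check behaves, assuming its intent is that no static partition value may follow a dynamic one. The element type is simplified to `Option[String]`, `checkStaticBeforeDynamic` is a hypothetical name, and `IllegalArgumentException` stands in for Spark's `AnalysisException`.

```scala
// Some(value) marks a static partition, None marks a dynamic one.
def checkStaticBeforeDynamic(partitionList: Seq[Option[String]]): Unit = {
  // Skip the leading static partitions, then fail on the first static
  // partition found after a dynamic one.
  partitionList.dropWhile(_.isDefined).collectFirst {
    case Some(value) =>
      // Stand-in for the AnalysisException used in the suggestion above.
      throw new IllegalArgumentException(
        s"Static partition value '$value' appears after a dynamic partition")
  }
  ()
}
```

`Seq(Some("a"), None, None)` passes, while `Seq(Some("a"), None, Some("c"))` throws. Empty and single-element lists need no special handling.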



[GitHub] spark pull request #13777: [SPARK-16061][SQL][Minor] The property "spark.sql...

2016-06-19 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13777#discussion_r67637263
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
 ---
@@ -115,7 +115,7 @@ case class KeyRemoved(key: UnsafeRow) extends StoreUpdate
  */
 private[sql] object StateStore extends Logging {
 
-  val MAINTENANCE_INTERVAL_CONFIG = "spark.streaming.stateStore.maintenanceInterval"
+  val MAINTENANCE_INTERVAL_CONFIG = "spark.sql.streaming.stateStore.maintenanceInterval"
--- End diff --

+1



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13770
  
Hm, what if I want to specify specific options when reading data from a 
table?

e.g. whether to use the vectorized reader or not?




[GitHub] spark issue #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pattern ma...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13676
  
This looks pretty good. What I am thinking is that generating an encoder 
could create another nice performance speedup here.



[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13769#discussion_r67637019
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to 
work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends 
Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+if (conf.caseSensitiveAnalysis) {
+  caseSensitiveResolution
+} else {
+  caseInsensitiveResolution
+}
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+sourceAttributes: Seq[Attribute],
+providedPartitions: Map[String, Option[String]],
+targetAttributes: Seq[Attribute],
+targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+assert(providedPartitions.exists(_._2.isDefined))
+
+val staticPartitions = providedPartitions.flatMap {
+  case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+  case (_, None) => Nil
+}
+
+// The sum of the number of static partition columns and columns 
provided in the SELECT
+// clause needs to match the number of columns of the target table.
+if (staticPartitions.size + sourceAttributes.size != 
targetAttributes.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"columns as the target table: target table has 
${targetAttributes.size} " +
+  s"column(s) but the inserted data has ${sourceAttributes.size + 
staticPartitions.size} " +
+  s"column(s), which contain ${staticPartitions.size} partition 
column(s) having " +
+  s"assigned constant values.")
+}
+
+if (providedPartitions.size != targetPartitionSchema.fields.size) {
+  throw new AnalysisException(
+s"The data to be inserted needs to have the same number of " +
+  s"partition columns as the target table: target table " +
+  s"has ${targetPartitionSchema.fields.size} partition column(s) 
but the inserted " +
+  s"data has ${providedPartitions.size} partition columns 
specified.")
+}
+
+staticPartitions.foreach {
+  case (partKey, partValue) =>
+if (!targetPartitionSchema.fields.exists(field => 
resolver(field.name, partKey))) {
+  throw new AnalysisException(
+s"$partKey is not a partition column. Partition columns are " +
+  s"${targetPartitionSchema.fields.map(_.name).mkString("[", 
",", "]")}")
+}
+}
+
+val partitionList = targetPartitionSchema.fields.map { field =>
+  val potentialSpecs = staticPartitions.filter {
+case (partKey, partValue) => resolver(field.name, partKey)
+  }
+  if (potentialSpecs.size == 0) {
+None
+  } else if (potentialSpecs.size == 1) {
+val partValue = potentialSpecs.head._2
+Some(Alias(Cast(Literal(partValue), field.dataType), 
"_staticPart")())
+  } else {
+throw new AnalysisException(
+  s"Partition column ${field.name} have multiple values specified, 
" +
+s"${potentialSpecs.mkString("[", ", ", "]")}. Please only 
specify a single value.")
+  }
+}
+
+partitionList.sliding(2).foreach { v =>
--- End diff --

`sliding(2)` can be dangerous for single-element collections:

```
scala> Seq(1).sliding(2).foreach(println)
List(1)
```
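
A small sketch of the pitfall and one way around it, using plain Scala collections rather than the PR's `partitionList`. The helper names `windows` and `pairs` are hypothetical.

```scala
// All windows produced by sliding(2); a single-element input still yields
// one window, and that window has only one element.
def windows[A](xs: Seq[A]): List[Seq[A]] = xs.sliding(2).toList

// Keeps only the windows that really are pairs, so shorter windows are
// skipped instead of reaching code that expects two elements.
def pairs[A](xs: Seq[A]): List[(A, A)] =
  xs.sliding(2).collect { case Seq(a, b) => (a, b) }.toList
```

Here `windows(Seq(1))` is `List(List(1))`, matching the REPL output above, while `pairs(Seq(1))` is empty and `pairs(Seq(1, 2, 3))` is `List((1, 2), (2, 3))`.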



[GitHub] spark pull request #13777: [SPARK-16061][SQL][Minor] The property "spark.sql...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13777#discussion_r67636827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
 ---
@@ -115,7 +115,7 @@ case class KeyRemoved(key: UnsafeRow) extends StoreUpdate
  */
 private[sql] object StateStore extends Logging {
 
-  val MAINTENANCE_INTERVAL_CONFIG = "spark.streaming.stateStore.maintenanceInterval"
+  val MAINTENANCE_INTERVAL_CONFIG = "spark.sql.streaming.stateStore.maintenanceInterval"
--- End diff --

@tdas shouldn't this property be registered in `SQLConf`?



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67636641
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
+val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+  .getOrElse(Seq.empty[StructField]).map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
arguments.map(_.name).zipWithIndex.toMap
--- End diff --

Ah, I see. You could also move this code into the companion object of the 
`CreateMacroCommand`. That would also work. It is just that this code isn't 
parser specific.



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67636452
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+columns: Seq[AttributeReference],
+macroFunction: Expression)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
+throw new AnalysisException(s"Actual number of columns: 
${children.size} != " +
+  s"expected number of columns: ${columns.size} for Macro 
$macroName")
+  }
+  macroFunction.transformUp {
+case b: BoundReference => children(b.ordinal)
--- End diff --

Ok that is perfect.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13775
  
@hvanhovell @rxin Got it. Thanks! I will re-run the benchmark.



[GitHub] spark pull request #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13676#discussion_r67636421
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -479,8 +340,299 @@ private[hive] trait HiveInspectors {
   }
 
   /**
-   * Builds specific unwrappers ahead of time according to object inspector
+   * Builds unwrappers ahead of time according to object inspector
* types to avoid pattern matching and branching costs per row.
+   *
+   * Strictly follows the following order in unwrapping (constant OI has 
the higher priority):
+   * Constant Null object inspector =>
+   *   return null
+   * Constant object inspector =>
+   *   extract the value from constant object inspector
+   * If object inspector prefers writable =>
+   *   extract writable from `data` and then get the catalyst type from 
the writable
+   * Extract the java object directly from the object inspector
+   *
+   * NOTICE: the complex data type requires recursive unwrapping.
+   *
+   * @param objectInspector the ObjectInspector used to create an 
unwrapper.
+   * @return A function that unwraps data objects.
+   * Use the overloaded HiveStructField version for in-place 
updating of a MutableRow.
+   */
+  def unwrapperFor(objectInspector: ObjectInspector): Any => Any =
+objectInspector match {
+  case coi: ConstantObjectInspector if coi.getWritableConstantValue == 
null =>
+data: Any => null
+  case poi: WritableConstantStringObjectInspector =>
+data: Any =>
+  UTF8String.fromString(poi.getWritableConstantValue.toString)
+  case poi: WritableConstantHiveVarcharObjectInspector =>
+data: Any =>
+  
UTF8String.fromString(poi.getWritableConstantValue.getHiveVarchar.getValue)
+  case poi: WritableConstantHiveCharObjectInspector =>
+data: Any =>
+  
UTF8String.fromString(poi.getWritableConstantValue.getHiveChar.getValue)
+  case poi: WritableConstantHiveDecimalObjectInspector =>
+data: Any =>
+  HiveShim.toCatalystDecimal(
+PrimitiveObjectInspectorFactory.javaHiveDecimalObjectInspector,
+poi.getWritableConstantValue.getHiveDecimal)
+  case poi: WritableConstantTimestampObjectInspector =>
+data: Any => {
+  val t = poi.getWritableConstantValue
+  t.getSeconds * 1000000L + t.getNanos / 1000L
+}
+  case poi: WritableConstantIntObjectInspector =>
+data: Any =>
+  poi.getWritableConstantValue.get()
--- End diff --

Isn't it faster to call `poi.getWritableConstantValue.get()` outside of the 
function? And use the result in the function? Or am I missing something here? 
The same goes for all other constants. 
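
A minimal sketch of the suggestion, assuming a stand-in inspector: `ConstantIntInspector` is hypothetical and only mimics the shape of a Hive constant object inspector, with a counter added so the difference is observable. The first variant fetches the constant on every call, as in the diff; the second fetches it once and closes over the result.

```scala
// Hypothetical stand-in for a Hive constant object inspector; not the real Hive API.
final class ConstantIntInspector(value: Int) {
  var reads = 0 // counts how often the constant is fetched
  def getWritableConstantValue: Int = { reads += 1; value }
}

object UnwrapperSketch {
  // Variant from the diff: the constant is fetched inside the function,
  // so the lookup runs once per row.
  def perRow(poi: ConstantIntInspector): Any => Any =
    (_: Any) => poi.getWritableConstantValue

  // Variant suggested in the review: fetch once when the unwrapper is built,
  // then return the captured value for every row.
  def hoisted(poi: ConstantIntInspector): Any => Any = {
    val constant = poi.getWritableConstantValue
    (_: Any) => constant
  }
}
```

Both variants return the same value for every input; the hoisted one only moves the lookup from per-row execution to unwrapper construction. Whether that is measurably faster in the real code depends on what the inspector does per call.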



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13775
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13775
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60828/
Test PASSed.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13775
  
**[Test build #60828 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60828/consoleFull)**
 for PR 13775 at commit 
[`20b832e`](https://github.com/apache/spark/commit/20b832ee4e5ed4e794cc1bc8f2f67cce973759e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13769: [SPARK-16030] [SQL] Allow specifying static partitions w...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13769
  
**[Test build #60831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60831/consoleFull)**
 for PR 13769 at commit 
[`6bd7b6f`](https://github.com/apache/spark/commit/6bd7b6fb992af794216fa7752d25747d64d82280).



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread lianhuiwang

Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67635961
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = 
withOrigin(ctx) {
+val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+  .getOrElse(Seq.empty[StructField]).map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
arguments.map(_.name).zipWithIndex.toMap
--- End diff --

@hvanhovell So I think I will create a new wrapper class to avoid the 
unresolved exception, so that DataFrame can reuse this feature later.



[GitHub] spark pull request #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Foll...

2016-06-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13766



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread lianhuiwang

Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67635827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = 
withOrigin(ctx) {
+val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+  .getOrElse(Seq.empty[StructField]).map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
arguments.map(_.name).zipWithIndex.toMap
--- End diff --

@hvanhovell So I think I will create a new wrapper class to avoid the 
unresolved exception, so that DataFrame can reuse this feature later.



[GitHub] spark issue #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up c...

2016-06-19 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13766
  
Thanks. I am merging this to master and branch 2.0.



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread lianhuiwang

Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67635570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+columns: Seq[AttributeReference],
+macroFunction: Expression)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
+throw new AnalysisException(s"Actual number of columns: 
${children.size} != " +
+  s"expected number of columns: ${columns.size} for Macro 
$macroName")
+  }
+  macroFunction.transformUp {
+case b: BoundReference => children(b.ordinal)
--- End diff --

@hvanhovell Good points. Because the Analyzer checks each expression's 
`checkInputDataTypes` after `ResolveFunctions`, I think we do not need to 
validate the input types here. I do not see a benefit in adding casts, and 
they might be unnecessary. I will add some comments about this. Thanks.



[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13772
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60826/
Test PASSed.



[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13772
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13772
  
**[Test build #60826 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60826/consoleFull)**
 for PR 13772 at commit 
[`a472cc2`](https://github.com/apache/spark/commit/a472cc2a2f87ad1404b49ae2c2c75a769db6fc18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13777: [SPARK-16061][SQL][Minor] The property "spark.sql.stateS...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13777
  
**[Test build #60830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60830/consoleFull)**
 for PR 13777 at commit 
[`72e8c80`](https://github.com/apache/spark/commit/72e8c809a90e718b10205820fd590b8a0041bdc8).



[GitHub] spark pull request #13748: [SPARK-16031] Add debug-only socket source in Str...

2016-06-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13748



[GitHub] spark pull request #13777: [SPARK-16061][SQL] The property "spark.sql.stateS...

2016-06-19 Thread sarutak

GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/13777

[SPARK-16061][SQL] The property "spark.sql.stateStore.maintenanceInterval" 
should be renamed to "spark.streaming.stateStore.maintenanceInterval"

## What changes were proposed in this pull request?
The property spark.streaming.stateStore.maintenanceInterval should be 
renamed and harmonized with other properties related to Structured Streaming 
like spark.sql.streaming.stateStore.minDeltasForSnapshot.

## How was this patch tested?
Existing unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-16061

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13777


commit 72e8c809a90e718b10205820fd590b8a0041bdc8
Author: Kousuke Saruta 
Date:   2016-06-20T04:17:16Z

Renamed spark.streaming.stateStore.maintenanceInterval to 
spark.sql.stateStore.maintenanceInterval





[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13775
  
@viirya when you construct a performance benchmark, you want to minimize 
the overhead of things outside the code path you are testing. In this case, 
a lot of the time was spent in the collect operation.




[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...

2016-06-19 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13748
  
LGTM - merging in master/2.0.




[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13775
  
Would PR https://github.com/apache/spark/pull/13676 help to improve 
performance?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13775
  
@viirya could you re-run the benchmarks without calling `collect()`? Do a 
count or a simple aggregate instead; `collect()` spends a ton of time 
serializing results from `InternalRow` to `Row`.
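
A rough sketch of such a benchmark, assuming a local `SparkSession`: `spark.range` stands in for the ORC scan under test, and the `timeMs` helper and the object name are illustrative, not part of the PR. The count and the aggregate both force a full scan but return a single value, so result conversion stays out of the measurement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object ScanBenchmarkSketch {
  // Runs `body` once and returns its result with the elapsed milliseconds.
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("scan-benchmark-sketch")
      .getOrCreate()

    // Stand-in data; a real run would read the ORC table being benchmarked.
    val df = spark.range(0L, 1000000L).toDF("id")

    // Full scan, one number back to the driver.
    val (rows, countMs) = timeMs(df.count())
    // Simple aggregate: also a full scan, also a single-row result.
    val (total, aggMs) = timeMs(df.agg(sum("id")).first().getLong(0))

    println(s"count=$rows in $countMs ms, sum=$total in $aggMs ms")
    spark.stop()
  }
}
```

A single timed run like this is only indicative; a serious comparison would add warm-up iterations and repeat each measurement.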


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67634631
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+columns: Seq[AttributeReference],
+macroFunction: Expression)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
+throw new AnalysisException(s"Actual number of columns: 
${children.size} != " +
+  s"expected number of columns: ${columns.size} for Macro 
$macroName")
+  }
+  macroFunction.transformUp {
+case b: BoundReference => children(b.ordinal)
--- End diff --

We do not validate the input type here. This would be entirely fine if 
macro arguments were defined without a `DataType`. I am not sure what we need 
to do here though. We have two options:
- Ignore the DataType and rely on the expression's `inputTypes` to get 
casting done. This must be documented though. 
- Introduce casts to make sure the input conforms to the required input.

What do you think?
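
The two options can be sketched on a toy expression tree. Everything below is hypothetical and self-contained (`Expr`, `BoundRef`, `Cast`, and `MacroSketch` are illustrative names, not Catalyst classes); it only shows where a cast would be introduced when an argument's type differs from the declared macro parameter type.

```scala
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType

// Toy expression tree; a stand-in for Catalyst expressions.
sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class BoundRef(ordinal: Int, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr
case class Add(left: Expr, right: Expr) extends Expr {
  def dataType: DataType = left.dataType
}

object MacroSketch {
  // Replaces each bound reference with the matching argument.
  // insertCasts = false: option 1, the declared type is ignored.
  // insertCasts = true:  option 2, a mismatching argument is wrapped in a Cast.
  def substitute(body: Expr, args: Seq[Expr], insertCasts: Boolean): Expr =
    body match {
      case BoundRef(i, declared) =>
        val arg = args(i)
        if (insertCasts && arg.dataType != declared) Cast(arg, declared) else arg
      case Add(l, r) =>
        Add(substitute(l, args, insertCasts), substitute(r, args, insertCasts))
      case Cast(c, t) => Cast(substitute(c, args, insertCasts), t)
      case lit: Literal => lit
    }
}
```

In this sketch an argument that already has the declared type is never wrapped, so the second option adds casts only on a mismatch.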



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread lianhuiwang

Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67634511
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = 
withOrigin(ctx) {
+val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+  .getOrElse(Seq.empty[StructField]).map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
arguments.map(_.name).zipWithIndex.toMap
--- End diff --

Why did I not move this into `CreateMacroCommand`? Because 
`analyzer.checkAnalysis()` validates the `macroFunction` of 
`CreateMacroCommand`. If `macroFunction` still contains unresolved attributes, 
`analyzer.checkAnalysis()` throws an unresolved exception. If the attributes 
are resolved beforehand, `analyzer.checkAnalysis()` does not throw an exception.
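
A minimal sketch of that binding step on a toy tree, assuming hypothetical node types (`UnresolvedAttr`, `ArgRef`, `Plus`, and `BindSketch` are illustrative, not Catalyst classes). It builds the same kind of name-to-index map as `colToIndex` in the diff and replaces every named attribute with a positional reference, so that no unresolved node is left for a later check to reject.

```scala
// Toy tree; a stand-in for Catalyst expressions.
sealed trait Node
case class UnresolvedAttr(name: String) extends Node
case class ArgRef(ordinal: Int) extends Node
case class Plus(left: Node, right: Node) extends Node

object BindSketch {
  // Rewrites named attributes into positional references to the macro arguments.
  def bind(body: Node, argNames: Seq[String]): Node = {
    val colToIndex: Map[String, Int] = argNames.zipWithIndex.toMap
    def go(n: Node): Node = n match {
      case UnresolvedAttr(name) =>
        // A name that is not a macro argument cannot be bound.
        ArgRef(colToIndex.getOrElse(
          name, throw new IllegalArgumentException(s"unknown macro argument: $name")))
      case Plus(l, r) => Plus(go(l), go(r))
      case ref: ArgRef => ref
    }
    go(body)
  }
}
```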



[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13706
  
@lianhuiwang thanks for updating the PR. Could you implement the macro
removal by pattern matching on a (to be created) `MacroFunctionBuilder` class?
I feel this is simpler and doesn't touch as much of the API.
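
A rough sketch of what that could look like, using a toy registry rather than Spark's real function registry. Everything here is an assumption for illustration: `FunctionBuilder`, `BuiltinFunctionBuilder`, `Registry` and `dropMacro` are hypothetical names, and only `MacroFunctionBuilder` comes from the suggestion above.

```scala
object MacroRegistrySketch {
  // Hypothetical builder interface: turns argument values into a result.
  trait FunctionBuilder { def build(args: Seq[Int]): Int }

  // Marker class so that macros can be told apart from ordinary functions.
  final class MacroFunctionBuilder(body: Seq[Int] => Int) extends FunctionBuilder {
    def build(args: Seq[Int]): Int = body(args)
  }

  final class BuiltinFunctionBuilder(body: Seq[Int] => Int) extends FunctionBuilder {
    def build(args: Seq[Int]): Int = body(args)
  }

  class Registry {
    private val functions = scala.collection.mutable.Map.empty[String, FunctionBuilder]

    def register(name: String, builder: FunctionBuilder): Unit =
      functions.update(name, builder)

    def contains(name: String): Boolean = functions.contains(name)

    /** Drop `name` only if it is registered as a macro; report whether it was removed. */
    def dropMacro(name: String): Boolean = functions.get(name) match {
      case Some(_: MacroFunctionBuilder) =>
        functions.remove(name)
        true
      case _ => false
    }
  }
}
```

The type test on the builder is the only macro-specific logic, so nothing else in the registry interface has to change.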



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60825/
Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60825 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60825/consoleFull)**
 for PR 13770 at commit 
[`07b6863`](https://github.com/apache/spark/commit/07b68630247597a679dee01483a09cf7176d5724).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13776
  
**[Test build #60829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60829/consoleFull)**
 for PR 13776 at commit 
[`6e136ff`](https://github.com/apache/spark/commit/6e136ff97ee47838cd15137f85747d61d2e148b2).



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67633556
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
+    val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+      .getOrElse(Seq.empty[StructField]).map { col =>
+      AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() }
+    val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap
+    if (colToIndex.size != arguments.size) {
+      throw operationNotAllowed(
+        s"Cannot support duplicate colNames for CREATE TEMPORARY MACRO ", ctx)
+    }
+    val macroFunction = expression(ctx.expression).transformUp {
--- End diff --

Ditto



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2016-06-19 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r67633550
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
+    val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+      .getOrElse(Seq.empty[StructField]).map { col =>
+      AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() }
+    val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap
--- End diff --

Move this into the `CreateMacroCommand` command. This would also be
relevant if we were to offer a different API for creating macros.



[GitHub] spark pull request #13776: [SPARK-16050][Tests]Remove the flaky test: Consol...

2016-06-19 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/13776

[SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSuite

## What changes were proposed in this pull request?

ConsoleSinkSuite just collects content from stdout and compares it with
the expected string. However, because Spark may not stop some background
threads at once, there is a race condition: other threads may be writing
logs while ConsoleSinkSuite is running, which makes ConsoleSinkSuite fail.

Therefore, I just deleted `ConsoleSinkSuite`. If we want to test
ConsoleSink in the future, we should refactor ConsoleSink to make it
testable instead of depending on stdout.
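
One possible shape for such a refactoring, shown as a minimal sketch rather than the actual ConsoleSink code: the sink writes to an injected `java.io.PrintStream`, so a test can hand it a private buffer that no background thread logs to. `StreamSink`, `addBatch` and `capture` are hypothetical names used only for this example.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object ConsoleSinkSketch {
  /** Hypothetical sink that writes to an injected stream instead of the global stdout. */
  class StreamSink(out: PrintStream) {
    def addBatch(batchId: Long, rows: Seq[String]): Unit = {
      out.println(s"Batch: $batchId")
      rows.foreach(row => out.println(row))
    }
  }

  /** Run `body` against a private buffer and return everything written to it. */
  def capture(body: StreamSink => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, "UTF-8")
    body(new StreamSink(stream))
    stream.flush()
    buffer.toString("UTF-8")
  }
}
```

In production the sink would be constructed with `System.out`; in a test, `capture` sees only what the sink itself wrote, so unrelated log output cannot interfere.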

## How was this patch tested?

Just removed a flaky test.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-16050

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13776


commit 6e136ff97ee47838cd15137f85747d61d2e148b2
Author: Shixiong Zhu 
Date:   2016-06-20T03:42:05Z

Remove the flaky test: ConsoleSinkSuite





[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...

2016-06-19 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/13776
  
/cc @marmbrus @brkyvz 



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60824/
Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60824 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60824/consoleFull)**
 for PR 13770 at commit 
[`6a6fbc0`](https://github.com/apache/spark/commit/6a6fbc0fc27d381d5f1220630b207194f08676ac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13775
  
**[Test build #60828 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60828/consoleFull)**
 for PR 13775 at commit 
[`20b832e`](https://github.com/apache/spark/commit/20b832ee4e5ed4e794cc1bc8f2f67cce973759e0).



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13775
  
**[Test build #60827 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60827/consoleFull)**
 for PR 13775 at commit 
[`7e7bb6c`](https://github.com/apache/spark/commit/7e7bb6c57860187f391f66ca82cdd715d0b2be43).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13775
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60827/
Test FAILed.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13775
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13775
  
**[Test build #60827 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60827/consoleFull)**
 for PR 13775 at commit 
[`7e7bb6c`](https://github.com/apache/spark/commit/7e7bb6c57860187f391f66ca82cdd715d0b2be43).



[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-06-19 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/13775

[SPARK-16060][SQL] Vectorized Orc reader

## What changes were proposed in this pull request?

Currently the Orc reader in Spark SQL doesn't support vectorized reading. As
Hive Orc already supports vectorization, we can add this support to improve
Orc reading performance.

### Benchmark

Benchmark code:

    test("Benchmark for Orc") {
      val N = 500 << 12
      withOrcTable((0 until N).map(i => (i, i.toString, i.toLong, i.toDouble)), "t") {
        val benchmark = new Benchmark("Orc reader", N)
        benchmark.addCase("reading Orc file", 10) { iter =>
          sql("SELECT * FROM t").collect()
        }
        benchmark.run()
      }
    }

Before this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
    Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
    Orc reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------
    reading Orc file           4750 / 5266          0.4        2319.1       1.0X

After this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
    Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
    Orc reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------
    reading Orc file           3550 / 3824          0.6        1733.2       1.0X
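
For readers comparing the two runs: the `Relative` column reads 1.0X in both tables, presumably because each run contains a single case and is measured only against itself. The improvement can be worked out from the best times instead. The sketch below is plain arithmetic on the numbers reported above; the object and method names are made up for this example, and small differences from the printed `Per Row(ns)` values are expected because the printed times are rounded to whole milliseconds.

```scala
object OrcBenchmarkArithmetic {
  // Row count used by the benchmark code: 500 << 12 = 2,048,000.
  val rows: Int = 500 << 12

  /** Nanoseconds spent per row for a run that took `millis` milliseconds. */
  def perRowNs(millis: Double): Double = millis * 1e6 / rows

  /** Throughput in millions of rows per second. */
  def rateMPerSec(millis: Double): Double = rows / (millis / 1000.0) / 1e6

  /** How many times faster the `after` run is than the `before` run. */
  def speedup(beforeMillis: Double, afterMillis: Double): Double =
    beforeMillis / afterMillis
}
```

With the best times above, `speedup(4750, 3550)` comes out at roughly 1.34, i.e. the vectorized reader is about a third faster on this benchmark.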



## How was this patch tested?
Existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 vectorized-orc-reader3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13775


commit 2861ac2a5136c065ec38cfc24bf9f979d5b7ae07
Author: Liang-Chi Hsieh 
Date:   2016-06-16T02:31:23Z

Add vectorized Orc reader support.

commit eee8eca70920d624becb43c8510d217ce4d9820b
Author: Liang-Chi Hsieh 
Date:   2016-06-17T09:44:11Z

import.

commit b753d09e3e369fc91a17d9632123dbe40d7d9dfb
Author: Liang-Chi Hsieh 
Date:   2016-06-18T10:00:00Z

If column is repeating, always using row id 0.

commit 7d26f5ed785269299b324df8bfc1c64c2d4a2b48
Author: Liang-Chi Hsieh 
Date:   2016-06-19T04:16:49Z

Fix bugs of getBinary and numFields.

commit 74fe936e522a827384461e445b9ba44f96ce29fe
Author: Liang-Chi Hsieh 
Date:   2016-06-20T02:44:07Z

Remove unnecessary change.

commit 7e7bb6c57860187f391f66ca82cdd715d0b2be43
Author: Liang-Chi Hsieh 
Date:   2016-06-20T02:48:11Z

Remove unnecessary change.





[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13772
  
**[Test build #60826 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60826/consoleFull)**
 for PR 13772 at commit 
[`a472cc2`](https://github.com/apache/spark/commit/a472cc2a2f87ad1404b49ae2c2c75a769db6fc18).



[GitHub] spark issue #13196: [SPARK-15395][Core]Use getHostString to create RpcAddres...

2016-06-19 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/13196
  
@zzcclp see https://issues.apache.org/jira/browse/SPARK-16017



[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-19 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13631
  
ping @cloud-fan 



[GitHub] spark issue #13196: [SPARK-15395][Core]Use getHostString to create RpcAddres...

2016-06-19 Thread zzcclp

Github user zzcclp commented on the issue:

https://github.com/apache/spark/pull/13196
  
@zsxwing, why was this PR reverted in branch-1.6?



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60825 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60825/consoleFull)**
 for PR 13770 at commit 
[`07b6863`](https://github.com/apache/spark/commit/07b68630247597a679dee01483a09cf7176d5724).



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60823/
Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60823 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60823/consoleFull)**
 for PR 13770 at commit 
[`e0c65d8`](https://github.com/apache/spark/commit/e0c65d82914dab9b6c0b855485c29ec010951a27).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60824 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60824/consoleFull)**
 for PR 13770 at commit 
[`6a6fbc0`](https://github.com/apache/spark/commit/6a6fbc0fc27d381d5f1220630b207194f08676ac).



[GitHub] spark issue #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up c...

2016-06-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13766
  
LGTM



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60823/consoleFull)**
 for PR 13770 at commit 
[`e0c65d8`](https://github.com/apache/spark/commit/e0c65d82914dab9b6c0b855485c29ec010951a27).



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60822/
Test FAILed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13770
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60822 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60822/consoleFull)**
 for PR 13770 at commit 
[`a86151c`](https://github.com/apache/spark/commit/a86151c6ce75412cfe880f188af17d0e9489eefb).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13770
  
**[Test build #60822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60822/consoleFull)** for PR 13770 at commit [`a86151c`](https://github.com/apache/spark/commit/a86151c6ce75412cfe880f188af17d0e9489eefb).



[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...

2016-06-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13770
  
Maybe I should move all the JDBC-related API misuse issues into this PR.



[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13748
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60819/
Test PASSed.



[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13748
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13748
  
**[Test build #60819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60819/consoleFull)** for PR 13748 at commit [`e61adfd`](https://github.com/apache/spark/commit/e61adfd81cbac57ed3c9ceae958b46f7f1943393).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TextSocketStreamSuite extends StreamTest with SharedSQLContext with BeforeAndAfterEach`



[GitHub] spark issue #13773: [SPARK-16056] [SPARK-16057] [SPARK-16058] [SQL] Fix Mult...

2016-06-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13773
  
@rxin @liancheng @clockfly @yhuai Could you please also review this PR? I
noticed that all of you recently reviewed JDBC-related PRs. Thanks!



[GitHub] spark issue #13773: [SPARK-16056] [SPARK-16057] [SPARK-16058] [SQL] Fix Mult...

2016-06-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13773
  
@srowen I will submit more PRs about `JDBC`. The interfaces of
`DataFrameReader` and `DataFrameWriter` were not designed for `JDBC` data
sources, so Spark SQL beginners might hit various strange errors.

Anyway, I will try to create fewer JIRAs. To be honest, though, my previous
team used a JIRA-like defect tracking system to record defects, and we always
created separate defects when they had different external impacts. Combining
multiple issues into one was considered very bad practice. When each fixpack
or release was published, our customers, L2, and L3 might use it to find out
what was included in that specific fixpack. Here is an example:
http://www-01.ibm.com/support/docview.wss?uid=swg21633303 It is a long list.
In Spark, all the JDBC-related JIRAs can be classified into the same group,
but we should not combine multiple defects into one. In my previous team, we
always had to provide a very clear title for each JIRA/defect, because users
might not be patient enough to click the link and read the details. I think
the same logic applies to Spark.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60821/
Test PASSed.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12675
  
**[Test build #60821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60821/consoleFull)** for PR 12675 at commit [`3bc75a3`](https://github.com/apache/spark/commit/3bc75a3f413d1c4bdfd774cafbd1034ef50d216c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12675
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...

2016-06-19 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/13760
  
Thanks @NarineK -- cc @sun-rui for review



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12675
  
**[Test build #60821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60821/consoleFull)** for PR 12675 at commit [`3bc75a3`](https://github.com/apache/spark/commit/3bc75a3f413d1c4bdfd774cafbd1034ef50d216c).



[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60818/
Test PASSed.



[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13760
  
**[Test build #60818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60818/consoleFull)** for PR 13760 at commit [`11c7cd6`](https://github.com/apache/spark/commit/11c7cd6d4bcbff86492e4e996f3317d98bf64901).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13760
  
Merged build finished. Test PASSed.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12675
  
**[Test build #60820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60820/consoleFull)** for PR 12675 at commit [`c2b1aef`](https://github.com/apache/spark/commit/c2b1aef45d98110c263f8ae53e6402871724b8d2).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12675
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60820/
Test FAILed.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12675
  
Merged build finished. Test FAILed.



[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12675
  
**[Test build #60820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60820/consoleFull)** for PR 12675 at commit [`c2b1aef`](https://github.com/apache/spark/commit/c2b1aef45d98110c263f8ae53e6402871724b8d2).



[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13748
  
**[Test build #60819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60819/consoleFull)** for PR 13748 at commit [`e61adfd`](https://github.com/apache/spark/commit/e61adfd81cbac57ed3c9ceae958b46f7f1943393).



[GitHub] spark issue #13737: [SPARK-15954][SQL][PySpark][TEST] Fix TestHiveContext in...

2016-06-19 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13737
  
Why does Python need to load these test resources? I think the proper fix
is to get rid of that dependency. Otherwise we are making the test harness more
and more complicated and more tightly coupled.




[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-19 Thread NarineK

Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Thanks for the quick response. I'll create one.



[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...

2016-06-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13760
  
**[Test build #60818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60818/consoleFull)** for PR 13760 at commit [`11c7cd6`](https://github.com/apache/spark/commit/11c7cd6d4bcbff86492e4e996f3317d98bf64901).



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-19 Thread vectorijk

Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/12836
  
@NarineK I am not quite sure. Maybe you could create a new JIRA for 
gapply's programming guide.

