[GitHub] spark issue #10292: SPARK-11882: Custom scheduler support
Github user cerisier commented on the issue: https://github.com/apache/spark/pull/10292 This is pure awesome. Any chance of this being revisited someday ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67638513
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -313,13 +313,32 @@ trait CheckAnalysis extends PredicateHelper {
                  |${s.catalogTable.identifier}
                """.stripMargin)
+          // TODO: We need to consolidate this kind of checks for InsertIntoTable
+          //       with the rule of PreWriteCheck defined in extendedCheckRules.
           case InsertIntoTable(s: SimpleCatalogRelation, _, _, _, _) =>
             failAnalysis(
               s"""
                  |Hive support is required to insert into the following tables:
                  |${s.catalogTable.identifier}
                """.stripMargin)
+          case InsertIntoTable(t, _, _, _, _)
--- End diff --
Why do we move these checks from `PreWriteCheck` to here?
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67638318
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -43,8 +43,127 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+    if (conf.caseSensitiveAnalysis) {
+      caseSensitiveResolution
+    } else {
+      caseInsensitiveResolution
+    }
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+    sourceAttributes: Seq[Attribute],
+    providedPartitions: Map[String, Option[String]],
+    targetAttributes: Seq[Attribute],
+    targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+    assert(providedPartitions.exists(_._2.isDefined))
+
+    val staticPartitions = providedPartitions.flatMap {
+      case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+      case (_, None) => Nil
+    }
+
+    // The sum of the number of static partition columns and columns provided in the SELECT
+    // clause needs to match the number of columns of the target table.
+    if (staticPartitions.size + sourceAttributes.size != targetAttributes.size) {
--- End diff --
in `PreprocessTableInsertion`
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67638211
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -43,8 +43,127 @@ import org.apache.spark.unsafe.types.UTF8String
  * Replaces generic operations with specific variants that are designed to work with Spark
  * SQL Data Sources.
  */
-private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] {
+private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
+
+  def resolver: Resolver = {
+    if (conf.caseSensitiveAnalysis) {
+      caseSensitiveResolution
+    } else {
+      caseInsensitiveResolution
+    }
+  }
+
+  // The access modifier is used to expose this method to tests.
+  private[sql] def convertStaticPartitions(
+    sourceAttributes: Seq[Attribute],
+    providedPartitions: Map[String, Option[String]],
+    targetAttributes: Seq[Attribute],
+    targetPartitionSchema: StructType): Seq[NamedExpression] = {
+
+    assert(providedPartitions.exists(_._2.isDefined))
+
+    val staticPartitions = providedPartitions.flatMap {
+      case (partKey, Some(partValue)) => (partKey, partValue) :: Nil
+      case (_, None) => Nil
+    }
+
+    // The sum of the number of static partition columns and columns provided in the SELECT
+    // clause needs to match the number of columns of the target table.
+    if (staticPartitions.size + sourceAttributes.size != targetAttributes.size) {
--- End diff --
Looks like we already have this check somewhere?
[GitHub] spark issue #13769: [SPARK-16030] [SQL] Allow specifying static partitions w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13769 **[Test build #60833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60833/consoleFull)** for PR 13769 at commit [`ba9c04c`](https://github.com/apache/spark/commit/ba9c04cfe46680e5145859b086357f3ed1a76ff1).
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60832/consoleFull)** for PR 13770 at commit [`cd794cd`](https://github.com/apache/spark/commit/cd794cdfc7867e792f3db09504773d450ca6f8a9).
[GitHub] spark issue #13761: [SPARK-12197] [SparkCore] Kryo & Avro - Support Schema R...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13761 Don't `Dataset`s and `Encoder`s make this less relevant? What would be the use case here?
[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60829/ Test PASSed.
[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13776 Merged build finished. Test PASSed.
[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13776 **[Test build #60829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60829/consoleFull)** for PR 13776 at commit [`6e136ff`](https://github.com/apache/spark/commit/6e136ff97ee47838cd15137f85747d61d2e148b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13770 @rxin So far, I think we do not provide such a capability for table-level configuration. For `parquet`, the `DataFrameReader`'s option `mergeSchema` takes priority over the global configuration `spark.sql.hive.convertMetastoreParquet.mergeSchema`. However, I agree. We definitely should do it in the near future. Thus, let me remove this check now. Thanks!
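The precedence described above (a per-read option winning over a session-wide setting) can be modelled in a few lines of plain Scala. This is only a sketch of the lookup order, not Spark's implementation: `OptionPrecedenceSketch` and `resolve` are hypothetical names, and the two maps stand in for the reader options and the global configuration.

```scala
object OptionPrecedenceSketch {
  /**
   * Hypothetical helper: looks up a setting, preferring the per-read option
   * over the session-wide configuration value, then a default.
   */
  def resolve(
      readerOptions: Map[String, String],
      globalConf: Map[String, String],
      optionKey: String,
      confKey: String,
      default: String): String =
    readerOptions.get(optionKey)        // 1. option set on the reader, if any
      .orElse(globalConf.get(confKey))  // 2. otherwise the global configuration
      .getOrElse(default)               // 3. otherwise the built-in default
}
```

With `mergeSchema -> "true"` in the reader options and the global key set to `"false"`, `resolve` returns `"true"`; without the reader option it falls back to the global value.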
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67637488 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String * Replaces generic operations with specific variants that are designed to work with Spark * SQL Data Sources. */ -private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] { +private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { + + def resolver: Resolver = { +if (conf.caseSensitiveAnalysis) { + caseSensitiveResolution +} else { + caseInsensitiveResolution +} + } + + // The access modifier is used to expose this method to tests. + private[sql] def convertStaticPartitions( +sourceAttributes: Seq[Attribute], +providedPartitions: Map[String, Option[String]], +targetAttributes: Seq[Attribute], +targetPartitionSchema: StructType): Seq[NamedExpression] = { + +assert(providedPartitions.exists(_._2.isDefined)) + +val staticPartitions = providedPartitions.flatMap { + case (partKey, Some(partValue)) => (partKey, partValue) :: Nil + case (_, None) => Nil +} + +// The sum of the number of static partition columns and columns provided in the SELECT +// clause needs to match the number of columns of the target table. 
+if (staticPartitions.size + sourceAttributes.size != targetAttributes.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"columns as the target table: target table has ${targetAttributes.size} " + + s"column(s) but the inserted data has ${sourceAttributes.size + staticPartitions.size} " + + s"column(s), which contain ${staticPartitions.size} partition column(s) having " + + s"assigned constant values.") +} + +if (providedPartitions.size != targetPartitionSchema.fields.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"partition columns as the target table: target table " + + s"has ${targetPartitionSchema.fields.size} partition column(s) but the inserted " + + s"data has ${providedPartitions.size} partition columns specified.") +} + +staticPartitions.foreach { + case (partKey, partValue) => +if (!targetPartitionSchema.fields.exists(field => resolver(field.name, partKey))) { + throw new AnalysisException( +s"$partKey is not a partition column. Partition columns are " + + s"${targetPartitionSchema.fields.map(_.name).mkString("[", ",", "]")}") +} +} + +val partitionList = targetPartitionSchema.fields.map { field => + val potentialSpecs = staticPartitions.filter { +case (partKey, partValue) => resolver(field.name, partKey) + } + if (potentialSpecs.size == 0) { +None + } else if (potentialSpecs.size == 1) { +val partValue = potentialSpecs.head._2 +Some(Alias(Cast(Literal(partValue), field.dataType), "_staticPart")()) + } else { +throw new AnalysisException( + s"Partition column ${field.name} have multiple values specified, " + +s"${potentialSpecs.mkString("[", ", ", "]")}. Please only specify a single value.") + } +} + +partitionList.sliding(2).foreach { v => --- End diff -- Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pat...
Github user dafrista commented on a diff in the pull request: https://github.com/apache/spark/pull/13676#discussion_r67637381 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala --- @@ -479,8 +340,299 @@ private[hive] trait HiveInspectors { } /** - * Builds specific unwrappers ahead of time according to object inspector + * Builds unwrappers ahead of time according to object inspector * types to avoid pattern matching and branching costs per row. + * + * Strictly follows the following order in unwrapping (constant OI has the higher priority): + * Constant Null object inspector => + * return null + * Constant object inspector => + * extract the value from constant object inspector + * If object inspector prefers writable => + * extract writable from `data` and then get the catalyst type from the writable + * Extract the java object directly from the object inspector + * + * NOTICE: the complex data type requires recursive unwrapping. + * + * @param objectInspector the ObjectInspector used to create an unwrapper. + * @return A function that unwraps data objects. + * Use the overloaded HiveStructField version for in-place updating of a MutableRow. 
+ */ + def unwrapperFor(objectInspector: ObjectInspector): Any => Any = +objectInspector match { + case coi: ConstantObjectInspector if coi.getWritableConstantValue == null => +data: Any => null + case poi: WritableConstantStringObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.toString) + case poi: WritableConstantHiveVarcharObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.getHiveVarchar.getValue) + case poi: WritableConstantHiveCharObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.getHiveChar.getValue) + case poi: WritableConstantHiveDecimalObjectInspector => +data: Any => + HiveShim.toCatalystDecimal( +PrimitiveObjectInspectorFactory.javaHiveDecimalObjectInspector, +poi.getWritableConstantValue.getHiveDecimal) + case poi: WritableConstantTimestampObjectInspector => +data: Any => { + val t = poi.getWritableConstantValue + t.getSeconds * 100L + t.getNanos / 1000L +} + case poi: WritableConstantIntObjectInspector => +data: Any => + poi.getWritableConstantValue.get() --- End diff -- You're right, as the contract for a `ConstantObjectInspector` is that its object "represent constant values and can return them without an evaluation" [[1](https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ConstantObjectInspector.java)]. I will make this change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
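To illustrate the point about constant inspectors: when the dispatch on inspector type happens once, a constant value can also be extracted once, outside the per-row function. The sketch below uses simplified, hypothetical stand-ins (`Inspector`, `ConstantInspector`, and so on) rather than Hive's `ObjectInspector` classes, so it runs without Hive or Spark on the classpath.

```scala
object PrebuiltUnwrappers {
  // Hypothetical, simplified stand-ins for Hive object inspectors.
  sealed trait Inspector
  case object IntInspector extends Inspector
  case object StringInspector extends Inspector
  final case class ConstantInspector(value: Any) extends Inspector

  /**
   * Pattern-matches on the inspector once and returns a function that is
   * then applied to every row without further branching on inspector type.
   */
  def unwrapperFor(inspector: Inspector): Any => Any = inspector match {
    case ConstantInspector(value) =>
      // The constant is read here, once; the returned function ignores its input.
      _ => value
    case IntInspector =>
      data => if (data == null) null else data.asInstanceOf[java.lang.Integer].intValue
    case StringInspector =>
      data => if (data == null) null else data.toString
  }
}
```

A caller would build the unwrapper once per column, for example `val unwrap = PrebuiltUnwrappers.unwrapperFor(inspector)`, and then call `unwrap(data)` inside the row loop.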
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67637303 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String * Replaces generic operations with specific variants that are designed to work with Spark * SQL Data Sources. */ -private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] { +private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { + + def resolver: Resolver = { +if (conf.caseSensitiveAnalysis) { + caseSensitiveResolution +} else { + caseInsensitiveResolution +} + } + + // The access modifier is used to expose this method to tests. + private[sql] def convertStaticPartitions( +sourceAttributes: Seq[Attribute], +providedPartitions: Map[String, Option[String]], +targetAttributes: Seq[Attribute], +targetPartitionSchema: StructType): Seq[NamedExpression] = { + +assert(providedPartitions.exists(_._2.isDefined)) + +val staticPartitions = providedPartitions.flatMap { + case (partKey, Some(partValue)) => (partKey, partValue) :: Nil + case (_, None) => Nil +} + +// The sum of the number of static partition columns and columns provided in the SELECT +// clause needs to match the number of columns of the target table. 
+if (staticPartitions.size + sourceAttributes.size != targetAttributes.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"columns as the target table: target table has ${targetAttributes.size} " + + s"column(s) but the inserted data has ${sourceAttributes.size + staticPartitions.size} " + + s"column(s), which contain ${staticPartitions.size} partition column(s) having " + + s"assigned constant values.") +} + +if (providedPartitions.size != targetPartitionSchema.fields.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"partition columns as the target table: target table " + + s"has ${targetPartitionSchema.fields.size} partition column(s) but the inserted " + + s"data has ${providedPartitions.size} partition columns specified.") +} + +staticPartitions.foreach { + case (partKey, partValue) => +if (!targetPartitionSchema.fields.exists(field => resolver(field.name, partKey))) { + throw new AnalysisException( +s"$partKey is not a partition column. Partition columns are " + + s"${targetPartitionSchema.fields.map(_.name).mkString("[", ",", "]")}") +} +} + +val partitionList = targetPartitionSchema.fields.map { field => + val potentialSpecs = staticPartitions.filter { +case (partKey, partValue) => resolver(field.name, partKey) + } + if (potentialSpecs.size == 0) { +None + } else if (potentialSpecs.size == 1) { +val partValue = potentialSpecs.head._2 +Some(Alias(Cast(Literal(partValue), field.dataType), "_staticPart")()) + } else { +throw new AnalysisException( + s"Partition column ${field.name} have multiple values specified, " + +s"${potentialSpecs.mkString("[", ", ", "]")}. 
Please only specify a single value.")
+      }
+    }
+
+    partitionList.sliding(2).foreach { v =>
--- End diff --
We can use the following check instead:
```scala
partitionList.dropWhile(_.isDefined).collectFirst {
  case Some(_) => throw new AnalysisException("...")
}
```
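A self-contained sketch of the suggested check follows. It assumes the intent is to reject a static (defined) partition value that appears after a dynamic (`None`) one, which the thread does not spell out. `InvalidPartitionSpec` is a hypothetical stand-in for Spark's `AnalysisException` so the example runs without Spark, and the error message is illustrative.

```scala
object StaticPartitionPrefixCheck {
  // Hypothetical stand-in for AnalysisException.
  final case class InvalidPartitionSpec(message: String) extends Exception(message)

  /** Throws if any defined entry appears after the first undefined one. */
  def requireStaticPrefix[A](partitionList: Seq[Option[A]]): Unit = {
    // Drop the leading run of static values; anything defined after that is an error.
    partitionList.dropWhile(_.isDefined).collectFirst {
      case Some(_) =>
        throw InvalidPartitionSpec("static partition values must precede dynamic ones")
    }
    ()
  }

  /** Convenience wrapper that reports the outcome as a Boolean. */
  def isValid[A](partitionList: Seq[Option[A]]): Boolean =
    try { requireStaticPrefix(partitionList); true }
    catch { case _: InvalidPartitionSpec => false }
}
```

Unlike a pairwise scan, this form needs no special handling for empty or single-element lists: `dropWhile` simply leaves nothing to inspect.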
[GitHub] spark pull request #13777: [SPARK-16061][SQL][Minor] The property "spark.sql...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13777#discussion_r67637263 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala --- @@ -115,7 +115,7 @@ case class KeyRemoved(key: UnsafeRow) extends StoreUpdate */ private[sql] object StateStore extends Logging { - val MAINTENANCE_INTERVAL_CONFIG = "spark.streaming.stateStore.maintenanceInterval" + val MAINTENANCE_INTERVAL_CONFIG = "spark.sql.streaming.stateStore.maintenanceInterval" --- End diff -- +1
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13770 hm - what if i want to specify specific options when reading data from a table? e.g. whether to use the vectorized reader or not?
[GitHub] spark issue #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pattern ma...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13676 This looks pretty good. What I am thinking is that generating an encoder could create another nice performance speedup here.
[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67637019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import org.apache.spark.unsafe.types.UTF8String * Replaces generic operations with specific variants that are designed to work with Spark * SQL Data Sources. */ -private[sql] object DataSourceAnalysis extends Rule[LogicalPlan] { +private[sql] case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { + + def resolver: Resolver = { +if (conf.caseSensitiveAnalysis) { + caseSensitiveResolution +} else { + caseInsensitiveResolution +} + } + + // The access modifier is used to expose this method to tests. + private[sql] def convertStaticPartitions( +sourceAttributes: Seq[Attribute], +providedPartitions: Map[String, Option[String]], +targetAttributes: Seq[Attribute], +targetPartitionSchema: StructType): Seq[NamedExpression] = { + +assert(providedPartitions.exists(_._2.isDefined)) + +val staticPartitions = providedPartitions.flatMap { + case (partKey, Some(partValue)) => (partKey, partValue) :: Nil + case (_, None) => Nil +} + +// The sum of the number of static partition columns and columns provided in the SELECT +// clause needs to match the number of columns of the target table. 
+if (staticPartitions.size + sourceAttributes.size != targetAttributes.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"columns as the target table: target table has ${targetAttributes.size} " + + s"column(s) but the inserted data has ${sourceAttributes.size + staticPartitions.size} " + + s"column(s), which contain ${staticPartitions.size} partition column(s) having " + + s"assigned constant values.") +} + +if (providedPartitions.size != targetPartitionSchema.fields.size) { + throw new AnalysisException( +s"The data to be inserted needs to have the same number of " + + s"partition columns as the target table: target table " + + s"has ${targetPartitionSchema.fields.size} partition column(s) but the inserted " + + s"data has ${providedPartitions.size} partition columns specified.") +} + +staticPartitions.foreach { + case (partKey, partValue) => +if (!targetPartitionSchema.fields.exists(field => resolver(field.name, partKey))) { + throw new AnalysisException( +s"$partKey is not a partition column. Partition columns are " + + s"${targetPartitionSchema.fields.map(_.name).mkString("[", ",", "]")}") +} +} + +val partitionList = targetPartitionSchema.fields.map { field => + val potentialSpecs = staticPartitions.filter { +case (partKey, partValue) => resolver(field.name, partKey) + } + if (potentialSpecs.size == 0) { +None + } else if (potentialSpecs.size == 1) { +val partValue = potentialSpecs.head._2 +Some(Alias(Cast(Literal(partValue), field.dataType), "_staticPart")()) + } else { +throw new AnalysisException( + s"Partition column ${field.name} have multiple values specified, " + +s"${potentialSpecs.mkString("[", ", ", "]")}. 
Please only specify a single value.")
+      }
+    }
+
+    partitionList.sliding(2).foreach { v =>
--- End diff --
`sliding(2)` can be dangerous for single-element collections:
```
scala> Seq(1).sliding(2).foreach(println)
List(1)
```
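The behaviour shown above comes from `sliding` emitting one short window when the collection has fewer elements than the window size, so code that destructures each window as a pair can fail on a one-element list. One way to get only complete adjacent pairs, sketched here with a hypothetical helper name `adjacentPairs`, is to zip the sequence with its own tail:

```scala
object SlidingPairs {
  /**
   * Returns each element paired with its successor. A sequence with fewer
   * than two elements produces no pairs at all.
   */
  def adjacentPairs[A](xs: Seq[A]): Seq[(A, A)] = xs.zip(xs.drop(1))

  def main(args: Array[String]): Unit = {
    // sliding(2) still yields one (short) window for a single element...
    println(Seq(1).sliding(2).toList)
    // ...whereas zipping with the tail yields nothing to iterate over.
    println(adjacentPairs(Seq(1)))
    println(adjacentPairs(Seq(1, 2, 3)))
  }
}
```

Guarding the `sliding(2)` loop with a size check would also work; the zip form just makes the "pairs only" intent explicit in the types.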
[GitHub] spark pull request #13777: [SPARK-16061][SQL][Minor] The property "spark.sql...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13777#discussion_r67636827 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala --- @@ -115,7 +115,7 @@ case class KeyRemoved(key: UnsafeRow) extends StoreUpdate */ private[sql] object StateStore extends Logging { - val MAINTENANCE_INTERVAL_CONFIG = "spark.streaming.stateStore.maintenanceInterval" + val MAINTENANCE_INTERVAL_CONFIG = "spark.sql.streaming.stateStore.maintenanceInterval" --- End diff -- @tdas shouldn't this property be registered in `SQLConf`?
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67636641
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
+    val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
+      .getOrElse(Seq.empty[StructField]).map { col =>
+      AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() }
+    val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap
--- End diff --
Ah, I see. You could also move this code into the companion object of `CreateMacroCommand`. That would also work. It is just that this code isn't parser-specific.
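As a rough picture of what moving the name-to-ordinal logic into a companion object could look like, here is a standalone sketch. `MacroColumn` and `ordinals` are hypothetical names, with a plain `String` standing in for Spark's data types; the only part taken from the diff is the `map(_.name).zipWithIndex.toMap` idiom.

```scala
object MacroColumnsSketch {
  // Hypothetical, simplified column description (not Spark's AttributeReference).
  final case class MacroColumn(name: String, dataType: String)

  object MacroColumn {
    /** Maps each column name to its position in the macro's argument list. */
    def ordinals(columns: Seq[MacroColumn]): Map[String, Int] =
      columns.map(_.name).zipWithIndex.toMap
  }
}
```

Keeping the helper next to the data type means the parser only has to collect the columns, and anything else that builds the command can reuse the same mapping. Note that duplicate column names would silently keep the last index, so a real implementation may want to reject them first.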
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67636452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions._ + +/** + * The DDL command that creates a macro. 
+ * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +columns: Seq[AttributeReference], +macroFunction: Expression) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { +throw new AnalysisException(s"Actual number of columns: ${children.size} != " + + s"expected number of columns: ${columns.size} for Macro $macroName") + } + macroFunction.transformUp { +case b: BoundReference => children(b.ordinal) --- End diff -- Ok that is perfect.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13775 @hvanhovell @rxin Got it. Thanks! I will re-run the benchmark.
[GitHub] spark pull request #13676: [SPARK-15956] [SQL] When unwrapping ORC avoid pat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13676#discussion_r67636421 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala --- @@ -479,8 +340,299 @@ private[hive] trait HiveInspectors { } /** - * Builds specific unwrappers ahead of time according to object inspector + * Builds unwrappers ahead of time according to object inspector * types to avoid pattern matching and branching costs per row. + * + * Strictly follows the following order in unwrapping (constant OI has the higher priority): + * Constant Null object inspector => + * return null + * Constant object inspector => + * extract the value from constant object inspector + * If object inspector prefers writable => + * extract writable from `data` and then get the catalyst type from the writable + * Extract the java object directly from the object inspector + * + * NOTICE: the complex data type requires recursive unwrapping. + * + * @param objectInspector the ObjectInspector used to create an unwrapper. + * @return A function that unwraps data objects. + * Use the overloaded HiveStructField version for in-place updating of a MutableRow. 
+ */ + def unwrapperFor(objectInspector: ObjectInspector): Any => Any = +objectInspector match { + case coi: ConstantObjectInspector if coi.getWritableConstantValue == null => +data: Any => null + case poi: WritableConstantStringObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.toString) + case poi: WritableConstantHiveVarcharObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.getHiveVarchar.getValue) + case poi: WritableConstantHiveCharObjectInspector => +data: Any => + UTF8String.fromString(poi.getWritableConstantValue.getHiveChar.getValue) + case poi: WritableConstantHiveDecimalObjectInspector => +data: Any => + HiveShim.toCatalystDecimal( +PrimitiveObjectInspectorFactory.javaHiveDecimalObjectInspector, +poi.getWritableConstantValue.getHiveDecimal) + case poi: WritableConstantTimestampObjectInspector => +data: Any => { + val t = poi.getWritableConstantValue + t.getSeconds * 1000000L + t.getNanos / 1000L +} + case poi: WritableConstantIntObjectInspector => +data: Any => + poi.getWritableConstantValue.get() --- End diff -- Isn't it faster to call `poi.getWritableConstantValue.get()` outside of the function and use the result in the function? Or am I missing something here? The same goes for all other constants.
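The suggestion in the comment above, evaluating the constant once when the unwrapper is built rather than on every row, can be illustrated with a small sketch. It does not use the Hive classes: `CountingConstantInspector` is a hypothetical stand-in that counts how often the constant is read, and both function names are assumptions made for illustration only.

```scala
object HoistedConstantSketch {
  // Hypothetical stand-in for a constant object inspector. It counts reads so
  // the difference between the two styles can be observed.
  final class CountingConstantInspector(value: Int) {
    var reads: Int = 0
    def getWritableConstantValue: Int = { reads += 1; value }
  }

  // Style in the quoted diff: the constant is re-read for every row.
  def perRowUnwrapper(poi: CountingConstantInspector): Any => Any =
    (_: Any) => poi.getWritableConstantValue

  // Style suggested in the review: read the constant once while building the
  // unwrapper, then close over the already-computed result.
  def hoistedUnwrapper(poi: CountingConstantInspector): Any => Any = {
    val constant = poi.getWritableConstantValue
    (_: Any) => constant
  }
}
```

In this toy model the hoisted variant reads the inspector exactly once no matter how many rows are unwrapped. Whether the saving is measurable in the real code path would need a benchmark.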
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13775 Merged build finished. Test PASSed.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60828/ Test PASSed.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13775 **[Test build #60828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60828/consoleFull)** for PR 13775 at commit [`20b832e`](https://github.com/apache/spark/commit/20b832ee4e5ed4e794cc1bc8f2f67cce973759e0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13769: [SPARK-16030] [SQL] Allow specifying static partitions w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13769 **[Test build #60831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60831/consoleFull)** for PR 13769 at commit [`6bd7b6f`](https://github.com/apache/spark/commit/6bd7b6fb992af794216fa7752d25747d64d82280).
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67635961 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { } /** + * Create a [[CreateMacroCommand]] command. + * + * For example: + * {{{ + * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ + override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) { +val arguments = Option(ctx.colTypeList).map(visitColTypeList(_)) + .getOrElse(Seq.empty[StructField]).map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap --- End diff -- @hvanhovell So I think I will create a new wrapper class to avoid the unresolved exception, so that DataFrame can reuse this feature later.
[GitHub] spark pull request #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Foll...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13766
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67635827 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { } /** + * Create a [[CreateMacroCommand]] command. + * + * For example: + * {{{ + * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ + override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) { +val arguments = Option(ctx.colTypeList).map(visitColTypeList(_)) + .getOrElse(Seq.empty[StructField]).map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap --- End diff -- @hvanhovell So I think I will create a new wrapper class to avoid the unresolved exception, so that DataFrame can reuse this feature later.
[GitHub] spark issue #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up c...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13766 Thanks. I am merging this to master and branch 2.0.
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67635570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions._ + +/** + * The DDL command that creates a macro. 
+ * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +columns: Seq[AttributeReference], +macroFunction: Expression) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { +throw new AnalysisException(s"Actual number of columns: ${children.size} != " + + s"expected number of columns: ${columns.size} for Macro $macroName") + } + macroFunction.transformUp { +case b: BoundReference => children(b.ordinal) --- End diff -- @hvanhovell Good points. Because the Analyzer calls each expression's checkInputDataTypes after ResolveFunctions, I think we do not need to validate the input types here. For now I do not see a benefit in adding casts, and they may turn out to be unnecessary. I will add some comments about this. Thanks.
[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60826/ Test PASSed.
[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13772 Merged build finished. Test PASSed.
[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13772 **[Test build #60826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60826/consoleFull)** for PR 13772 at commit [`a472cc2`](https://github.com/apache/spark/commit/a472cc2a2f87ad1404b49ae2c2c75a769db6fc18). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13777: [SPARK-16061][SQL][Minor] The property "spark.sql.stateS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13777 **[Test build #60830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60830/consoleFull)** for PR 13777 at commit [`72e8c80`](https://github.com/apache/spark/commit/72e8c809a90e718b10205820fd590b8a0041bdc8).
[GitHub] spark pull request #13748: [SPARK-16031] Add debug-only socket source in Str...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13748
[GitHub] spark pull request #13777: [SPARK-16061][SQL] The property "spark.sql.stateS...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/13777 [SPARK-16061][SQL] The property "spark.sql.stateStore.maintenanceInterval" should be renamed to "spark.streaming.stateStore.maintenanceInterval" ## What changes were proposed in this pull request? The property spark.streaming.stateStore.maintenanceInterval should be renamed and harmonized with other properties related to Structured Streaming like spark.sql.streaming.stateStore.minDeltasForSnapshot. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark SPARK-16061 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13777.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13777 commit 72e8c809a90e718b10205820fd590b8a0041bdc8 Author: Kousuke Saruta Date: 2016-06-20T04:17:16Z Renamed spark.streaming.stateStore.maintenanceInterval to spark.sql.stateStore.maintenanceInterval
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13775 @viirya when you construct a performance benchmark, you want to minimize the overhead of things outside the code path you are testing. In this case, a lot of the time was spent in the collect operation.
[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13748 LGTM - merging in master/2.0.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13775 Would PR https://github.com/apache/spark/pull/13676 help to improve performance?
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13775 @viirya could you re-run the benchmarks without calling collect()? Do a count or a simple aggregate instead; collect spends a tonne of time serializing results from `InternalRow` to `Row`.
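The benchmarking advice above can be pictured with a toy model. This is not Spark code: `ToyDataset`, `InternalRowLike` and `ExternalRowLike` are hypothetical names, and the conversion counter only stands in for the per-row `InternalRow` to `Row` work that the comment attributes to collect.

```scala
object BenchmarkActionSketch {
  // Hypothetical internal and external row representations.
  final case class InternalRowLike(values: Array[Any])
  final case class ExternalRowLike(values: Seq[Any])

  final class ToyDataset(rows: Seq[InternalRowLike]) {
    // Number of internal-to-external conversions performed so far.
    var conversions: Long = 0L

    // Modeled on collect(): every row is converted to the external form.
    def collect(): Seq[ExternalRowLike] = rows.map { r =>
      conversions += 1
      ExternalRowLike(r.values.toSeq)
    }

    // Modeled on count(): rows are scanned but never converted.
    def count(): Long = rows.foldLeft(0L)((n, _) => n + 1)
  }
}
```

In this model a benchmark built on `count()` exercises the scan without paying for any conversions, so the measured time is closer to the reader code path under test.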
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67634631 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions._ + +/** + * The DDL command that creates a macro. 
+ * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +columns: Seq[AttributeReference], +macroFunction: Expression) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val macroInfo = columns.mkString(",") + " -> " + macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { +throw new AnalysisException(s"Actual number of columns: ${children.size} != " + + s"expected number of columns: ${columns.size} for Macro $macroName") + } + macroFunction.transformUp { +case b: BoundReference => children(b.ordinal) --- End diff -- We do not validate the input type here. This would be entirely fine if macro arguments were defined without a `DataType`. I am not sure what we need to do here though. We have two options: - Ignore the DataType and rely on the expressions' `inputTypes` to get casting done. This must be documented though. - Introduce casts to make sure the input conforms to the required input. What do you think?
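The two options raised in the comment above can be sketched on a miniature expression tree. None of the types below are Catalyst classes: `Expr`, `BoundRef`, `Cast`, `transformUp` and `expand` are simplified, hypothetical analogues, and the `insertCasts` flag exists only to contrast "rely on the expression's own type checks" with "wrap mismatched arguments in a cast".

```scala
object MacroExpansionSketch {
  sealed trait DataTypeLike
  case object IntT extends DataTypeLike
  case object StringT extends DataTypeLike

  sealed trait Expr { def dataType: DataTypeLike }
  final case class Literal(value: Any, dataType: DataTypeLike) extends Expr
  final case class BoundRef(ordinal: Int, dataType: DataTypeLike) extends Expr
  final case class Cast(child: Expr, dataType: DataTypeLike) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr {
    val dataType: DataTypeLike = IntT
  }

  // Bottom-up rewrite: children first, then the node itself.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = e match {
      case Add(l, r)  => Add(transformUp(l)(rule), transformUp(r)(rule))
      case Cast(c, t) => Cast(transformUp(c)(rule), t)
      case leaf       => leaf
    }
    rule.applyOrElse(rewritten, identity[Expr])
  }

  // Substitutes call-site arguments for the macro's bound references.
  // insertCasts = false: option 1, leave type handling to later analysis.
  // insertCasts = true:  option 2, cast arguments to the declared type.
  def expand(body: Expr, arity: Int, args: Seq[Expr], insertCasts: Boolean): Expr = {
    require(args.size == arity,
      s"Actual number of columns: ${args.size} != expected number of columns: $arity")
    transformUp(body) {
      case BoundRef(i, declared) =>
        val arg = args(i)
        if (insertCasts && arg.dataType != declared) Cast(arg, declared) else arg
    }
  }
}
```

With `insertCasts = false` a string argument is substituted as-is and any coercion is left to whatever analysis runs afterwards. With `insertCasts = true` the declared argument type is enforced at expansion time, at the cost of a cast that may prove redundant.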
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r67634511 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { } /** + * Create a [[CreateMacroCommand]] command. + * + * For example: + * {{{ + * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ + override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) { +val arguments = Option(ctx.colTypeList).map(visitColTypeList(_)) + .getOrElse(Seq.empty[StructField]).map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap --- End diff -- Why did I not move this into CreateMacroCommand? Because analyzer.checkAnalysis() checks whether the macroFunction of CreateMacroCommand is valid. If macroFunction still contains UnresolvedAttributes, analyzer.checkAnalysis() will throw an unresolved exception. If the UnresolvedAttributes are resolved beforehand, analyzer.checkAnalysis() does not throw an exception.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13706 @lianhuiwang thanks for updating the PR. Could you implement the macro removal by pattern matching on a (to be created) `MacroFunctionBuilder` class? I feel this is simpler, and it doesn't touch as much of the API.
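One way to read the suggestion above is that macros are registered as instances of a dedicated builder class, so dropping a macro only needs a type pattern match and no extra registry API. The sketch below assumes exactly that. `ToyRegistry`, its methods, and the shape of `MacroFunctionBuilder` are hypothetical, with strings standing in for expressions.

```scala
object MacroRegistrySketch {
  // Simplified function builder: argument strings in, rendered call out.
  type FunctionBuilder = Seq[String] => String

  // Hypothetical marker class: a macro is a builder recognisable by its type.
  final case class MacroFunctionBuilder(name: String, arity: Int) extends FunctionBuilder {
    def apply(args: Seq[String]): String = {
      require(args.size == arity, s"$name expects $arity arguments")
      s"$name(${args.mkString(", ")})"
    }
  }

  final class ToyRegistry {
    private val builders = scala.collection.mutable.Map.empty[String, FunctionBuilder]

    def register(name: String, builder: FunctionBuilder): Unit =
      builders.update(name, builder)

    def contains(name: String): Boolean = builders.contains(name)

    // Removes the entry only if it was registered as a macro; ordinary
    // functions with the same lookup path are left untouched.
    def dropMacro(name: String): Boolean = builders.get(name) match {
      case Some(_: MacroFunctionBuilder) =>
        builders.remove(name)
        true
      case _ => false
    }
  }
}
```

Under this assumption the registry interface stays the same for functions and macros, and the distinction lives entirely in the builder's runtime type.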
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60825/ Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Merged build finished. Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60825/consoleFull)** for PR 13770 at commit [`07b6863`](https://github.com/apache/spark/commit/07b68630247597a679dee01483a09cf7176d5724). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13776 **[Test build #60829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60829/consoleFull)** for PR 13776 at commit [`6e136ff`](https://github.com/apache/spark/commit/6e136ff97ee47838cd15137f85747d61d2e148b2).
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13706#discussion_r67633556

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       }

       /**
    +   * Create a [[CreateMacroCommand]] command.
    +   *
    +   * For example:
    +   * {{{
    +   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
    +   * }}}
    +   */
    +  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
    +    val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
    +      .getOrElse(Seq.empty[StructField]).map { col =>
    +      AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() }
    +    val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap
    +    if (colToIndex.size != arguments.size) {
    +      throw operationNotAllowed(
    +        s"Cannot support duplicate colNames for CREATE TEMPORARY MACRO ", ctx)
    +    }
    +    val macroFunction = expression(ctx.expression).transformUp {
    --- End diff --

    Ditto
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13706#discussion_r67633550

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -590,6 +592,53 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
       }

       /**
    +   * Create a [[CreateMacroCommand]] command.
    +   *
    +   * For example:
    +   * {{{
    +   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
    +   * }}}
    +   */
    +  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) {
    +    val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
    +      .getOrElse(Seq.empty[StructField]).map { col =>
    +      AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() }
    +    val colToIndex: Map[String, Int] = arguments.map(_.name).zipWithIndex.toMap
    --- End diff --

    Move this into the `CreateMacroCommand` command. This would also be relevant if we were to offer a different API for creating macros.
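The duplicate-name check in the diff relies on `toMap` collapsing repeated keys, so a size mismatch between the map and the argument list signals a duplicate. Below is a minimal, self-contained sketch of that idea outside Spark. The names `MacroArgumentCheck`, `hasDuplicateNames` and `duplicateNames` are hypothetical and chosen only for illustration; they are not part of the PR. Keeping the check in one small helper is one possible way to follow the review suggestion, since a parser rule and any other macro-creation API could then share it.

```scala
object MacroArgumentCheck {
  // Same trick as in the diff: a Map keyed by name keeps only one entry per
  // name, so a size mismatch means at least one name occurred more than once.
  def hasDuplicateNames(names: Seq[String]): Boolean = {
    val colToIndex: Map[String, Int] = names.zipWithIndex.toMap
    colToIndex.size != names.size
  }

  // Names that occur more than once, de-duplicated, in order of appearance.
  // Useful for an error message that says which names clash.
  def duplicateNames(names: Seq[String]): Seq[String] =
    names.diff(names.distinct).distinct

  def main(args: Array[String]): Unit = {
    val ok = Seq("x", "y")
    val bad = Seq("x", "y", "x")
    println(hasDuplicateNames(ok))
    println(hasDuplicateNames(bad))
    println(duplicateNames(bad).mkString(", "))
  }
}
```

Note that this comparison is case-sensitive; whether macro argument names should be compared that way is a decision for the PR, not something this sketch settles.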
[GitHub] spark pull request #13776: [SPARK-16050][Tests]Remove the flaky test: Consol...
GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/13776

    [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSuite

    ## What changes were proposed in this pull request?

    ConsoleSinkSuite just collects content from stdout and compares it with the expected string. However, because Spark may not stop some background threads at once, there is a race condition: other threads may still be writing logs while ConsoleSinkSuite is running, which makes ConsoleSinkSuite fail.

    Therefore, I just deleted `ConsoleSinkSuite`. If we want to test ConsoleSink in the future, we should refactor ConsoleSink to make it testable instead of depending on stdout.

    ## How was this patch tested?

    Just removed a flaky test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-16050

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13776

commit 6e136ff97ee47838cd15137f85747d61d2e148b2
Author: Shixiong Zhu
Date:   2016-06-20T03:42:05Z

    Remove the flaky test: ConsoleSinkSuite
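One way to read the refactoring idea in this PR description is to pass the output stream into the sink instead of writing to the process-wide stdout, so a test can capture only what the sink itself prints. The sketch below is an assumption-laden illustration, not Spark's `ConsoleSink`: the class `PrintingSink`, its `addBatch` signature and the helper `capture` are all hypothetical.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a console-style sink. The stream is injected,
// so production code can pass System.out and tests can pass a private buffer.
class PrintingSink(out: PrintStream) {
  def addBatch(batchId: Long, rows: Seq[String]): Unit = {
    out.println(s"Batch: $batchId")
    rows.foreach(row => out.println(row))
    out.flush()
  }
}

object PrintingSinkDemo {
  // Runs one batch against an in-memory stream and returns what was printed.
  // Log lines written to stdout by other threads cannot end up in this buffer.
  def capture(batchId: Long, rows: Seq[String]): String = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, StandardCharsets.UTF_8.name())
    new PrintingSink(stream).addBatch(batchId, rows)
    buffer.toString(StandardCharsets.UTF_8.name())
  }

  def main(args: Array[String]): Unit = {
    print(capture(0L, Seq("a", "b")))
  }
}
```

Because the buffer is private to the test, the race with background logging threads described above would not affect the comparison against the expected string.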
[GitHub] spark issue #13776: [SPARK-16050][Tests]Remove the flaky test: ConsoleSinkSu...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13776 /cc @marmbrus @brkyvz
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Merged build finished. Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60824/ Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60824/consoleFull)** for PR 13770 at commit [`6a6fbc0`](https://github.com/apache/spark/commit/6a6fbc0fc27d381d5f1220630b207194f08676ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13775 **[Test build #60828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60828/consoleFull)** for PR 13775 at commit [`20b832e`](https://github.com/apache/spark/commit/20b832ee4e5ed4e794cc1bc8f2f67cce973759e0).
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13775 **[Test build #60827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60827/consoleFull)** for PR 13775 at commit [`7e7bb6c`](https://github.com/apache/spark/commit/7e7bb6c57860187f391f66ca82cdd715d0b2be43). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13775 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60827/ Test FAILed.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13775 Merged build finished. Test FAILed.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13775 **[Test build #60827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60827/consoleFull)** for PR 13775 at commit [`7e7bb6c`](https://github.com/apache/spark/commit/7e7bb6c57860187f391f66ca82cdd715d0b2be43).
[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/13775

    [SPARK-16060][SQL] Vectorized Orc reader

    ## What changes were proposed in this pull request?

    Currently the Orc reader in Spark SQL doesn't support vectorized reading. As Hive Orc already supports vectorization, we can add this support to improve Orc reading performance.

    ### Benchmark

    Benchmark code:

        test("Benchmark for Orc") {
          val N = 500 << 12
          withOrcTable((0 until N).map(i => (i, i.toString, i.toLong, i.toDouble)), "t") {
            val benchmark = new Benchmark("Orc reader", N)
            benchmark.addCase("reading Orc file", 10) { iter =>
              sql("SELECT * FROM t").collect()
            }
            benchmark.run()
          }
        }

    Before this patch:

        Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
        Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
        Orc reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
        reading Orc file           4750 / 5266          0.4        2319.1       1.0X

    After this patch:

        Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
        Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
        Orc reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
        reading Orc file           3550 / 3824          0.6        1733.2       1.0X

    ## How was this patch tested?

    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 vectorized-orc-reader3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13775

commit 2861ac2a5136c065ec38cfc24bf9f979d5b7ae07
Author: Liang-Chi Hsieh
Date:   2016-06-16T02:31:23Z

    Add vectorized Orc reader support.

commit eee8eca70920d624becb43c8510d217ce4d9820b
Author: Liang-Chi Hsieh
Date:   2016-06-17T09:44:11Z

    import.

commit b753d09e3e369fc91a17d9632123dbe40d7d9dfb
Author: Liang-Chi Hsieh
Date:   2016-06-18T10:00:00Z

    If column is repeating, always using row id 0.

commit 7d26f5ed785269299b324df8bfc1c64c2d4a2b48
Author: Liang-Chi Hsieh
Date:   2016-06-19T04:16:49Z

    Fix bugs of getBinary and numFields.

commit 74fe936e522a827384461e445b9ba44f96ce29fe
Author: Liang-Chi Hsieh
Date:   2016-06-20T02:44:07Z

    Remove unnecessary change.

commit 7e7bb6c57860187f391f66ca82cdd715d0b2be43
Author: Liang-Chi Hsieh
Date:   2016-06-20T02:48:11Z

    Remove unnecessary change.
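The benchmark figures in this PR description can be cross-checked with a little arithmetic. Both result tables show `1.0X` in the Relative column, presumably because each run contains a single case, so the before/after speedup has to be derived from the two best times. The sketch below does that using only numbers quoted in the description (`N = 500 << 12`, best times of 4750 ms and 3550 ms). The object and method names are hypothetical, and the results differ slightly from the reported Per Row values because the quoted times are rounded to whole milliseconds.

```scala
object OrcBenchmarkArithmetic {
  // Row count used by the benchmark in the PR description.
  val numRows: Int = 500 << 12 // 2,048,000 rows

  // Nanoseconds spent per row for a run that took `millis` milliseconds.
  def perRowNanos(millis: Double): Double = millis * 1e6 / numRows

  // Throughput in millions of rows per second.
  def rateMillionsPerSec(millis: Double): Double = numRows / (millis / 1000.0) / 1e6

  // How many times faster the `after` run is compared with the `before` run.
  def speedup(beforeMillis: Double, afterMillis: Double): Double = beforeMillis / afterMillis

  def main(args: Array[String]): Unit = {
    val before = 4750.0 // best time (ms) before the patch, as quoted
    val after = 3550.0  // best time (ms) after the patch, as quoted
    println(f"rows:            $numRows%d")
    println(f"per row before:  ${perRowNanos(before)}%.1f ns")
    println(f"per row after:   ${perRowNanos(after)}%.1f ns")
    println(f"rate before:     ${rateMillionsPerSec(before)}%.2f M rows/s")
    println(f"rate after:      ${rateMillionsPerSec(after)}%.2f M rows/s")
    println(f"speedup:         ${speedup(before, after)}%.2fx")
  }
}
```

On these numbers the best-case time drops by roughly a quarter, which corresponds to a speedup of about 1.34x.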
[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13772 **[Test build #60826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60826/consoleFull)** for PR 13772 at commit [`a472cc2`](https://github.com/apache/spark/commit/a472cc2a2f87ad1404b49ae2c2c75a769db6fc18).
[GitHub] spark issue #13196: [SPARK-15395][Core]Use getHostString to create RpcAddres...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13196 @zzcclp see https://issues.apache.org/jira/browse/SPARK-16017
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13631 ping @cloud-fan
[GitHub] spark issue #13196: [SPARK-15395][Core]Use getHostString to create RpcAddres...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/13196 @zsxwing, why was this PR reverted in branch-1.6?
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60825/consoleFull)** for PR 13770 at commit [`07b6863`](https://github.com/apache/spark/commit/07b68630247597a679dee01483a09cf7176d5724).
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60823/ Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Merged build finished. Test PASSed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60823/consoleFull)** for PR 13770 at commit [`e0c65d8`](https://github.com/apache/spark/commit/e0c65d82914dab9b6c0b855485c29ec010951a27). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60824/consoleFull)** for PR 13770 at commit [`6a6fbc0`](https://github.com/apache/spark/commit/6a6fbc0fc27d381d5f1220630b207194f08676ac).
[GitHub] spark issue #13766: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13766 LGTM
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60823/consoleFull)** for PR 13770 at commit [`e0c65d8`](https://github.com/apache/spark/commit/e0c65d82914dab9b6c0b855485c29ec010951a27).
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60822/ Test FAILed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13770 Merged build finished. Test FAILed.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60822/consoleFull)** for PR 13770 at commit [`a86151c`](https://github.com/apache/spark/commit/a86151c6ce75412cfe880f188af17d0e9489eefb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13770 **[Test build #60822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60822/consoleFull)** for PR 13770 at commit [`a86151c`](https://github.com/apache/spark/commit/a86151c6ce75412cfe880f188af17d0e9489eefb).
[GitHub] spark issue #13770: [SPARK-16054] [SQL] Verification of Multiple DataFrameRe...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13770 Maybe I should move all the JDBC related API misuse issues into this PR.
[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60819/ Test PASSed.
[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13748 Merged build finished. Test PASSed.
[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13748 **[Test build #60819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60819/consoleFull)** for PR 13748 at commit [`e61adfd`](https://github.com/apache/spark/commit/e61adfd81cbac57ed3c9ceae958b46f7f1943393). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TextSocketStreamSuite extends StreamTest with SharedSQLContext with BeforeAndAfterEach`
[GitHub] spark issue #13773: [SPARK-16056] [SPARK-16057] [SPARK-16058] [SQL] Fix Mult...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13773 @rxin @liancheng @clockfly @yhuai Could you please also review this PR? I found all of you recently reviewed the JDBC-related PRs. Thanks!
[GitHub] spark issue #13773: [SPARK-16056] [SPARK-16057] [SPARK-16058] [SQL] Fix Mult...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13773 @srowen Will submit more PRs about `JDBC`. The interfaces of `DataFrameReader` and `DataFrameWriter` are not designed for `JDBC` data sources, so Spark SQL beginners might hit various strange errors. Anyway, I will try to create fewer JIRAs, but, to be honest, in my previous team the JIRA-like defect tracking system was used to record defects. We always created multiple defects when they had different external impacts; combining multiple issues into one was considered very bad practice. When each fixpack or release is published, our customers, L2 and L3 might use the list to learn what is included in that specific fixpack. Below is an example: http://www-01.ibm.com/support/docview.wss?uid=swg21633303 It is a long list. In Spark, all the JDBC-related JIRAs can be classified into the same group, but we should not combine multiple defects into one. In my previous team, we always had to provide very clear titles for each JIRA/defect, because users might not be patient enough to click the link and read the details. I think the same logic applies to Spark.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60821/ Test PASSed.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12675 **[Test build #60821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60821/consoleFull)** for PR 12675 at commit [`3bc75a3`](https://github.com/apache/spark/commit/3bc75a3f413d1c4bdfd774cafbd1034ef50d216c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12675 Merged build finished. Test PASSed.
[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13760 Thanks @NarineK -- cc @sun-rui for review
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12675 **[Test build #60821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60821/consoleFull)** for PR 12675 at commit [`3bc75a3`](https://github.com/apache/spark/commit/3bc75a3f413d1c4bdfd774cafbd1034ef50d216c).
[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60818/ Test PASSed.
[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13760 **[Test build #60818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60818/consoleFull)** for PR 13760 at commit [`11c7cd6`](https://github.com/apache/spark/commit/11c7cd6d4bcbff86492e4e996f3317d98bf64901). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13760 Merged build finished. Test PASSed.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12675 **[Test build #60820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60820/consoleFull)** for PR 12675 at commit [`c2b1aef`](https://github.com/apache/spark/commit/c2b1aef45d98110c263f8ae53e6402871724b8d2). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12675 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60820/ Test FAILed.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12675 Merged build finished. Test FAILed.
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12675 **[Test build #60820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60820/consoleFull)** for PR 12675 at commit [`c2b1aef`](https://github.com/apache/spark/commit/c2b1aef45d98110c263f8ae53e6402871724b8d2).
[GitHub] spark issue #13748: [SPARK-16031] Add debug-only socket source in Structured...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13748 **[Test build #60819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60819/consoleFull)** for PR 13748 at commit [`e61adfd`](https://github.com/apache/spark/commit/e61adfd81cbac57ed3c9ceae958b46f7f1943393).
[GitHub] spark issue #13737: [SPARK-15954][SQL][PySpark][TEST] Fix TestHiveContext in...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13737 Why does Python need to load these test resources? I think the proper fix is to get rid of that dependency. Otherwise we are making the test harness more and more complicated and more tightly coupled.
[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thanks for the quick response. I'll create one.
[GitHub] spark issue #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13760 **[Test build #60818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60818/consoleFull)** for PR 13760 at commit [`11c7cd6`](https://github.com/apache/spark/commit/11c7cd6d4bcbff86492e4e996f3317d98bf64901).
[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...
Github user vectorijk commented on the issue: https://github.com/apache/spark/pull/12836 @NarineK I am not quite sure. Maybe you could create a new JIRA for gapply's programming guide.