[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18064





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118723698
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -242,9 +242,12 @@ class Dataset[T] private[sql](
* @param vertical If set to true, prints output rows vertically (one line per column value).
--- End diff --

Seems adding a `@param` for `isInternal` would be better?
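
A rough sketch of what that suggestion could look like (the wording and the surrounding parameters are assumptions for illustration, not the PR's actual Scaladoc):

```
class DatasetShowDocSketch {
  /**
   * @param numRows    Number of rows to show (assumed parameter).
   * @param vertical   If set to true, prints output rows vertically (one line per column value).
   * @param isInternal If set to true, marks the call as Spark-internal (hypothetical wording;
   *                   the PR would supply the real description).
   */
  def showString(numRows: Int, vertical: Boolean, isInternal: Boolean): String = ""
}
```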





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118723009
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -426,36 +418,35 @@ case class DataSource(
 // For partitioned relation r, r.schema's column ordering can be different from the column
 // ordering of data.logicalPlan (partition columns are all moved after data column).  This
 // will be adjusted within InsertIntoHadoopFsRelation.
-val plan =
-  InsertIntoHadoopFsRelationCommand(
-outputPath = outputPath,
-staticPartitions = Map.empty,
-ifPartitionNotExists = false,
-partitionColumns = partitionAttributes,
-bucketSpec = bucketSpec,
-fileFormat = format,
-options = options,
-query = data.logicalPlan,
-mode = mode,
-catalogTable = catalogTable,
-fileIndex = fileIndex)
-  sparkSession.sessionState.executePlan(plan).toRdd
+InsertIntoHadoopFsRelationCommand(
+  outputPath = outputPath,
+  staticPartitions = Map.empty,
+  ifPartitionNotExists = false,
+  partitionColumns = partitionColumns.map(UnresolvedAttribute.quoted),
--- End diff --

Ok. I see.





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118622212
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -965,14 +965,20 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   }
 
   test("sanity test for SPARK-6618") {
-(1 to 100).par.map { i =>
--- End diff --

The weird thing is I can't reproduce it locally, and I'd say it's not introduced by this PR, as previously no execution id was set for commands.





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118607514
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -965,14 +965,20 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   }
 
   test("sanity test for SPARK-6618") {
-(1 to 100).par.map { i =>
--- End diff --

Oh, my hunch is we probably leak the execution id to some thread in a thread pool. It's better to fix that than to change the test to hide it.
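
A minimal, self-contained sketch of the suspected mechanism (all names here are stand-ins; in Spark the id travels through `SparkContext` local properties, which are inheritable in the same way): a pooled thread created while an execution id is set inherits a copy and keeps it after the parent cleans up.

```
import java.util.concurrent.Executors

object ExecutionIdLeakSketch {
  private val executionId = new InheritableThreadLocal[String]

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(1)

    executionId.set("execution-id-42")
    // The pool's worker thread is created lazily on this first submit, while
    // the id is still set, so it inherits a copy of the value.
    pool.submit(new Runnable { def run(): Unit = () }).get()
    executionId.remove() // the parent cleans up, but the worker keeps its copy

    pool.submit(new Runnable {
      def run(): Unit = println(s"stale id on pooled thread: ${executionId.get}")
    }).get()
    pool.shutdown()
  }
}
```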





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118474770
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -294,32 +294,31 @@ private[hive] class TestHiveSparkSession(
 "CREATE TABLE src1 (key INT, value STRING)".cmd,
 s"LOAD DATA LOCAL INPATH '${quoteHiveFile("data/files/kv3.txt")}' 
INTO TABLE src1".cmd),
   TestTable("srcpart", () => {
-sql(
-  "CREATE TABLE srcpart (key INT, value STRING) PARTITIONED BY (ds 
STRING, hr STRING)")
+"CREATE TABLE srcpart (key INT, value STRING) PARTITIONED BY (ds 
STRING, hr STRING)"
+  .cmd.apply()
--- End diff --

use `xxx.cmd.apply` instead of `sql("xx")` here, to avoid setting the 
execution id.
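
A self-contained sketch of the distinction (all names are stand-ins, not TestHive's real internals): `sql(...)` models the action path that allocates an execution id, while `.cmd.apply()` runs the command body directly and never touches the id.

```
object CmdVsSqlSketch {
  private val executionId = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  def sql(text: String): Unit = {
    executionId.set(Some(System.nanoTime())) // the action path sets a new id
    try runDirectly(text) finally executionId.remove()
  }

  implicit class SqlCmd(text: String) {
    def cmd: () => Unit = () => runDirectly(text) // no execution id involved
  }

  private def runDirectly(text: String): Unit = println(s"executing: $text")

  def main(args: Array[String]): Unit =
    "CREATE TABLE srcpart (key INT, value STRING)".cmd.apply()
}
```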





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118473002
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
@@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
   topic: Option[String] = None): Unit = {
 val schema = queryExecution.analyzed.output
 validateQuery(queryExecution, kafkaParameters, topic)
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
--- End diff --

`KafkaSink.addBatch` is also wrapped with `SQLExecution.withNewExecutionId` in `StreamExecution`.
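
A sketch of the layering being described, with stand-in names: the outer driver (`StreamExecution` for streaming, command execution for batch) opens the execution id once, so an inner writer must not open a second, nested one.

```
object ExecutionIdScopeSketch {
  private val currentId = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  def withNewExecutionId[T](body: => T): T = {
    require(currentId.get.isEmpty, "execution id is already set") // a nested wrap fails here
    currentId.set(Some(System.nanoTime()))
    try body finally currentId.remove()
  }

  private def kafkaWrite(): Unit =
    println(s"writing under execution id ${currentId.get}")

  def main(args: Array[String]): Unit =
    withNewExecutionId { kafkaWrite() } // wrap once, at the outside only
}
```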





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118468240
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -426,36 +418,35 @@ case class DataSource(
 // For partitioned relation r, r.schema's column ordering can be different from the column
 // ordering of data.logicalPlan (partition columns are all moved after data column).  This
 // will be adjusted within InsertIntoHadoopFsRelation.
-val plan =
-  InsertIntoHadoopFsRelationCommand(
-outputPath = outputPath,
-staticPartitions = Map.empty,
-ifPartitionNotExists = false,
-partitionColumns = partitionAttributes,
-bucketSpec = bucketSpec,
-fileFormat = format,
-options = options,
-query = data.logicalPlan,
-mode = mode,
-catalogTable = catalogTable,
-fileIndex = fileIndex)
-  sparkSession.sessionState.executePlan(plan).toRdd
+InsertIntoHadoopFsRelationCommand(
+  outputPath = outputPath,
+  staticPartitions = Map.empty,
+  ifPartitionNotExists = false,
+  partitionColumns = partitionColumns.map(UnresolvedAttribute.quoted),
--- End diff --

Analyzer will resolve them in `InsertIntoHadoopFsRelationCommand`. 
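
A small sketch of what that means in practice (requires spark-catalyst on the classpath; the column names are made up): `UnresolvedAttribute.quoted` produces references that stay unresolved until the analyzer binds them against the query's output.

```
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

object UnresolvedPartitionColumnsSketch {
  def main(args: Array[String]): Unit = {
    val partitionColumns = Seq("ds", "hr")
    val attrs = partitionColumns.map(UnresolvedAttribute.quoted)
    attrs.foreach(a => assert(!a.resolved)) // resolved only after analysis
    println(attrs.map(_.name))
  }
}
```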





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118443224
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -426,36 +418,35 @@ case class DataSource(
 // For partitioned relation r, r.schema's column ordering can be different from the column
 // ordering of data.logicalPlan (partition columns are all moved after data column).  This
 // will be adjusted within InsertIntoHadoopFsRelation.
-val plan =
-  InsertIntoHadoopFsRelationCommand(
-outputPath = outputPath,
-staticPartitions = Map.empty,
-ifPartitionNotExists = false,
-partitionColumns = partitionAttributes,
-bucketSpec = bucketSpec,
-fileFormat = format,
-options = options,
-query = data.logicalPlan,
-mode = mode,
-catalogTable = catalogTable,
-fileIndex = fileIndex)
-  sparkSession.sessionState.executePlan(plan).toRdd
+InsertIntoHadoopFsRelationCommand(
+  outputPath = outputPath,
+  staticPartitions = Map.empty,
+  ifPartitionNotExists = false,
+  partitionColumns = partitionColumns.map(UnresolvedAttribute.quoted),
--- End diff --

`InsertIntoHadoopFsRelationCommand` will pass `partitionColumns` to `FileFormatWriter`. It seems `FileFormatWriter` won't resolve those `partitionColumns`?





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118440922
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -426,36 +418,35 @@ case class DataSource(
 // For partitioned relation r, r.schema's column ordering can be different from the column
 // ordering of data.logicalPlan (partition columns are all moved after data column).  This
 // will be adjusted within InsertIntoHadoopFsRelation.
-val plan =
-  InsertIntoHadoopFsRelationCommand(
-outputPath = outputPath,
-staticPartitions = Map.empty,
-ifPartitionNotExists = false,
-partitionColumns = partitionAttributes,
-bucketSpec = bucketSpec,
-fileFormat = format,
-options = options,
-query = data.logicalPlan,
-mode = mode,
-catalogTable = catalogTable,
-fileIndex = fileIndex)
-  sparkSession.sessionState.executePlan(plan).toRdd
+InsertIntoHadoopFsRelationCommand(
+  outputPath = outputPath,
+  staticPartitions = Map.empty,
+  ifPartitionNotExists = false,
+  partitionColumns = partitionColumns.map(UnresolvedAttribute.quoted),
--- End diff --

Is it safe to use unresolved attributes here?





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118410964
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
@@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
   topic: Option[String] = None): Unit = {
 val schema = queryExecution.analyzed.output
 validateQuery(queryExecution, kafkaParameters, topic)
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
--- End diff --

If you mean `KafkaSourceProvider`, is it the same code path as `KafkaSink`? In `KafkaSink.addBatch`, `KafkaWriter.write` is also called to write into Kafka.





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118403738
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -965,14 +965,20 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   }
 
   test("sanity test for SPARK-6618") {
-(1 to 100).par.map { i =>
--- End diff --

this test was failing and complained that the execution id is already set. I don't know the details of how the 100 tasks are scheduled on threads with `par`, so I changed it to use `Thread` directly to make the scheduling explicit. I'll reduce the number of threads.
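
A hedged sketch of the reworked test shape (the thread count and body are placeholders; the real test does a per-thread table round-trip):

```
object ExplicitThreadsSketch {
  def main(args: Array[String]): Unit = {
    val threads = (1 to 10).map { i =>
      new Thread {
        override def run(): Unit = {
          // the real test would create and drop a table named after `i` here
          println(s"worker $i done")
        }
      }
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // wait for every worker before the test ends
  }
}
```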





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118391216
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -161,50 +161,50 @@ object FileFormatWriter extends Logging {
   }
 }
 
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
-  // This call shouldn't be put into the `try` block below because it only initializes and
-  // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
-  committer.setupJob(job)
-
-  try {
-val rdd = if (orderingMatched) {
-  queryExecution.toRdd
-} else {
-  SortExec(
-requiredOrdering.map(SortOrder(_, Ascending)),
-global = false,
-child = queryExecution.executedPlan).execute()
-}
-val ret = new Array[WriteTaskResult](rdd.partitions.length)
-sparkSession.sparkContext.runJob(
-  rdd,
-  (taskContext: TaskContext, iter: Iterator[InternalRow]) => {
-executeTask(
-  description = description,
-  sparkStageId = taskContext.stageId(),
-  sparkPartitionId = taskContext.partitionId(),
-  sparkAttemptNumber = taskContext.attemptNumber(),
-  committer,
-  iterator = iter)
-  },
-  0 until rdd.partitions.length,
-  (index, res: WriteTaskResult) => {
-committer.onTaskCommit(res.commitMsg)
-ret(index) = res
-  })
-
-val commitMsgs = ret.map(_.commitMsg)
-val updatedPartitions = ret.flatMap(_.updatedPartitions)
-  .distinct.map(PartitioningUtils.parsePathFragment)
-
-committer.commitJob(job, commitMsgs)
-logInfo(s"Job ${job.getJobID} committed.")
-refreshFunction(updatedPartitions)
-  } catch { case cause: Throwable =>
-logError(s"Aborting job ${job.getJobID}.", cause)
-committer.abortJob(job)
-throw new SparkException("Job aborted.", cause)
+// TODO: make sure there is execution id when we reach here.
--- End diff --

Is there any case where the execution id is missing? Otherwise, we can just add an assertion here.
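
A sketch of the suggested assertion, assuming the execution id is carried in the local property "spark.sql.execution.id" (the value of `SQLExecution.EXECUTION_ID_KEY`):

```
import org.apache.spark.SparkContext

object ExecutionIdAssertionSketch {
  def assertExecutionIdSet(sc: SparkContext): Unit = {
    assert(
      sc.getLocalProperty("spark.sql.execution.id") != null,
      "FileFormatWriter.write should run under an active SQL execution id")
  }
}
```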





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118391357
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -965,14 +965,20 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   }
 
   test("sanity test for SPARK-6618") {
-(1 to 100).par.map { i =>
--- End diff --

Why was this test changed? Using 100 threads will be much slower.





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118198979
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
@@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
   topic: Option[String] = None): Unit = {
 val schema = queryExecution.analyzed.output
 validateQuery(queryExecution, kafkaParameters, topic)
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
--- End diff --

yea, because it's used in `KafkaProvider`, which is run with `SaveIntoDataSourceCommand`, and all commands will be wrapped by `SQLExecution.withNewExecutionId`.





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118193935
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -464,18 +455,18 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[DataFrame]] out to this [[DataSource]].
+   * Returns a logical plan to write the given [[DataFrame]] out to this [[DataSource]].
--- End diff --

the given [[LogicalPlan]].





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118193454
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -390,7 +391,8 @@ case class DataSource(
   /**
* Writes the given [[DataFrame]] out in this [[FileFormat]].
*/
-  private def writeInFileFormat(format: FileFormat, mode: SaveMode, data: DataFrame): Unit = {
+  private def planForWritingFileFormat(
+  format: FileFormat, mode: SaveMode, data: LogicalPlan): LogicalPlan = {
--- End diff --

given [[LogicalPlan]].





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118193391
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -426,36 +418,35 @@ case class DataSource(
 // For partitioned relation r, r.schema's column ordering can be different from the column
 // ordering of data.logicalPlan (partition columns are all moved after data column).  This
 // will be adjusted within InsertIntoHadoopFsRelation.
-val plan =
-  InsertIntoHadoopFsRelationCommand(
-outputPath = outputPath,
-staticPartitions = Map.empty,
-ifPartitionNotExists = false,
-partitionColumns = partitionAttributes,
-bucketSpec = bucketSpec,
-fileFormat = format,
-options = options,
-query = data.logicalPlan,
-mode = mode,
-catalogTable = catalogTable,
-fileIndex = fileIndex)
-  sparkSession.sessionState.executePlan(plan).toRdd
+InsertIntoHadoopFsRelationCommand(
+  outputPath = outputPath,
+  staticPartitions = Map.empty,
+  ifPartitionNotExists = false,
+  partitionColumns = partitionColumns.map(UnresolvedAttribute.quoted),
+  bucketSpec = bucketSpec,
+  fileFormat = format,
+  options = options,
+  query = data,
+  mode = mode,
+  catalogTable = catalogTable,
+  fileIndex = fileIndex)
   }
 
   /**
* Writes the given [[DataFrame]] out to this [[DataSource]] and returns a [[BaseRelation]] for
--- End diff --

Comment needs to change: given [[LogicalPlan]].





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118191647
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
@@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
   topic: Option[String] = None): Unit = {
 val schema = queryExecution.analyzed.output
 validateQuery(queryExecution, kafkaParameters, topic)
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
--- End diff --

After removing this, can `KafkaSink` still track its streaming writes?





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r117919814
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -96,11 +96,12 @@ case class AnalyzeColumnCommand(
   attributesToAnalyze.map(ColumnStat.statExprs(_, ndvMaxErr))
 
 val namedExpressions = expressions.map(e => Alias(e, e.toString)())
-val statsRow = Dataset.ofRows(sparkSession, Aggregate(Nil, namedExpressions, relation)).head()
+val statsRow = Dataset.ofRows(sparkSession, Aggregate(Nil, namedExpressions, relation))
+  .queryExecution.executedPlan.executeTake(1).head
--- End diff --

```
new QueryExecution(sparkSession, Aggregate(Nil, namedExpressions, relation))
  .executedPlan.executeTake(1).head
```





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-22 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/18064

[SPARK-20213][SQL] Fix DataFrameWriter operations in SQL UI tab

## What changes were proposed in this pull request?

Currently the `DataFrameWriter` operations have several problems:

1. Non-file-format data source write actions don't show up in the SQL tab of the Spark UI.
2. File-format data source write actions show a scan node in the SQL tab, without saying anything about the write.
3. Spark SQL CLI actions don't show up in the SQL tab.

This PR fixes all of them by refactoring `ExecutedCommandExec` to make it have children.
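
A conceptual sketch of the idea (not the PR's actual classes): if the command node keeps its query as a child, the SQL tab can render the write above the scan instead of showing either a bare command or a lone scan.

```
object UiTreeSketch {
  sealed trait PlanNode { def name: String; def children: Seq[PlanNode] }
  case class ScanNode(name: String) extends PlanNode { def children = Nil }
  case class WriteCommandNode(name: String, query: PlanNode) extends PlanNode {
    def children = Seq(query) // the child is what makes the write visible in the UI
  }

  def render(node: PlanNode, indent: String = ""): Unit = {
    println(indent + node.name)
    node.children.foreach(render(_, indent + "  "))
  }

  def main(args: Array[String]): Unit =
    render(WriteCommandNode("InsertIntoHadoopFsRelation", ScanNode("LocalTableScan")))
}
```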

Closes https://github.com/apache/spark/pull/17540

## How was this patch tested?

existing tests.

Also tested the UI manually. For a simple command: `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/qwe")`

before this PR:
https://cloud.githubusercontent.com/assets/3182036/26326050/24e18ba2-3f6c-11e7-8817-6dd275bf6ac5.png

after this PR:
https://cloud.githubusercontent.com/assets/3182036/26326054/2ad7f460-3f6c-11e7-8053-d68325beb28f.png


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark execution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18064


commit 5c24f5ac2b1740a0cf08346a85111a2c06920309
Author: Wenchen Fan 
Date:   2017-05-22T19:51:13Z

Fix DataFrameWriter operations in SQL UI tab



