[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4792


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76516145
  
  [Test build #28117 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28117/consoleFull)
 for   PR 4792 at commit 
[`538f506`](https://github.com/apache/spark/commit/538f506851d7e2eba6a20d0ad4a5909486bf8516).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76516146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28117/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76514046
  
@yhuai Thanks for the review! I've addressed the comments. Will merge this 
to master and branch-1.3 after Jenkins approves.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76513950
  
  [Test build #28117 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28117/consoleFull)
 for   PR 4792 at commit 
[`538f506`](https://github.com/apache/spark/commit/538f506851d7e2eba6a20d0ad4a5909486bf8516).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4792#discussion_r25554933
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -483,16 +487,42 @@ private[sql] case class ParquetRelation2(
 
 val requiredPartOrdinal = partitionKeyLocations.keys.toSeq
 
-iterator.map { pair =>
-  val row = pair._2.asInstanceOf[SpecificMutableRow]
-  var i = 0
-  while (i < requiredPartOrdinal.size) {
-// TODO Avoids boxing cost here!
-val partOrdinal = requiredPartOrdinal(i)
-row.update(partitionKeyLocations(partOrdinal), 
partValues(partOrdinal))
-i += 1
+if (primitiveRow) {
+  iterator.map { pair =>
+// We are using CatalystPrimitiveRowConverter and it returns a 
SpecificMutableRow.
+val row = pair._2.asInstanceOf[SpecificMutableRow]
+var i = 0
+while (i < requiredPartOrdinal.size) {
+  // TODO Avoids boxing cost here!
+  val partOrdinal = requiredPartOrdinal(i)
+  row.update(partitionKeyLocations(partOrdinal), 
partValues(partOrdinal))
+  i += 1
+}
+row
+  }
+} else {
+  // Create a mutable row since we need to fill in values from 
partition columns.
+  val mutableRow = new GenericMutableRow(requestedSchema.size)
--- End diff --

This is not necessary. Unlike `output.size` in `ParquetTableScan`, 
`requestedSchema` here is already a local variable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4792#discussion_r25533656
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
 ---
@@ -143,19 +146,47 @@ private[sql] case class ParquetTableScan(
   relation.partitioningAttributes
 .map(a => Cast(Literal(partValues(a.name)), 
a.dataType).eval(EmptyRow))
 
-new Iterator[Row] {
-  def hasNext = iter.hasNext
-  def next() = {
-val row = iter.next()._2.asInstanceOf[SpecificMutableRow]
-
-// Parquet will leave partitioning columns empty, so we fill 
them in here.
-var i = 0
-while (i < requestedPartitionOrdinals.size) {
-  row(requestedPartitionOrdinals(i)._2) =
-partitionRowValues(requestedPartitionOrdinals(i)._1)
-  i += 1
+if (primitiveRow) {
+  new Iterator[Row] {
+def hasNext = iter.hasNext
+def next() = {
+  // We are using CatalystPrimitiveRowConverter and it returns 
a SpecificMutableRow.
+  val row = iter.next()._2.asInstanceOf[SpecificMutableRow]
+
+  // Parquet will leave partitioning columns empty, so we fill 
them in here.
+  var i = 0
+  while (i < requestedPartitionOrdinals.size) {
+row(requestedPartitionOrdinals(i)._2) =
+  partitionRowValues(requestedPartitionOrdinals(i)._1)
+i += 1
+  }
+  row
+}
+  }
+} else {
+  // Create a mutable row since we need to fill in values from 
partition columns.
+  val mutableRow = new GenericMutableRow(output.size)
--- End diff --

Seems it will be safer if we get `output.size` out side of 
`mapPartitionsWithInputSplit`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4792#discussion_r25533640
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -483,16 +487,42 @@ private[sql] case class ParquetRelation2(
 
 val requiredPartOrdinal = partitionKeyLocations.keys.toSeq
 
-iterator.map { pair =>
-  val row = pair._2.asInstanceOf[SpecificMutableRow]
-  var i = 0
-  while (i < requiredPartOrdinal.size) {
-// TODO Avoids boxing cost here!
-val partOrdinal = requiredPartOrdinal(i)
-row.update(partitionKeyLocations(partOrdinal), 
partValues(partOrdinal))
-i += 1
+if (primitiveRow) {
+  iterator.map { pair =>
+// We are using CatalystPrimitiveRowConverter and it returns a 
SpecificMutableRow.
+val row = pair._2.asInstanceOf[SpecificMutableRow]
+var i = 0
+while (i < requiredPartOrdinal.size) {
+  // TODO Avoids boxing cost here!
+  val partOrdinal = requiredPartOrdinal(i)
+  row.update(partitionKeyLocations(partOrdinal), 
partValues(partOrdinal))
+  i += 1
+}
+row
+  }
+} else {
+  // Create a mutable row since we need to fill in values from 
partition columns.
+  val mutableRow = new GenericMutableRow(requestedSchema.size)
--- End diff --

Seems it will be safer if we get `requestedSchema.size` out side of 
`mapPartitionsWithInputSplit`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4792#discussion_r25533533
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -476,6 +476,10 @@ private[sql] case class ParquetRelation2(
 // When the data does not include the key and the key is requested 
then we must fill it in
 // based on information from the input split.
 if (!partitionKeysIncludedInDataSchema && 
partitionKeyLocations.nonEmpty) {
+  // This check if based on CatalystConverter.createRootConverter.
--- End diff --

same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-27 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4792#discussion_r25533517
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
 ---
@@ -126,6 +126,9 @@ private[sql] case class ParquetTableScan(
 conf)
 
 if (requestedPartitionOrdinals.nonEmpty) {
+  // This check if based on CatalystConverter.createRootConverter.
--- End diff --

I made a typo at here... if => is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76328427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28039/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76328421
  
  [Test build #28039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28039/consoleFull)
 for   PR 4792 at commit 
[`cee55cf`](https://github.com/apache/spark/commit/cee55cfb849355e21363e6a4fdc155d27db88401).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76323022
  
  [Test build #28039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28039/consoleFull)
 for   PR 4792 at commit 
[`cee55cf`](https://github.com/apache/spark/commit/cee55cfb849355e21363e6a4fdc155d27db88401).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76322637
  
@marmbrus @yhuai Thanks for the suggestion and PR, much better!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76307166
  
https://github.com/liancheng/spark/pull/4


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76270205
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28012/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76270178
  
  [Test build #28012 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28012/consoleFull)
 for   PR 4792 at commit 
[`ca6e038`](https://github.com/apache/spark/commit/ca6e0384f30b8b6d07840a2c555cfc6069126679).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76264043
  
Can't we do the same check that we do inside of parquet to see if the row 
is all primitives once and switch on that, instead of doing per tuple 
extraction and run time type checking?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76251578
  
@marmbrus As discussed offline, I tried to use `SpecificMutableRow` 
throughout the whole Parquet support implementation, but it touches to many 
places, which I think is too risky as 1.3 release is so close. So falled back 
to current implementation. I can do the refactoring later on master (mainly 
focus on the `CatalystConverter` class hierarchy).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4792#issuecomment-76250658
  
  [Test build #28012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28012/consoleFull)
 for   PR 4792 at commit 
[`ca6e038`](https://github.com/apache/spark/commit/ca6e0384f30b8b6d07840a2c555cfc6069126679).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...

2015-02-26 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/4792

[SPARK-5775] [SQL] BugFix: GenericRow cannot be cast to SpecificMutableRow 
when nested data and partitioned table

This PR adapts @anselmevignon's #4697 to master and branch-1.3. Please 
refer to PR description of #4697 for details.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-5775

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4792


commit ca6e0384f30b8b6d07840a2c555cfc6069126679
Author: Cheng Lian 
Date:   2015-02-26T19:24:37Z

Fixes SPARK-5775




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org