[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-28 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/4729


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76519493
  
I don't see any difference. `DataType.alwaysNullable` just does the same 
thing as this PR...





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76512714
  
@viirya Thank you for working on it! Our discussions helped me clearly 
understand the problem. After discussions with @liancheng, I am proposing a 
different approach to address this issue in 
https://github.com/apache/spark/pull/4826. Please feel free to leave comments 
there.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76420655
  
[Test build #28070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28070/consoleFull) for PR 4729 at commit [`2949324`](https://github.com/apache/spark/commit/2949324222c0e37d6291f9a6b95a383676408ce9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `logError("User class threw exception: " + cause.getMessage, cause)`






[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76420679
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28070/
Test PASSed.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76403689
  
[Test build #28070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28070/consoleFull) for PR 4729 at commit [`2949324`](https://github.com/apache/spark/commit/2949324222c0e37d6291f9a6b95a383676408ce9).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76315917
  
I can't tell whether SPARK-5508 is using InsertIntoHive or not. That JIRA 
doesn't say whether `spark.sql.parquet.useDataSourceApi` is turned on or off.

If you simply replace `InsertIntoTable`'s relation in `ParquetConversions`, 
you will get an `org.apache.spark.sql.AnalysisException`. So I don't know why 
you said the test passed.

For SPARK-5950, there are a few issues:

1. The `ParquetConversions` problem that #4782 targets: `InsertIntoTable`'s 
table is never replaced.
2. The `AnalysisException`. That is why I use `InsertIntoHiveTable` to replace 
`InsertIntoTable` in `ParquetConversions`: `InsertIntoHiveTable` doesn't check 
the equality of `containsNull`.
3. Since the `containsNull` of `ArrayType`, `MapType`, and `StructType` is set 
to true by default, the schema of a created Parquet table always has 
`containsNull` as true. Later, when you try to insert data that has the same 
schema except for a different `containsNull` value, the Parquet library will 
complain that the schema is different, so the read will fail.

This PR solves all three problems (I will update it for `MapType` and 
`StructType`). #4782 only considers the first one.
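
The third issue above can be illustrated with a minimal, self-contained sketch. It does not use Spark: `DataType`, `IntegerType`, and `ArrayType` below are simplified stand-ins for the Spark SQL classes, and `sameIgnoringNullability` is a hypothetical helper introduced only for illustration. It shows how two array types that differ only in `containsNull` compare as different under strict equality.

```scala
object ContainsNullMismatch {
  // Simplified stand-ins for Spark SQL's data types (assumption: not the real classes).
  sealed trait DataType
  case object IntegerType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Column type recorded when the table was created: containsNull = true.
  val tableColumn: DataType = ArrayType(IntegerType, containsNull = true)
  // Column type of incoming data whose arrays are known to hold no nulls.
  val dataColumn: DataType = ArrayType(IntegerType, containsNull = false)

  // Case-class equality compares every field, so the flag alone makes these differ.
  val strictlyEqual: Boolean = tableColumn == dataColumn

  // Hypothetical helper: compare two types while ignoring the containsNull flag.
  def sameIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    case (ArrayType(ea, _), ArrayType(eb, _)) => sameIgnoringNullability(ea, eb)
    case _                                    => a == b
  }
}
```

Any check that relies on strict schema equality would reject `dataColumn` here, even though every value it describes is also valid for `tableColumn`.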






[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76313868
  
OK. Now I understand what's going on. For SPARK-5950, we cannot insert 
because `InsertIntoTable` will not be resolved and you saw an 
`org.apache.spark.sql.AnalysisException`, right? For SPARK-5508, the problem is 
that data is inserted through InsertIntoHive and we cannot read it from our 
data source API write path. Are you trying to resolve both in this PR?





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76310964
  
@yhuai Yes, I know that. I know there are two bugs. I reported them in 
this PR and fixed them in the commits. You should read the description of this 
PR and my commits first.

You just solved part of the first issue. As I said, the unit test I added 
still fails on master now. That is because your commit is just part of 
my commits in this PR. Because of that, I don't know why you want to open 
another PR instead of just using my commits.

As I have said, the second issue is not caused by "Hive's parquet serde may 
not be able to read by data source parquet table", because I create the parquet 
table using the data source API, not the Hive parquet serde.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25480833
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -299,6 +301,37 @@ class ParquetDataSourceOnSourceSuite extends 
ParquetSourceSuiteBase {
 super.afterAll()
 setConf(SQLConf.PARQUET_USE_DATA_SOURCE_API, originalConf.toString)
   }
+
+  test("insert array into parquet hive table using data source api") {
--- End diff --

`spark.sql.parquet.useDataSourceApi` is already turned on in the unit test I 
added. The test failed on the master I just pulled.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76310259
  
It seems Hive's parquet serde always treats values as nullable. Can you double 
check it? Also, we need to check whether `StructType` and `MapType` are 
affected by this bug.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25480542
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -254,4 +254,13 @@ private[hive] trait HiveStrategies {
   case _ => Nil
 }
   }
+
+  object HiveDataSourceStrategy extends Strategy {
--- End diff --

Seems we do not need it. When we want to insert into a data source table, 
`logical.InsertIntoTable` will be used instead of `logical.InsertIntoHiveTable`.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25480454
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -299,6 +301,37 @@ class ParquetDataSourceOnSourceSuite extends 
ParquetSourceSuiteBase {
 super.afterAll()
 setConf(SQLConf.PARQUET_USE_DATA_SOURCE_API, originalConf.toString)
   }
+
+  test("insert array into parquet hive table using data source api") {
--- End diff --

I just tried this test with our master and it did not fail. I think you need 
to first turn off the conversion for the write path and then turn on the 
conversion for the read path. You can use `spark.sql.parquet.useDataSourceApi` 
to control it.
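
One way to scope such a setting to a single step of a test is a helper that sets the key, runs a block, and then restores the previous value. The sketch below is self-contained and does not call Spark: the mutable map `conf` is a stand-in for the SQL configuration, and `withConf` is a hypothetical helper, not an API confirmed by this thread. Only the key name `spark.sql.parquet.useDataSourceApi` comes from the discussion.

```scala
import scala.collection.mutable

object ConfToggleSketch {
  // Stand-in for the SQL configuration (assumption: a plain mutable map).
  val conf: mutable.Map[String, String] =
    mutable.Map("spark.sql.parquet.useDataSourceApi" -> "true")

  // Hypothetical helper: set `key` to `value` while `body` runs, then restore
  // whatever was there before, even if `body` throws.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val original = conf.get(key)
    conf(key) = value
    try body
    finally {
      original match {
        case Some(v) => conf(key) = v
        case None    => conf.remove(key)
      }
    }
  }
}
```

With a helper of this shape, the write step could run inside `withConf(key, "false") { ... }` and the read step inside `withConf(key, "true") { ... }`, which keeps the two paths from leaking settings into each other.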





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76309465
  
Maybe I did not explain it clearly. SPARK-6023 and SPARK-5950 are two bugs. 
The first one is that we failed to replace the destination MetastoreRelation in 
InsertIntoTable even when we ask Spark SQL to convert all MetastoreRelations 
associated with parquet tables to our data source parquet tables. The root 
cause for this one was clear and the fix is pretty simple. The second bug is 
that arrays (maybe maps and structs?) written by Hive's parquet serde may not 
be readable by the data source parquet table. SPARK-5950 is for this bug. Since 
this PR is not ready (I will leave comments later), I made #4782 and we checked 
it in first to fix SPARK-6023.
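
The first bug described above can be sketched with a toy plan rewrite. This is a self-contained model, not Spark's code: the case classes below only borrow the names used in the thread, their fields are simplified assumptions, and `convert` is a hypothetical stand-in for the conversion rule. The point it shows is that the rewrite has to visit the destination of `InsertIntoTable` (write path) as well as relations in the query (read path).

```scala
object RelationRewriteSketch {
  // Toy logical plan nodes (assumption: fields simplified for illustration).
  sealed trait LogicalPlan
  case class MetastoreRelation(table: String) extends LogicalPlan
  case class ParquetRelation(path: String) extends LogicalPlan
  case class Project(columns: Seq[String], child: LogicalPlan) extends LogicalPlan
  case class InsertIntoTable(table: LogicalPlan, child: LogicalPlan, overwrite: Boolean)
      extends LogicalPlan

  // Replace every metastore relation found in `relationMap`, on both paths.
  def convert(
      plan: LogicalPlan,
      relationMap: Map[MetastoreRelation, ParquetRelation]): LogicalPlan = plan match {
    // Read path: a relation scanned by the query.
    case r: MetastoreRelation => relationMap.getOrElse(r, r)
    case Project(cols, child) => Project(cols, convert(child, relationMap))
    // Write path: the destination table must be rewritten too.
    case InsertIntoTable(table, child, overwrite) =>
      InsertIntoTable(convert(table, relationMap), convert(child, relationMap), overwrite)
    case other => other
  }
}
```

If the `InsertIntoTable` case only recursed into `child`, the destination would stay a `MetastoreRelation`, which matches the symptom described for SPARK-6023.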





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76306767
  
@liancheng @yhuai Actually, I don't know why you opened #4782 to fix the 
first issue. As far as I can see, the commits of #4782 are just part of my 
commits.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76304040
  
In fact, even #4782 doesn't solve the issue I reported in this PR. The unit 
test fails before hitting the data insertion issue...





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76303227
  
Can you try your unit test (without any other change) with master? Thanks!





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76302646
  
@yhuai I see. That issue was first fixed by this PR. You can see the 
earlier commits. Even when the destination table is replaced, the array (or 
map) issue is still there.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76301583
  
When the JIRA was created, we did not correctly replace the destination 
table of an insert with our data source table. We were actually calling 
InsertIntoHive to do the work. f02394d06473889d0d7897c4583239e6e136ff46 fixed 
this problem. Now, you need to turn off our metastore conversion to see the 
problem.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76301308
  
@yhuai That problem is not caused by the Hive parquet serde. You can see the 
unit test I added. The table is created using the data source API.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76296385
  
@viirya I think the issue here is that the data written by the Hive parquet 
serde may not be read back by our own data source parquet. I have changed the 
title of the JIRA. It would be great if you could change your PR title.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76295694
  
@liancheng Unlike the `ParquetConversions` issue, I think the array 
insertion issue may not be just a Hive-specific one. The problem is that when 
we create a Parquet table that includes an array (or map, struct), by default 
we use a schema that sets `containsNull` to true. But later, when we want to 
insert data, the data schema could have `containsNull` as true or false. Hive 
seems to solve this problem by only supporting fields that may contain null 
elements. So no matter whether the inserted data contains nulls or not, we set 
its schema to have `containsNull` as true before writing to the Parquet file. 
Since I think we don't want to explicitly change the data schema and affect 
other parts, doing it in `RowWriteSupport` should be OK, unless you have other 
concerns.
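
The widening step described above can be sketched as a recursive function over the type tree. This is a self-contained illustration under stated assumptions: the classes are simplified stand-ins for Spark SQL's types, and `widenNullability` is a hypothetical name, not the code in this PR or in `RowWriteSupport`.

```scala
object NullableWidening {
  // Simplified stand-ins for Spark SQL's data types (assumption: not the real classes).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
  case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
      extends DataType
  case class StructField(name: String, dataType: DataType, nullable: Boolean)
  case class StructType(fields: Seq[StructField]) extends DataType

  // Hypothetical helper: return the same type with every nullability flag set to true,
  // recursing into element, key, value, and field types.
  def widenNullability(dt: DataType): DataType = dt match {
    case ArrayType(element, _) =>
      ArrayType(widenNullability(element), containsNull = true)
    case MapType(key, value, _) =>
      MapType(widenNullability(key), widenNullability(value), valueContainsNull = true)
    case StructType(fields) =>
      StructType(fields.map(f =>
        f.copy(dataType = widenNullability(f.dataType), nullable = true)))
    case other => other
  }
}
```

Widening only relaxes the schema, so data that never contains nulls still satisfies it. The function is also idempotent, which makes it safe to apply to a schema that is already fully nullable.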





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-26 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-76256448
  
Hey @viirya, this PR actually fixes two issues, the `ParquetConversions` 
one and the array insertion one. However, both fixes need some tweaks. As the 
1.3 release is really close, @yhuai opened #4782 based on your work to fix the 
first issue. As for the array insertion issue, I feel hesitant to add the fix 
in `RowWriteSupport`, since this should be a Hive-specific issue. Also, map and 
struct should suffer from the same issue, right?





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-25 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25400611
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
   // Collects all `MetastoreRelation`s which should be replaced
   val toBeReplaced = plan.collect {
 // Write path
-case InsertIntoTable(relation: MetastoreRelation, _, _, _)
+case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
--- End diff --

Oh sorry, I mistook this for the physical plan with the same name...





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r2539
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
   // Collects all `MetastoreRelation`s which should be replaced
   val toBeReplaced = plan.collect {
 // Write path
-case InsertIntoTable(relation: MetastoreRelation, _, _, _)
+case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
--- End diff --

`InsertIntoHiveTable` is a `LogicalPlan` defined in `HiveMetastoreCatalog`.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-25 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25362155
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -458,6 +458,9 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
 
   withAlias
 }
+case InsertIntoHiveTable(r: MetastoreRelation, p, c, o) if 
relationMap.contains(r) =>
+  val parquetRelation = relationMap(r)
+  InsertIntoHiveTable(parquetRelation, p, c, o) 
--- End diff --

Same here.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-25 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4729#discussion_r25362135
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
   // Collects all `MetastoreRelation`s which should be replaced
   val toBeReplaced = plan.collect {
 // Write path
-case InsertIntoTable(relation: MetastoreRelation, _, _, _)
+case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
--- End diff --

I don't think this is right here. `ParquetConversions` is an analysis rule, 
which only processes logical plans. However, `InsertIntoHiveTable` is a 
physical plan node.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-24 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75852468
  
/cc @liancheng





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-24 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75758412
  
cc @marmbrus.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75718241
  
[Test build #27883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27883/consoleFull) for PR 4729 at commit [`175966f`](https://github.com/apache/spark/commit/175966f4e275beaf21363db196102dcb1a4b1d3e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75718250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27883/
Test PASSed.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75709019
  
  [Test build #27883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27883/consoleFull)
 for   PR 4729 at commit 
[`175966f`](https://github.com/apache/spark/commit/175966f4e275beaf21363db196102dcb1a4b1d3e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75686842
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27870/





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75686837
  
  [Test build #27870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27870/consoleFull)
 for   PR 4729 at commit 
[`0e07bb8`](https://github.com/apache/spark/commit/0e07bb879d4d804b3c3f7823f8f7d19fdd71d83f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class Params(`






[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75676366
  
  [Test build #27870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27870/consoleFull)
 for   PR 4729 at commit 
[`0e07bb8`](https://github.com/apache/spark/commit/0e07bb879d4d804b3c3f7823f8f7d19fdd71d83f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75597588
  
  [Test build #27853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27853/consoleFull)
 for   PR 4729 at commit 
[`4e3bd55`](https://github.com/apache/spark/commit/4e3bd5568e644bc81e2539a917329486ea968a92).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class Params(`






[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75597599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27853/





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4729#issuecomment-75588235
  
  [Test build #27853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27853/consoleFull)
 for   PR 4729 at commit 
[`4e3bd55`](https://github.com/apache/spark/commit/4e3bd5568e644bc81e2539a917329486ea968a92).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

2015-02-23 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/4729

[SPARK-5950][SQL] Enable inserting array into Hive table saved as Parquet 
using DataSource API

Currently `ParquetConversions` in `HiveMetastoreCatalog` does not really 
work. One reason is that the table is not one of the child nodes of 
`InsertIntoTable`, so the replacement is never applied to it.
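
To illustrate why a child-only traversal misses the table, here is a minimal Python sketch. It is not Spark code: `Relation`, `InsertInto`, `transform`, and `to_parquet` are hypothetical stand-ins, and the sketch assumes the rewrite rule visits only the nodes reachable through `children`.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node; `fmt` tags which storage path it uses."""
    fmt: str
    children: Tuple = ()

    def with_children(self, children):
        # A leaf has no children to swap in.
        return self


@dataclass(frozen=True)
class InsertInto:
    """Hypothetical insert node: `table` is a plain field, only `child` is a child."""
    table: Relation
    child: Relation

    @property
    def children(self):
        return (self.child,)

    def with_children(self, children):
        return replace(self, child=children[0])


def transform(node, rule):
    # Recurse through `children` only, then apply the rule to the rebuilt node.
    new_children = tuple(transform(c, rule) for c in node.children)
    return rule(node.with_children(new_children))


def to_parquet(node):
    # Stand-in for a conversion rule that matches relations directly.
    if isinstance(node, Relation) and node.fmt == "hive":
        return Relation("parquet")
    return node


plan = InsertInto(table=Relation("hive"), child=Relation("hive"))
result = transform(plan, to_parquet)

print(result.child.fmt)  # the child was reached through `children`
print(result.table.fmt)  # the target table was never visited
```

Under these assumptions, a rule that should also rewrite the target would have to match the insert node itself and replace its `table` field explicitly.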

When we create a Parquet table in Hive with an ARRAY field, `ArrayType` has 
`containsNull` set to true by default, and this affects the table's schema. 
But when data is inserted into the table later, the schema of the inserted 
data can have `containsNull` as either true or false. That mismatch makes the 
insert or the subsequent read fail.

A similar problem is reported in 
https://issues.apache.org/jira/browse/SPARK-5508.

Hive seems to support only arrays with nullable elements, so this PR enables 
the same behavior.
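
The `containsNull` mismatch described above can be sketched in a few lines of Python. This is a simplified model, not the Spark API: the `ArrayType` class below only mimics the flag, and `as_nullable` is a hypothetical helper standing in for whatever relaxation the patch applies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical miniature of an array type with a nullability flag."""
    element_type: str
    contains_null: bool = True


def as_nullable(t: ArrayType) -> ArrayType:
    # Relax the flag so element nullability always matches the table's schema.
    return replace(t, contains_null=True)


table_type = ArrayType("int")                      # table schema: default flag is True
data_type = ArrayType("int", contains_null=False)  # inserted data without null elements

strict_match = table_type == data_type                 # schemas differ only in the flag
relaxed_match = table_type == as_nullable(data_type)   # flag relaxed before comparing

print(strict_match, relaxed_match)
```

In this model the two types differ only in the flag, so a strict schema comparison rejects the insert while the relaxed comparison accepts it.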



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 hive_parquet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4729


commit 4e3bd5568e644bc81e2539a917329486ea968a92
Author: Liang-Chi Hsieh 
Date:   2015-02-23T17:03:30Z

Enable inserting array into hive table saved as parquet using datasource.



