[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-16 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-157027289
  
@marmbrus Is this one OK for branch-1.6?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-16 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-157027571
  
@HyukjinKwon Thanks! I've merged this one to master. And yes, please feel 
free to add the decimal test case(s).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9060





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-16 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-157138358
  
Merging to branch-1.6.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-16 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-157108064
  
Sure





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156879272
  
I accidentally noticed `TODO Adds test case for reading dictionary encoded decimals written as 'FIXED_LEN_BYTE_ARRAY'`.

I will also add this test in the follow-up PR that uses the overloaded 
`writeMetaFile`.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891545
  
**[Test build #45964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45964/consoleFull)**
 for PR 9060 at commit 
[`cea5034`](https://github.com/apache/spark/commit/cea50348da091e5d83c14474a76d4f49e1ff3c9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45964/
Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891627
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156879507
  
**[Test build #45964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45964/consoleFull)**
 for PR 9060 at commit 
[`cea5034`](https://github.com/apache/spark/commit/cea50348da091e5d83c14474a76d4f49e1ff3c9b).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/9060#discussion_r44765188
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
 ---
@@ -513,6 +515,41 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-11044 Parquet writer version fixed as version1 ") {
+
+    // For dictionary encoding, Parquet changes the encoding types according to its writer version
+    // So, this test checks the encoding types in order to ensure that the file is written with
+    // writer version2.
+    withTempPath { dir =>
+      val clonedConf = new Configuration(hadoopConfiguration)
+      try {
+
+        // Write a Parquet file with writer version 2
+        hadoopConfiguration.set(ParquetOutputFormat.WRITER_VERSION,
+          ParquetProperties.WriterVersion.PARQUET_2_0.toString)
+
+        // By default, dictionary encoding is enabled from Parquet 1.2.0 but
+        // it is enabled just in case.
+        hadoopConfiguration.setBoolean(ParquetOutputFormat.ENABLE_DICTIONARY, true)
+        val path = s"${dir.getCanonicalPath}/part-r-0.parquet"
+        sqlContext.range(1 << 16).selectExpr("(id % 4) AS i")
+          .coalesce(1).write.mode("overwrite").parquet(path)
+
+        val blockMetadata = readFooter(new Path(path), hadoopConfiguration).getBlocks.asScala.head
+        val columnChunkMetadata = blockMetadata.getColumns.asScala.head
+
+        // If the file is written with version 2, this should include
+        // [[Encoding.RLE_DICTIONARY]] type. For version 1, it is Encoding.PLAIN_DICTIONARY
--- End diff --

BTW, the `[[...]]` notation is only useful when writing ScalaDoc. In 
inline comments like this, you may either omit the brackets or use 
backquotes to emphasize that the quoted part is a Scala/Java entity.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/9060#discussion_r44764961
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
 ---
@@ -513,6 +515,41 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-11044 Parquet writer version fixed as version1 ") {
+
+    // For dictionary encoding, Parquet changes the encoding types according to its writer version
+    // So, this test checks the encoding types in order to ensure that the file is written with
+    // writer version2.
+    withTempPath { dir =>
+      val clonedConf = new Configuration(hadoopConfiguration)
+      try {
+
--- End diff --

Nit: Remove this empty line.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/9060#discussion_r44764956
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
 ---
@@ -513,6 +515,41 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-11044 Parquet writer version fixed as version1 ") {
+
--- End diff --

Nit: Remove this empty line.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-13 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156379942
  
LGTM except for a few minor styling issues. I can merge it right after you 
fix them.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156072061
  
I think we can check for column encoding information, which is accessible 
from Parquet footers. For example, `PARQUET_2_0` uses `RLE_DICTIONARY` while 
`PARQUET_1_0` uses `PLAIN_DICTIONARY` (see [here][1]).

The [parquet-meta CLI tool][2] can be a reference for how to inspect 
related metadata.

[1]: 
https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L116-L123
[2]: 
https://github.com/apache/parquet-mr/blob/master/parquet-tools/src/main/java/org/apache/parquet/tools/util/MetadataUtils.java#L139
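
As a rough sketch of the check described above (not code from this PR), the decision could look like the following once the encoding names of a column chunk have been read from the footer. `WriterVersionGuess` and `infer` are hypothetical names, and the encodings are modelled as plain strings so that the snippet needs no Parquet dependency. It assumes dictionary encoding was actually used for the column.

```scala
// Hypothetical helper, not part of Spark or parquet-mr.
// Encoding names are plain strings as they would be listed in a Parquet footer.
object WriterVersionGuess {
  // Returns the writer version suggested by a column chunk's encodings,
  // or None when no dictionary encoding is present (for example when
  // dictionary encoding is disabled or the writer fell back to plain encoding).
  def infer(encodings: Set[String]): Option[String] =
    if (encodings.contains("RLE_DICTIONARY")) Some("PARQUET_2_0")
    else if (encodings.contains("PLAIN_DICTIONARY")) Some("PARQUET_1_0")
    else None
}
```

Returning `None` instead of guessing keeps a test from passing by accident when the column was not dictionary encoded at all.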





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156076494
  
Thank you very much. I will try it that way.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156077334
  
You may construct a Parquet file consisting of a single column with 
dictionary encoding using:

```scala
val path = "file:///tmp/parquet/dict"
sqlContext.range(1 << 16).selectExpr("(id % 4) AS i")
  .coalesce(1).write.mode("overwrite").parquet(path)
```

And here are instructions for building and installing the parquet-tools CLI 
tool. Then you can inspect Parquet metadata using:

```
$ parquet-meta /tmp/parquet/dict

file:        file:/private/tmp/parquet/dict/part-r-0-88498608-9eed-4728-b96a-b60bc5ebc2a8.gz.parquet
creator:     parquet-mr version 1.6.0
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"i","type":"long","nullable":true,"metadata":{}}]}

file schema: root
--
i:           OPTIONAL INT64 R:0 D:1

row group 1: RC:65536 TS:16615 OFFSET:4
--
i:           INT64 GZIP DO:0 FPO:4 SZ:198/16615/83.91 VC:65536 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
```

The `ENC:...` part in the last line is column encoding information.
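
If you want to check that field programmatically instead of reading it by eye, a minimal sketch could look like this. `EncLine` and `parseEncodings` are hypothetical names, not part of parquet-tools, and the code only assumes that the encoding list follows an `ENC:` marker as a comma-separated token, as in the output above.

```scala
// Hypothetical helper for picking the encodings out of one line of
// parquet-meta output; it is not part of parquet-tools.
object EncLine {
  private val Marker = "ENC:"

  // Returns the comma-separated names that follow "ENC:", or an empty set
  // when the line carries no encoding information.
  def parseEncodings(line: String): Set[String] = {
    val idx = line.indexOf(Marker)
    if (idx < 0) Set.empty[String]
    else
      line.substring(idx + Marker.length).trim
        .takeWhile(c => !c.isWhitespace) // stop at the next field, if any
        .split(",")
        .filter(_.nonEmpty)
        .toSet
  }
}
```

For the column line shown above, this would yield `BIT_PACKED`, `RLE` and `PLAIN_DICTIONARY`.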





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156306727
  
Fortunately, I have worked with parquet-tools once and have looked through the 
Parquet code several times :).

Thank you very much for your help. This could be done much more easily than 
I thought because of your help.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156306860
  
[Test build #45810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45810/consoleFull)
 for PR 9060 at commit 
[`2d1d343`](https://github.com/apache/spark/commit/2d1d343ab4a0218cfcbc621c6fccb77397e7).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156322308
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156322309
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45810/
Test FAILed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156322233
  
[Test build #45810 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45810/console)
 for PR 9060 at commit 
[`2d1d343`](https://github.com/apache/spark/commit/2d1d343ab4a0218cfcbc621c6fccb77397e7).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156354310
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45831/
Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156327284
  
Merged build started.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156327273
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156354309
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156354224
  
**[Test build #45831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45831/consoleFull)**
 for PR 9060 at commit 
[`78449ec`](https://github.com/apache/spark/commit/78449ec530007bbebf729c19e74364dd0e001b81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class TypedColumn[-T, U](`
   * `class JavaTrackStateDStream[KeyType, ValueType, StateType, EmittedType](`





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156099372
  
Thanks! I will follow that approach.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156309719
  
**[Test build #45811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45811/consoleFull)**
 for PR 9060 at commit 
[`7e80ad6`](https://github.com/apache/spark/commit/7e80ad6082a9f5b53f08800bfb519a2a80632ec8).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156325563
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156325565
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45811/
Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156325499
  
**[Test build #45811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45811/consoleFull)**
 for PR 9060 at commit 
[`7e80ad6`](https://github.com/apache/spark/commit/7e80ad6082a9f5b53f08800bfb519a2a80632ec8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156327584
  
**[Test build #45831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45831/consoleFull)**
 for PR 9060 at commit 
[`78449ec`](https://github.com/apache/spark/commit/78449ec530007bbebf729c19e74364dd0e001b81).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156306712
  
Build started.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156306692
  
 Build triggered.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156309106
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156309116
  
Merged build started.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155718158
  
@HyukjinKwon Oh yeah, sorry. Finally got some time to clean up my review 
queue :)

I wonder whether there is an easy way to add a test case for this. At first 
I thought `WriterVersion` corresponded to the `version` field of the Thrift 
struct `FileMetaData` described in [parquet-format][1], but it doesn't. I only 
found that when `WriterVersion` is set to v2, the Thrift field 
`PageHeader.type` is set to `DATA_PAGE_V2`.

[1]: https://github.com/apache/parquet-format#metadata





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155718167
  
ok to test





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155718924
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155718954
  
Merged build started.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155720490
  
I will try to find and test them first tomorrow before adding a commit!





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155719264
  
**[Test build #45626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45626/consoleFull)** for PR 9060 at commit [`2eee7e3`](https://github.com/apache/spark/commit/2eee7e37b6f366336cbe19bd9545f07abb13f7db).





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155752417
  
**[Test build #45626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45626/consoleFull)** for PR 9060 at commit [`2eee7e3`](https://github.com/apache/spark/commit/2eee7e37b6f366336cbe19bd9545f07abb13f7db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155753066
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155753068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45626/
Test PASSed.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-155973698
  
@liancheng I made a few attempts to determine the version but, as you said, 
it is pretty tricky to check the writer version: it only changes the version 
of the data pages, which is visible only inside Parquet's internals.

Would it be acceptable to write Parquet files with both version 1 and 
version 2 and then check whether their sizes differ?

Since the encodings are different, the sizes should also be different.
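
A minimal sketch of the size comparison proposed above, covering only the file-system side. The object `ParquetSizeCheck` and its methods are hypothetical names, and the assumption that data files end in `.parquet` (so that side files such as `_SUCCESS` or checksum files are skipped) is not confirmed anywhere in this thread. Writing the two directories with Spark is left out.

```scala
import java.io.File

// Hypothetical helper for a test that writes the same data once per writer
// version and then compares the resulting output directories.
object ParquetSizeCheck {
  // Sum the sizes of all files under `dir` whose names end in ".parquet".
  // Assumption: data files carry that suffix; other files are ignored.
  def parquetBytes(dir: File): Long = {
    val children = Option(dir.listFiles).getOrElse(Array.empty[File])
    children.map { f =>
      if (f.isDirectory) parquetBytes(f)
      else if (f.getName.endsWith(".parquet")) f.length
      else 0L
    }.sum
  }

  // True when the two output directories hold different amounts of Parquet data.
  def sizesDiffer(v1Dir: File, v2Dir: File): Boolean =
    parquetBytes(v1Dir) != parquetBytes(v2Dir)
}
```

A differing size is only indirect evidence that the configured writer version was applied; it does not show which page format was actually written.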





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-154597634
  
@liancheng I assume you missed this.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-10-18 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-148994769
  
/cc @liancheng 





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/9060#discussion_r41705069
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala
 ---
@@ -431,6 +431,7 @@ private[parquet] object CatalystWriteSupport {
 configuration.set(SPARK_ROW_SCHEMA, schema.json)
 configuration.set(
   ParquetOutputFormat.WRITER_VERSION,
-  ParquetProperties.WriterVersion.PARQUET_1_0.toString)
+  configuration.get(ParquetOutputFormat.WRITER_VERSION,
--- End diff --

Yep, I just updated it.





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-10-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9060#discussion_r41695242
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala
 ---
@@ -431,6 +431,7 @@ private[parquet] object CatalystWriteSupport {
 configuration.set(SPARK_ROW_SCHEMA, schema.json)
 configuration.set(
   ParquetOutputFormat.WRITER_VERSION,
-  ParquetProperties.WriterVersion.PARQUET_1_0.toString)
+  configuration.get(ParquetOutputFormat.WRITER_VERSION,
--- End diff --

Can you just use `setIfUnset` here?
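
A minimal sketch of how the forms in this diff differ, under stated assumptions. `SimpleConf` is a hypothetical stand-in for Hadoop's `Configuration`, reduced to `set`, `get(name, default)` and `setIfUnset`; the key string `parquet.writer.version` is an assumed value for `ParquetOutputFormat.WRITER_VERSION`, and the string `PARQUET_1_0` stands in for `ParquetProperties.WriterVersion.PARQUET_1_0.toString`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, keeping
// only the three calls discussed here (no variable substitution, no resources).
final class SimpleConf {
  private val props = mutable.Map.empty[String, String]
  def set(name: String, value: String): Unit = props.update(name, value)
  def get(name: String, default: String): String = props.getOrElse(name, default)
  def setIfUnset(name: String, value: String): Unit =
    if (!props.contains(name)) props.update(name, value)
}

object WriterVersionDefaults {
  // Assumed value of ParquetOutputFormat.WRITER_VERSION.
  val WriterVersionKey = "parquet.writer.version"
  // Stands in for ParquetProperties.WriterVersion.PARQUET_1_0.toString.
  val DefaultVersion = "PARQUET_1_0"

  // Removed line of the diff: always overwrite, so a user-supplied value is lost.
  def forceDefault(conf: SimpleConf): Unit =
    conf.set(WriterVersionKey, DefaultVersion)

  // Added line of the diff: write back the current value, or the default if absent.
  def keepViaGet(conf: SimpleConf): Unit =
    conf.set(WriterVersionKey, conf.get(WriterVersionKey, DefaultVersion))

  // Suggested form: only write the default when the key has no value yet.
  def keepViaSetIfUnset(conf: SimpleConf): Unit =
    conf.setIfUnset(WriterVersionKey, DefaultVersion)
}
```

In this model `keepViaGet` and `keepViaSetIfUnset` leave the configuration in the same state; the second simply states the intent in a single call.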





[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-10-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/9060

[SPARK-11044][SQL] Parquet writer version fixed as version1

https://issues.apache.org/jira/browse/SPARK-11044

Spark always writes Parquet files with writer version 1, ignoring any 
writer version configured by the user.

This PR keeps the writer version if one is given and uses version 1 as the 
default.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-11044

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9060


commit 5e72fbc93ec0783d5a440f8f70c7653f8fc39d9a
Author: HyukjinKwon 
Date:   2015-10-10T06:59:52Z

[SPARK-11044][SQL] Apply the writer version if given.







[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-147047845
  
Can one of the admins verify this patch?

