[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14304





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71954942
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file, null /* set columns to null to project all columns */)
+      val column = reader.resultBatch().column(0)
+      assert(reader.nextBatch())
+
+      (0 until 512).foreach { i =>
+        assert(column.getUTF8String(3 * i).toString == i.toString)
--- End diff --

Seems like there's no `toInt` function in 
`org.apache.spark.unsafe.types.UTF8String`





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71954540
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file, null /* set columns to null to project all columns */)
+      val column = reader.resultBatch().column(0)
+      assert(reader.nextBatch())
+
+      (0 until 512).foreach { i =>
+        assert(column.getUTF8String(3 * i).toString == i.toString)
--- End diff --

Ah, gotcha!





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71903616
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
--- End diff --

Nit: You can use the following constants instead of hard-coding the key strings:

- `ParquetOutputFormat.DICTIONARY_PAGE_SIZE`
- `ParquetOutputFormat.PAGE_SIZE`
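
For instance, a minimal sketch of the suggestion (assuming these `parquet-hadoop` constants resolve to the same key strings as above):

```scala
import org.apache.parquet.hadoop.ParquetOutputFormat

// Same settings as before, without hard-coded key strings.
spark.conf.set(ParquetOutputFormat.DICTIONARY_PAGE_SIZE, "2048")
spark.conf.set(ParquetOutputFormat.PAGE_SIZE, "4096")
```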





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71902498
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
--- End diff --

We can use `JavaConverters` here:

```scala
import scala.collection.JavaConverters._

val file = SpecificParquetRecordReaderBase.listDirectory(dir).asScala.head
```





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71901937
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
--- End diff --

Let's use `withSQLConf` to alter these settings so that they are 
automatically reverted to their original values at the end of the scope.
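
A minimal sketch of what that could look like (assuming `SQLTestUtils.withSQLConf`, which restores the previous values when the block exits):

```scala
// Both Parquet settings are reverted automatically at the end of the block.
withSQLConf(
  "parquet.dictionary.page.size" -> "2048",
  "parquet.page.size" -> "4096") {
  withTempPath { dir =>
    // ... write the data and exercise the vectorized reader as above ...
  }
}
```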





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71901576
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file, null /* set columns to null to project all columns */)
--- End diff --

`VectorizedParquetRecordReader` is a Java class rather than a Scala class, so named parameters aren't feasible here.





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71844762
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file, null /* set columns to null to project all columns */)
+      val column = reader.resultBatch().column(0)
+      assert(reader.nextBatch())
+
+      (0 until 512).foreach { i =>
+        assert(column.getUTF8String(3 * i).toString == i.toString)
--- End diff --

What about `toInt` as follows:

```scala
assert(column.getUTF8String(3 * i).toInt == i)
```





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71844632
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file =
+        SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file, null /* set columns to null to project all columns */)
--- End diff --

I meant `initialize(file, columns = null)` or even:

```scala
val projectAllColumns = null
initialize(file, projectAllColumns)
```

That way the code expresses your intention without extra comments.





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71787200
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,29 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => List(i.toString, i.toString, i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file = SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file.asInstanceOf[String], null)
+      val batch = reader.resultBatch()
+      assert(reader.nextBatch())
+
+      (0 until 512).foreach { i =>
+        assert(batch.column(0).getUTF8String(3 * i).toString == i.toString)
--- End diff --

Unfortunately, using ints wouldn't produce this hybrid encoding that we're 
testing for (it just ends up producing 2 dictionary encoded pages).
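
For contrast, an illustrative sketch of the int variant (hypothetical, using the suite's `withTempPath` helper):

```scala
// Writing the same shape of data as ints dictionary-encodes compactly, so
// every page stays dictionary encoded and the plain-encoded fallback path
// under test would never be exercised.
withTempPath { dir =>
  val intData = (0 until 512).flatMap(i => Seq.fill(3)(i))
  intData.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
}
```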





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71787031
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,29 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => List(i.toString, i.toString, i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file = SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file.asInstanceOf[String], null)
--- End diff --

This is calling into Java code, so named parameters wouldn't work. I added a comment to make it clear.





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71780859
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,29 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => List(i.toString, i.toString, i.toString))
--- End diff --

What do you think about `Seq.fill(3)(i.toString)`?
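
For reference, a tiny sketch of the equivalence:

```scala
List(7.toString, 7.toString, 7.toString)  // List("7", "7", "7")
Seq.fill(3)(7.toString)                   // same result, repetition factored out
```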





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71774436
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,29 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => List(i.toString, i.toString, i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file = SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file.asInstanceOf[String], null)
+      val batch = reader.resultBatch()
+      assert(reader.nextBatch())
+
+      (0 until 512).foreach { i =>
+        assert(batch.column(0).getUTF8String(3 * i).toString == i.toString)
--- End diff --

Two things here:

1. Extract `batch.column(0)` into a local `column` val (line 96).
2. Since you already convert with `toString`, what do you think about `toInt` instead (`i` is an `Int` anyway)? One conversion less :)
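
A sketch of the first point (hoisting the column lookup out of the loop; `column` is a name introduced here):

```scala
val column = batch.column(0)
(0 until 512).foreach { i =>
  assert(column.getUTF8String(3 * i).toString == i.toString)
}
```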





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71773933
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala ---
@@ -78,4 +78,29 @@ class ParquetEncodingSuite extends ParquetCompatibilityTest with SharedSQLContex
       }}
     }
   }
+
+  test("Read row group containing both dictionary and plain encoded pages") {
+    spark.conf.set("parquet.dictionary.page.size", "2048")
+    spark.conf.set("parquet.page.size", "4096")
+
+    withTempPath { dir =>
+      // In order to explicitly test for SPARK-14217, we set the parquet dictionary and page size
+      // such that the following data spans across 3 pages (within a single row group) where the
+      // first page is dictionary encoded and the remaining two are plain encoded.
+      val data = (0 until 512).flatMap(i => List(i.toString, i.toString, i.toString))
+      data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+      val file = SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head
+
+      val reader = new VectorizedParquetRecordReader
+      reader.initialize(file.asInstanceOf[String], null)
--- End diff --

What do you think about moving this `asInstanceOf` to line 92 and using a named parameter for `null`?
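
A sketch of the `asInstanceOf` part of the suggestion (cast once, where `file` is defined, so the call site stays clean):

```scala
val file = SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
// ...
reader.initialize(file, null)
```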





[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-21 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/14304

[SPARK-16668][TEST] Test parquet reader for row groups containing both dictionary and plain encoded pages

## What changes were proposed in this pull request?

This patch adds an explicit test for [SPARK-14217] by setting the parquet dictionary and page size such that the generated parquet file spans across 3 pages (within a single row group) where the first page is dictionary encoded and the remaining two are plain encoded.

## How was this patch tested?

1. ParquetEncodingSuite
2. Also manually tested that this test fails without 
https://github.com/apache/spark/pull/12279

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark hybrid-encoding-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14304


commit adffc4407a783bdf86d5ee5a26d289ee496d1247
Author: Sameer Agarwal 
Date:   2016-07-21T06:08:17Z

experiments

commit 5e7556cf96d991b2f38fda82d28256687f056474
Author: Sameer Agarwal 
Date:   2016-07-21T07:59:34Z

works

commit 6b688e97310f903066b4085cb0374e76a9baef0a
Author: Sameer Agarwal 
Date:   2016-07-21T18:29:53Z

cleanup

commit f3029080c449d40c1dde8e97b97f0354866788c4
Author: Sameer Agarwal 
Date:   2016-07-21T18:30:47Z

cleanup



