svn commit: r22500 - in /dev/parquet/apache-parquet-format-2.4.0-rc1: ./ apache-parquet-format-2.4.0.tar.gz apache-parquet-format-2.4.0.tar.gz.asc apache-parquet-format-2.4.0.tar.gz.md5 apache-parquet

2017-10-16 Thread blue
Author: blue
Date: Tue Oct 17 00:10:07 2017
New Revision: 22500

Log:
Apache Parquet Format 2.4.0 RC1

Added:
dev/parquet/apache-parquet-format-2.4.0-rc1/

dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz  
 (with props)

dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.asc

dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.md5

dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.sha

Added: 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz
==
Binary file - no diff available.

Propchange: 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.asc
==
--- 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.asc
 (added)
+++ 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.asc
 Tue Oct 17 00:10:07 2017
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJZ5UpbAAoJEIZ4HU+ksum1lyMQAIolpocrgjb8/HLepd2k0YZC
+hZNApEKYPgKBi0Vcv1osQqU6R7Ann6EV0yM9lInpo8Wb1OO6+aIh3dEX0dx3eX51
+rTg+z8lL73r5FXySn+zqJ30gmVlhkMZboviW5cZQRKJBTSWc/ATsz6dJ1HADZmGn
+4y8z18kirNhdlxEOJ/HRP26mCjYyQ6sasLHDfmQz4RK8lRb4XrcSBeSLWBpUY/TV
+EW/9DN64SuxaaT5dVszthzx6QxFKqwApUQJq9e1xVYLZTvxcL7sKljwCPJDhpIFq
+gVbnXYwzMnXOmX7OdTVaMyi2irRKsbNm2a8kPZq+Ocs+0wrQ1+dDohQLUwDBvRcf
+Tnyc44zGd8Q/3eSBXmXTkv8FpNpB95mpbita4HrPgJ/cF34XJj3x2KzD/3Stbo/B
+iBwfQ2Y9gaGmKmmu2FUfrLhcuszWxm8QOROl8ALCPp9xYx4zEb9hxgOKcCeoigA7
+A+RiOtUoh2cypnVLh1EQCgkbMRFLU7QcCPF68OCQDtr+jFynmgUfINyJSW9IT4F2
+y+KruEqBk112GmlTN0UG4hehg/Zzg442bC/QlHvSW95NUgQhQ000MCuvUFIETj1w
+dZv86dRJq36jcWgB7EgPQAlc9s43w/uX/XjlN07FGLhi+3TQctH0shjVTGV3cx7b
+ml50HL/FO/otGiEA9IKt
+=yJsD
+-----END PGP SIGNATURE-----

Added: 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.md5
==
--- 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.md5
 (added)
+++ 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.md5
 Tue Oct 17 00:10:07 2017
@@ -0,0 +1,2 @@
+apache-parquet-format-2.4.0.tar.gz: 32 13 9D E2 90 1D AF 7C  98 43 CE 1C 52 3F
+00 C8

Added: 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.sha
==
--- 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.sha
 (added)
+++ 
dev/parquet/apache-parquet-format-2.4.0-rc1/apache-parquet-format-2.4.0.tar.gz.sha
 Tue Oct 17 00:10:07 2017
@@ -0,0 +1 @@
+b26a31a09870f3805087a863854e35138adeea12  apache-parquet-format-2.4.0.tar.gz
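
The two checksum files above use different layouts: the .md5 file has a "name:" prefix followed by spaced uppercase hex pairs that wrap onto a second line, while the .sha file has a single lowercase hex string followed by the file name. A minimal sketch of normalizing both layouts before comparing them against a locally computed digest follows. The helper names `normalize_digest` and `digest_matches` are hypothetical and not part of any release tooling; the sketch assumes the file name is known to the caller.

```python
import hashlib
import re


def normalize_digest(checksum_text: str, filename: str) -> str:
    """Return the lowercase hex digest contained in a checksum file.

    Handles both layouts seen in this release candidate:
      "<name>: 32 13 9D ..."   (spaced hex pairs, possibly wrapped)
      "<hex>  <name>"          (single hex string, then the file name)
    """
    # Remove the file name first: it contains characters that are also
    # valid hex digits ("a", "c", "e", "f", digits).
    body = checksum_text.replace(filename, "")
    # Keep only hex digits, dropping ':', spaces and line breaks.
    return "".join(re.findall(r"[0-9a-fA-F]", body)).lower()


def digest_matches(data: bytes, checksum_text: str, filename: str,
                   algorithm: str) -> bool:
    """Compare the digest of `data` with the one recorded in the file."""
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == normalize_digest(checksum_text, filename)


NAME = "apache-parquet-format-2.4.0.tar.gz"
MD5_TEXT = NAME + ": 32 13 9D E2 90 1D AF 7C  98 43 CE 1C 52 3F\n00 C8\n"
SHA_TEXT = "b26a31a09870f3805087a863854e35138adeea12  " + NAME + "\n"

md5_hex = normalize_digest(MD5_TEXT, NAME)
sha_hex = normalize_digest(SHA_TEXT, NAME)
```

To check a downloaded tarball, read it in binary mode and call `digest_matches(data, text, NAME, "md5")` or `digest_matches(data, text, NAME, "sha1")`. These digests only detect corruption; authenticity still depends on verifying the .asc signature.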




[parquet-format] Git Push Summary

2017-10-16 Thread blue
Repository: parquet-format
Updated Tags:  refs/tags/apache-parquet-format-2.4.0 [created] 403dd0605


parquet-format git commit: [maven-release-plugin] prepare for next development iteration

2017-10-16 Thread blue
Repository: parquet-format
Updated Branches:
  refs/heads/master 3fb6b391d -> da4e39a15


[maven-release-plugin] prepare for next development iteration


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/da4e39a1
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/da4e39a1
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/da4e39a1

Branch: refs/heads/master
Commit: da4e39a15b3fa6ad899ab23298e8697ec2c199e8
Parents: 3fb6b39
Author: Ryan Blue 
Authored: Mon Oct 16 17:07:13 2017 -0700
Committer: Ryan Blue 
Committed: Mon Oct 16 17:07:13 2017 -0700

--
 pom.xml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/da4e39a1/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 98ca595..5c9032c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -28,7 +28,7 @@
 
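   <groupId>org.apache.parquet</groupId>
   <artifactId>parquet-format</artifactId>
-  <version>2.4.0</version>
+  <version>2.4.1-SNAPSHOT</version>
   <packaging>jar</packaging>
 
   <name>Apache Parquet Format</name>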
@@ -39,7 +39,7 @@
 scm:git:g...@github.com:apache/parquet-format.git
 scm:git:g...@github.com:apache/parquet-format.git
 
scm:git:https://git-wip-us.apache.org/repos/asf/parquet-format.git
-    <tag>apache-parquet-format-2.4.0</tag>
+    <tag>HEAD</tag>
   
 
   



parquet-format git commit: PARQUET-1134: Update CHANGES.md.

2017-10-16 Thread blue
Repository: parquet-format
Updated Branches:
  refs/heads/master f1de77d31 -> 54cc08d2c


PARQUET-1134: Update CHANGES.md.

Also cleaning up old PRs:
Closes #37


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/54cc08d2
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/54cc08d2
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/54cc08d2

Branch: refs/heads/master
Commit: 54cc08d2c752d51f054f87efe4aa3d984794a6b0
Parents: f1de77d
Author: Ryan Blue 
Authored: Mon Oct 16 17:01:33 2017 -0700
Committer: Ryan Blue 
Committed: Mon Oct 16 17:01:33 2017 -0700

--
 CHANGES.md | 37 +
 1 file changed, 37 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/54cc08d2/CHANGES.md
--
diff --git a/CHANGES.md b/CHANGES.md
index befe532..85d710c 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -19,6 +19,43 @@
 
 # Parquet #
 
+### Version 2.4.0 ###
+
+#### Bug
+
+*   [PARQUET-255](https://issues.apache.org/jira/browse/PARQUET-255) - Typo in decimal type specification
+*   [PARQUET-322](https://issues.apache.org/jira/browse/PARQUET-322) - Document ENUM as a logical type
+*   [PARQUET-412](https://issues.apache.org/jira/browse/PARQUET-412) - Format: Do not shade slf4j-api
+*   [PARQUET-419](https://issues.apache.org/jira/browse/PARQUET-419) - Update dev script in parquet-cpp to remove incubator.
+*   [PARQUET-655](https://issues.apache.org/jira/browse/PARQUET-655) - The LogicalTypes.md link in README.md points to the old Parquet GitHub repository
+*   [PARQUET-1031](https://issues.apache.org/jira/browse/PARQUET-1031) - Fix spelling errors, whitespace, GitHub urls
+*   [PARQUET-1032](https://issues.apache.org/jira/browse/PARQUET-1032) - Change link in Encodings.md for variable length encoding
+*   [PARQUET-1050](https://issues.apache.org/jira/browse/PARQUET-1050) - The comment of Parquet Format Thrift definition file error
+*   [PARQUET-1076](https://issues.apache.org/jira/browse/PARQUET-1076) - [Format] Switch to long key ids in KEYs file
+*   [PARQUET-1091](https://issues.apache.org/jira/browse/PARQUET-1091) - Wrong and broken links in README
+*   [PARQUET-1102](https://issues.apache.org/jira/browse/PARQUET-1102) - Travis CI builds are failing for parquet-format PRs
+*   [PARQUET-1134](https://issues.apache.org/jira/browse/PARQUET-1134) - Release Parquet format 2.4.0
+*   [PARQUET-1136](https://issues.apache.org/jira/browse/PARQUET-1136) - Makefile is broken
+
+#### Improvement
+
+*   [PARQUET-371](https://issues.apache.org/jira/browse/PARQUET-371) - Bumps Thrift version to 0.9.3
+*   [PARQUET-407](https://issues.apache.org/jira/browse/PARQUET-407) - Incorrect delta-encoding example
+*   [PARQUET-428](https://issues.apache.org/jira/browse/PARQUET-428) - Support INT96 and FIXED_LEN_BYTE_ARRAY types
+*   [PARQUET-601](https://issues.apache.org/jira/browse/PARQUET-601) - Add support in Parquet to configure the encoding used by ValueWriters
+*   [PARQUET-609](https://issues.apache.org/jira/browse/PARQUET-609) - Add Brotli compression to Parquet format
+*   [PARQUET-757](https://issues.apache.org/jira/browse/PARQUET-757) - Add NULL type to Bring Parquet logical types to par with Arrow
+*   [PARQUET-804](https://issues.apache.org/jira/browse/PARQUET-804) - parquet-format README.md still links to the old Google group
+*   [PARQUET-922](https://issues.apache.org/jira/browse/PARQUET-922) - Add index pages to the format to support efficient page skipping
+*   [PARQUET-1049](https://issues.apache.org/jira/browse/PARQUET-1049) - Make thrift version a property in pom.xml
+
+#### Task
+
+*   [PARQUET-450](https://issues.apache.org/jira/browse/PARQUET-450) - Small typos/issues in parquet-format documentation
+*   [PARQUET-667](https://issues.apache.org/jira/browse/PARQUET-667) - Update committers lists to point to apache website
+*   [PARQUET-1124](https://issues.apache.org/jira/browse/PARQUET-1124) - Add new compression codecs to the Parquet spec
+*   [PARQUET-1125](https://issues.apache.org/jira/browse/PARQUET-1125) - Add UUID logical type
+
 ### Version 2.2.0 ###
 
 * [PARQUET-23](https://issues.apache.org/jira/browse/PARQUET-23): Rename 
packages and maven coordinates to org.apache



parquet-format git commit: PARQUET-922: Add column indexes to parquet.thrift

2017-10-16 Thread blue
Repository: parquet-format
Updated Branches:
  refs/heads/master 65f105707 -> f1de77d31


PARQUET-922: Add column indexes to parquet.thrift

I moved the design doc to a .md file and addressed the first round of review 
comments.

closes #63

This is based on work done by @mkornacker and @lekv who wrote the initial 
proposal and @poojanilangekar who evolved the design, wrote a prototypical 
implementation, and evaluated its performance.

Author: Lars Volker 
Author: poojanilangekar 
Author: Lars Volker 

Closes #72 from lekv/index and squashes the following commits:

babb356 [Lars Volker] Address comments from Marcel and Zoltan.
6897c2b [Lars Volker] Address Marcel's comments.
bbb3670 [Lars Volker] Reinstate PageIndex.md
ebcb33f [Lars Volker] Revert "Extend comments in parquet.thrift, remove 
PageIndex.md"
877e14c [Lars Volker] Revert "Remove picture"
5df2bbc [Lars Volker] Remove picture
a39bf49 [Lars Volker] Extend comments in parquet.thrift, remove PageIndex.md
9ea100a [Lars Volker] Address comments from Zoltan.
9f79d72 [Lars Volker] Merge branch 'master' into index
5e8ea1c [Lars Volker] Fix Typo
da6f648 [Lars Volker] Addressing more comments
8541da7 [Lars Volker] Addressing review comments from the Parquet sync meeting
8e3c533 [Lars Volker] More review comments
109b20d [Lars Volker] Address more review comments, clarify the description of 
ColumnIndex
f5bfe55 [Lars Volker] Address review comments on parquet.thrift.
700cc00 [Lars Volker] PARQUET-922: Add documentation on page indexes
f983794 [poojanilangekar] PARQUET-922: ColumnIndex Layout to Support Page 
Skipping


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/f1de77d3
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/f1de77d3
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/f1de77d3

Branch: refs/heads/master
Commit: f1de77d31936f4d50f1286676a0034b6339918ee
Parents: 65f1057
Author: Lars Volker 
Authored: Mon Oct 16 16:47:12 2017 -0700
Committer: Ryan Blue 
Committed: Mon Oct 16 16:47:12 2017 -0700

--
 Makefile   |   7 +++
 PageIndex.md   | 101 
 README.md  |   4 ++
 doc/images/PageIndexLayout.png | Bin 0 -> 7442 bytes
 src/main/thrift/parquet.thrift |  85 ++
 5 files changed, 197 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/f1de77d3/Makefile
--
diff --git a/Makefile b/Makefile
index d4cbf83..17750c1 100644
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,14 @@
 # under the License.
 #
 
+.PHONY: doc
+
 thrift:
mkdir -p generated
thrift --gen cpp -o generated src/main/thrift/parquet.thrift
thrift --gen java -o generated src/main/thrift/parquet.thrift
+
+%.html: %.md
+   pandoc -f markdown_github -t html -o $@ $<
+
+doc: README.html PageIndex.html LogicalTypes.html

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/f1de77d3/PageIndex.md
--
diff --git a/PageIndex.md b/PageIndex.md
new file mode 100644
index 0000000..7ac6e42
--- /dev/null
+++ b/PageIndex.md
@@ -0,0 +1,101 @@
+
+
+# ColumnIndex Layout to Support Page Skipping
+
+This document describes the format for column index pages in the Parquet
+footer. These pages contain statistics for DataPages and can be used to skip
+pages when scanning data in ordered and unordered columns.
+
+## Problem Statement
+In previous versions of the format, Statistics are stored for ColumnChunks in
+ColumnMetaData and for individual pages inside DataPageHeader structs. When
+reading pages, a reader had to process the page header in order to determine
+whether the page could be skipped based on the statistics. This means the reader
+had to access all pages in a column, thus likely reading most of the column
+data from disk.
+
+## Goals
+1. Make both range scans and point lookups I/O efficient by allowing direct
+   access to pages based on their min and max values. In particular:
+    * A single-row lookup in a rowgroup based on the sort column of that rowgroup
+      will only read one data page per retrieved column.
+    * Range scans on the sort column will only need to read the exact data
+      pages that contain relevant data.
+    * Make other selective scans I/O efficient: if we have a very selective
+      predicate on a non-sorting column, for the other retrieved columns we
+      should only need to access data pages that contain matching rows.
+2. No additional decoding effort for scans without selective predicates, e.g.,
+   full-row group 
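
The goal stated in the PageIndex.md excerpt above, reading only the data pages whose min and max values can contain matching rows, can be sketched as follows. This is an illustration under stated assumptions, not the layout defined in parquet.thrift: `PageStats` and `pages_to_read` are hypothetical names, values are plain integers, and the per-page statistics are assumed to be already loaded into a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Hypothetical per-page statistics entry: min and max value of a page."""
    min_value: int
    max_value: int


def pages_to_read(index: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return ordinals of pages whose [min, max] range overlaps [lo, hi].

    A page can be skipped when all of its values lie below `lo` or above
    `hi`. The overlap test is independent of page order, so it applies to
    both ordered and unordered columns.
    """
    return [
        ordinal
        for ordinal, page in enumerate(index)
        if page.max_value >= lo and page.min_value <= hi
    ]


# Three pages of a column sorted in ascending order.
index = [PageStats(0, 9), PageStats(10, 19), PageStats(20, 29)]

point_lookup = pages_to_read(index, 12, 12)   # single value 12
range_scan = pages_to_read(index, 8, 21)      # values 8 through 21
no_match = pages_to_read(index, 30, 40)       # outside every page
```

For a sorted column a reader could replace the linear pass with a binary search over the page boundaries; the linear form is kept here because it also covers unordered columns.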

parquet-cpp git commit: PARQUET-1138: Fix Arrow 0.7.1 build

2017-10-16 Thread uwe
Repository: parquet-cpp
Updated Branches:
  refs/heads/master 475be0ba7 -> 06c5fb88c


PARQUET-1138: Fix Arrow 0.7.1 build

This is a very minor issue with the 1.3.1 RC0. If this build passes cleanly I 
will vote to approve the release as this only affects this unit test

Author: Wes McKinney 

Closes #410 from wesm/arrow-0.7.1-fix-build and squashes the following commits:

fd6a527 [Wes McKinney] Add comment
f95ff0b [Wes McKinney] Fix compilation with Arrow 0.7.1, set 0.7.1 in 
ThirdpartyToolchain.cmake


Project: http://git-wip-us.apache.org/repos/asf/parquet-cpp/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-cpp/commit/06c5fb88
Tree: http://git-wip-us.apache.org/repos/asf/parquet-cpp/tree/06c5fb88
Diff: http://git-wip-us.apache.org/repos/asf/parquet-cpp/diff/06c5fb88

Branch: refs/heads/master
Commit: 06c5fb88c722158be5f9413cd55b988af8f9ef82
Parents: 475be0b
Author: Wes McKinney 
Authored: Mon Oct 16 20:58:26 2017 +0200
Committer: Uwe L. Korn 
Committed: Mon Oct 16 20:58:26 2017 +0200

--
 cmake_modules/ThirdpartyToolchain.cmake   | 2 +-
 src/parquet/arrow/arrow-reader-writer-test.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/06c5fb88/cmake_modules/ThirdpartyToolchain.cmake
--
diff --git a/cmake_modules/ThirdpartyToolchain.cmake b/cmake_modules/ThirdpartyToolchain.cmake
index 3961abd..a470fc1 100644
--- a/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cmake_modules/ThirdpartyToolchain.cmake
@@ -366,7 +366,7 @@ if (NOT ARROW_FOUND)
 -DARROW_BUILD_TESTS=OFF)
 
   if ("$ENV{PARQUET_ARROW_VERSION}" STREQUAL "")
-set(ARROW_VERSION "8309556c7d2b0e14df1422baa574cf2de8c1bd3b")
+set(ARROW_VERSION "0e21f84c2fc26dba949a03ee7d7ebfade0a65b81")  # Arrow 0.7.1
   else()
 set(ARROW_VERSION "$ENV{PARQUET_ARROW_VERSION}")
   endif()

http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/06c5fb88/src/parquet/arrow/arrow-reader-writer-test.cc
--
diff --git a/src/parquet/arrow/arrow-reader-writer-test.cc b/src/parquet/arrow/arrow-reader-writer-test.cc
index fc6410d..a18c565 100644
--- a/src/parquet/arrow/arrow-reader-writer-test.cc
+++ b/src/parquet/arrow/arrow-reader-writer-test.cc
@@ -951,7 +951,7 @@ TEST_F(TestNullParquetIO, NullDictionaryColumn) {
 
   std::shared_ptr expected_values =
   std::make_shared<::arrow::NullArray>(SMALL_SIZE);
-  AssertArraysEqual(*expected_values, *chunked_array->chunk(0));
+  internal::AssertArraysEqual(*expected_values, *chunked_array->chunk(0));
 }
 
 template