[ 
https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249017#comment-17249017
 ] 

ASF GitHub Bot commented on PARQUET-1950:
-----------------------------------------

gszadovszky commented on a change in pull request #164:
URL: https://github.com/apache/parquet-format/pull/164#discussion_r542431137



##########
File path: CoreFeatures.md
##########
@@ -0,0 +1,181 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# Parquet Core Features
+
+This document lists the core features for each parquet-format release. This
+list is a subset of the features which parquet-format makes available.
+
+## Purpose
+
+The list of core features for a certain release defines a compliance level that
+the different implementations can be tied to. If an implementation claims that
+it provides the core features of a parquet-format release, it must
+implement all of the listed features according to the specification (both read
+and write path). This makes it easier to ensure compatibility between the
+different parquet implementations.
+We cannot and do not want to stop our clients from using features that are not
+on this list, but it should be highlighted that using these features might make
+the written parquet files unreadable by other implementations. Features that
+are available in a parquet-format release (and in one of its
+implementations) but are not on this list should be considered experimental.
+
+## Versioning
+
+This document is versioned by the parquet-format releases, which follow the
+scheme of semantic versioning. It means that no feature will be deleted from
+this document under the same major version. (We might deprecate some, though.)
+Because of the semantic versioning, if one implementation supports the core
+features of the parquet-format release `a.b.x`, it must be able to read any
+parquet file written by implementations supporting the release `a.d.y` where
+`b >= d`.
+
+If a parquet file is written according to a released version of this document,
+it might be a good idea to write this version into the field `compliance_level`
+in the thrift object `FileMetaData`.
+
+## Adding new features
+
+The idea is to only include features which are specified correctly and proven
+to be useful for everyone. Because of that, we require at least two
+different implementations that are released and widely tested.
+
+## The "list"
+
+This list is based on the [parquet thrift file](src/main/thrift/parquet.thrift)
+where all the data structures we might use in a parquet file are defined.
+
+### File structure
+
+All of the required fields in the structure (and sub-structures) of
+`FileMetaData` must be set according to the specification.
+The following page types are supported:
+* Data page V1 (see `DataPageHeader`)
+* Dictionary page (see `DictionaryPageHeader`)

Review comment:
       Based on parquet.thrift, V2 only allows selecting whether a page is
compressed or not. The compression codec is specified in `ColumnMetaData`, so
it can be set per column (not per page) for both V1 and V2.
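
A minimal sketch of the point above, assuming simplified stand-ins for the Thrift structures rather than real generated bindings. Only the `codec` field of `ColumnMetaData` and the `is_compressed` flag of `DataPageHeaderV2` mirror parquet.thrift; the enum is truncated and the helper `effective_codec` is a hypothetical name used for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class CompressionCodec(Enum):
    # Truncated subset of the codecs, for illustration only.
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2


@dataclass
class ColumnMetaData:
    # The codec is chosen once per column chunk.
    codec: CompressionCodec


@dataclass
class DataPageHeader:
    # V1 data page: no per-page compression switch.
    num_values: int


@dataclass
class DataPageHeaderV2:
    # V2 data page: may only opt out of compression, not pick a codec.
    num_values: int
    is_compressed: bool = True


def effective_codec(
    column: ColumnMetaData,
    page: Union[DataPageHeader, DataPageHeaderV2],
) -> CompressionCodec:
    """Return the codec that applies to the data of the given page."""
    if isinstance(page, DataPageHeaderV2) and not page.is_compressed:
        return CompressionCodec.UNCOMPRESSED
    # Otherwise the column-level codec applies (V1 and V2 alike).
    return column.codec
```

In this model a V2 page can never select a codec that differs from its column's codec; it can only fall back to uncompressed.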






> Define core features / compliance level
> ---------------------------------------
>
>                 Key: PARQUET-1950
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1950
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>
> The Parquet format keeps gaining features, while the different
> implementations cannot keep pace and are left behind, with some features
> implemented and others not. In many cases it is also unclear whether a
> feature is mature enough to be used widely or is still experimental.
> These are significant issues that make it hard to ensure interoperability
> between the different implementations.
> The following idea came up in a 
> [discussion|https://lists.apache.org/thread.html/rde5cba8443487bccd47593ddf5dfb39f69c729d260165cb936a1a289%40%3Cdev.parquet.apache.org%3E].
>  Create a new document in the parquet-format repository that lists the "core 
> features". This document is versioned by the parquet-format releases. This 
> way a certain version of "core features" defines a level of compatibility 
> between the different implementations. This version number can be written to 
> a new field (e.g. complianceLevel) in the footer. If an implementation writes 
> a file with a version in this field, it must implement all the related "core 
> features" (read and write) and must not use any other features when writing, 
> because doing so would make the data unreadable by another implementation 
> that implements only the same level of "core features".
> For example, if encoding A is listed in the version 1 "core features" but 
> encoding B is not, then at "complianceLevel = 1" we can use encoding A but we 
> cannot use encoding B, because it would make the related data unreadable.
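
The two rules described in this message (the read guarantee from the quoted Versioning section and the write restriction from the issue description) can be sketched as follows. This is an illustration under stated assumptions, not part of the proposal: the names `Version`, `parse_version`, `read_guaranteed`, `CORE_FEATURES` and `disallowed_features` are hypothetical, and the feature table contains only the placeholder encodings A and B from the example.

```python
from typing import FrozenSet, NamedTuple, Set


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a semantic version string such as '2.9.0'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def read_guaranteed(reader: Version, writer: Version) -> bool:
    """Rule from the Versioning section: a reader supporting a.b.x must
    read files written for a.d.y where b >= d. The text gives no
    guarantee across major versions, so those return False here."""
    return reader.major == writer.major and reader.minor >= writer.minor


# Hypothetical table: compliance level -> features allowed at that level.
CORE_FEATURES = {
    1: frozenset({"ENCODING_A"}),
}


def disallowed_features(level: int, used: Set[str]) -> FrozenSet[str]:
    """Return the features a writer uses that are not core at `level`."""
    return frozenset(used) - CORE_FEATURES[level]
```

With this table, a writer that uses both encodings at level 1 would get `ENCODING_B` reported as disallowed, matching the example in the description.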



