[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2021-07-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386360#comment-17386360
 ] 

ASF GitHub Bot commented on PARQUET-968:


ccpstephanie commented on pull request #411:
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-885759396


   Although it's closed but I'm a bit confused... why I always get the old 
schema version?  `parquet.proto.writeSpecsCompliant=false`, and directly using 
ParquetWriter. I'm using the latest version currently 1.12.0.
   
   I'd highly appreciate if someone could point out something stupid in my 
code! Or it's the same issue you are experiencing? 
   
   My goal is to be able query data via Athena/Presto, or Hive Metastore, so 
need the the new parquet schema version.
   
   
   **Method 1:**
   
 // Doesn't work!
   Configuration conf = new Configuration();
   ProtoWriteSupport.setWriteSpecsCompliant(conf, false); // If set to 
true, the old schema style will be used (without wrappers).
   
   ParquetWriter writer =
   
ProtoParquetWriter.builder(file).withMessage(cls).withConf(conf).build();
   
   for (MessageOrBuilder record : records) {
   writer.write(record);
   }
   
   writer.close();
   System.err.println(writer.getFooter());
   
   **Method 2:**
   
 // Doesn't work!
   Configuration conf = new Configuration();
   ProtoWriteSupport.setWriteSpecsCompliant(conf, false); // If set to 
true, the old schema style will be used (without wrappers).
   
   try (ParquetWriter writer = new ParquetWriter(
   file,
   new 
ProtoWriteSupport(AddressBook.class),
   CompressionCodecName.GZIP,
   128 * 1024 * 
1024,//PARQUET_BLOCK_SIZE,
   
ParquetProperties.DEFAULT_PAGE_SIZE,
   
ParquetProperties.DEFAULT_PAGE_SIZE, 
   true,
   false,
   
ParquetProperties.DEFAULT_WRITER_VERSION,
   conf)) {
   for (Object record : messages) {
   writer.write(record);
   }
   writer.close();
   System.err.println(writer.getFooter());
   
   **Parquet output Metadata:**
   `
   _ParquetMetaData{FileMetaData{schema: message AddressBookProtos.AddressBook {
 repeated group people = 1 {
   optional binary name (STRING) = 1;
   optional int32 id = 2;
   optional binary email (STRING) = 3;
   repeated group phones = 4 {
 optional binary number (STRING) = 1;
 optional binary type (ENUM) = 2;
   }  }}
   , metadata: {parquet.proto.descriptor=name: "AddressBook"
   field {
 name: "people"
 number: 1
 label: LABEL_REPEATED
 type: TYPE_MESSAGE
 type_name: ".AddressBookProtos.Person"}
   , parquet.proto.writeSpecsCompliant=false,
   ...}
   `
   
   **Protobuf Messasge:**
   
   `
   syntax = "proto3";
   
   package AddressBookProtos;
   
   option java_multiple_files = true;
   option java_package = "com.mycompany.app";
   option java_outer_classname = "AddressBookProtos";
   
   message Person {
 string name = 1;
 int32 id = 2;
 string email = 3;
   
 enum PhoneType {
   MOBILE = 0;
   HOME = 1;
   WORK = 2;
 }
   
 message PhoneNumber {
   string number = 1;
   PhoneType type = 2;
 }
   
 repeated PhoneNumber phones = 4;
   }
   
   message AddressBook {
 repeated Person people = 1;
   }
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-09-15 Thread Scott Clasen (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616413#comment-16616413
 ] 

Scott Clasen commented on PARQUET-968:
--

Hi- 

   Is there a snapshot repo anywhere where this has been published? would love 
to try it out.

Any release schedule for when this gets in a non snapshot yet?

 

Thank you!

SC

> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460119#comment-16460119
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385781922
 
 
   @BenoitHanotte Here's the PR for constructor with flag #473 
   
   Also, if a minor release can be done for this fix it will be greatly 
appreciated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459834#comment-16459834
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385718339
 
 
   I believe the 1.10.0 has just been released, so this will liekly land in the 
next "major" release, unfortunately I am not aware of any plan to have a new 
release in the near future.
   
   For the `ProtoParquetWriter` class, we have discussed it with @costimuraru 
and we will add a constructor with the flag in a future PR but I can't commit 
on a timeframe.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458886#comment-16458886
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385494498
 
 
   @BenoitHanotte Is there a timeline of when this will be released?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458294#comment-16458294
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385307748
 
 
   @BenoitHanotte That's exactly how I am using it right now but it kind of 
defeats the purpose of having ProtoParquetWriter class.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458109#comment-16458109
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385270695
 
 
   @chawlakunal you can manually create your ParquetWriter by providing the 
ProtoWriteSupport as following:
   ```
   Configuration conf = new Configuration();
   ProtoWriteSupport.setWriteSpecsCompliant(true, conf); // set the flag in the 
configuration
   new ParquetWriter(file, conf, new ProtoWriteSupport(protoClass));
   ```
   (or any variation of this as ParquetWriter has multiple constructors that 
accept a configuration in which we can set the flag)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456936#comment-16456936
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385068068
 
 
   @BenoitHanotte @costimuraru @julienledem  There is no way to instantiate 
ProtoParquetWriter with parquet.proto.writeSpecsCompliant flag enabled. Am I 
missing something or is this intentional? It would have been great if a 
constructor to enable the flag was provided.
   
   ```
   public ProtoParquetWriter(Path file, Class protoMessage,
 CompressionCodecName compressionCodecName, int blockSize, int 
pageSize, boolean enableDictionary,
 boolean validating, boolean writeSpecsCompliant) throws 
IOException {
 super(file, new ProtoWriteSupport(protoMessage), compressionCodecName, 
blockSize, pageSize, pageSize,
 enableDictionary, validating, DEFAULT_WRITER_VERSION,
 getConfigWithWriteSpecsCompliant(writeSpecsCompliant));
 }
 
 private static Configuration getConfigWithWriteSpecsCompliant(boolean 
writeSpecsCompliant) {
 Configuration config = new Configuration();
 ProtoWriteSupport.setWriteSpecsCompliant(config, writeSpecsCompliant);
 return config;
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456935#comment-16456935
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385068068
 
 
   @BenoitHanotte @costimuraru There is no way to instantiate 
ProtoParquetWriter with parquet.proto.writeSpecsCompliant flag enabled. Am I 
missing something or is this intentional? It would have been great if a 
constructor to enable the flag was provided.
   
   ```
   public ProtoParquetWriter(Path file, Class protoMessage,
 CompressionCodecName compressionCodecName, int blockSize, int 
pageSize, boolean enableDictionary,
 boolean validating, boolean writeSpecsCompliant) throws 
IOException {
 super(file, new ProtoWriteSupport(protoMessage), compressionCodecName, 
blockSize, pageSize, pageSize,
 enableDictionary, validating, DEFAULT_WRITER_VERSION,
 getConfigWithWriteSpecsCompliant(writeSpecsCompliant));
 }
 
 private static Configuration getConfigWithWriteSpecsCompliant(boolean 
writeSpecsCompliant) {
 Configuration config = new Configuration();
 ProtoWriteSupport.setWriteSpecsCompliant(config, writeSpecsCompliant);
 return config;
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456933#comment-16456933
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-385068068
 
 
   @BenoitHanotte @costimuraru There is no way to instantiate 
ProtoParquetWriter with parquet.proto.writeSpecsCompliant flag enabled. Am I 
missing something or is this intentional? It would have been great if a 
constructor to enable the flag was provided.
   
   ```
   public QcProtoParquetWriter(Path file, Class protoMessage,
 CompressionCodecName compressionCodecName, int blockSize, int 
pageSize, boolean enableDictionary,
 boolean validating, boolean writeSpecsCompliant) throws 
IOException {
 super(file, new ProtoWriteSupport(protoMessage), compressionCodecName, 
blockSize, pageSize, pageSize,
 enableDictionary, validating, DEFAULT_WRITER_VERSION,
 getConfigWithWriteSpecsCompliant(writeSpecsCompliant));
 }
 
 private static Configuration getConfigWithWriteSpecsCompliant(boolean 
writeSpecsCompliant) {
 Configuration config = new Configuration();
 ProtoWriteSupport.setWriteSpecsCompliant(config, writeSpecsCompliant);
 return config;
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
> Fix For: 1.11
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453972#comment-16453972
 ] 

Julien Le Dem commented on PARQUET-968:
---

merged in 
https://github.com/apache/parquet-mr/commit/f84938441be49c665595c936ac631c3e5f171bf9

> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Assignee: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453965#comment-16453965
 ] 

ASF GitHub Bot commented on PARQUET-968:


julienledem closed pull request #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 
b/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
index b5649a05b..d5f43e6b3 100644
--- 
a/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
+++ 
b/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
@@ -24,12 +24,14 @@
 import com.twitter.elephantbird.util.Protobufs;
 import org.apache.parquet.column.Dictionary;
 import org.apache.parquet.io.InvalidRecordException;
+import org.apache.parquet.io.ParquetDecodingException;
 import org.apache.parquet.io.api.Binary;
 import org.apache.parquet.io.api.Converter;
 import org.apache.parquet.io.api.GroupConverter;
 import org.apache.parquet.io.api.PrimitiveConverter;
 import org.apache.parquet.schema.GroupType;
 import org.apache.parquet.schema.IncompatibleSchemaModificationException;
+import org.apache.parquet.schema.OriginalType;
 import org.apache.parquet.schema.Type;
 
 import java.util.HashMap;
@@ -129,10 +131,15 @@ public void add(Object value) {
   };
 }
 
+if (OriginalType.LIST == parquetType.getOriginalType()) {
+  return new ListConverter(parentBuilder, fieldDescriptor, parquetType);
+}
+if (OriginalType.MAP == parquetType.getOriginalType()) {
+  return new MapConverter(parentBuilder, fieldDescriptor, parquetType);
+}
 return newScalarConverter(parent, parentBuilder, fieldDescriptor, 
parquetType);
   }
 
-
   private Converter newScalarConverter(ParentValueContainer pvc, 
Message.Builder parentBuilder, Descriptors.FieldDescriptor fieldDescriptor, 
Type parquetType) {
 
 JavaType javaType = fieldDescriptor.getJavaType();
@@ -345,4 +352,121 @@ public void addBinary(Binary binary) {
 }
 
   }
+
+  /**
+   * This class unwraps the additional LIST wrapper and makes it possible to 
read the underlying data and then convert
+   * it to protobuf.
+   * 
+   * Consider the following protobuf schema:
+   * message SimpleList {
+   *   repeated int64 first_array = 1;
+   * }
+   * 
+   * A LIST wrapper is created in parquet for the above mentioned protobuf 
schema:
+   * message SimpleList {
+   *   optional group first_array (LIST) = 1 {
+   * repeated group list {
+   * optional int32 element;
+   * }
+   *   }
+   * }
+   * 
+   * The LIST wrappers are used by 3rd party tools, such as Hive, to read 
parquet arrays. The wrapper contains
+   * a repeated group named 'list', itself containing only one field called 
'element' of the type of the repeated
+   * object (can be a primitive as in this example or a group in case of a 
repeated message in protobuf).
+   */
+  final class ListConverter extends GroupConverter {
+private final Converter converter;
+
+public ListConverter(Message.Builder parentBuilder, 
Descriptors.FieldDescriptor fieldDescriptor, Type parquetType) {
+  OriginalType originalType = parquetType.getOriginalType();
+  if (originalType != OriginalType.LIST || parquetType.isPrimitive()) {
+throw new ParquetDecodingException("Expected LIST wrapper. Found: " + 
originalType + " instead.");
+  }
+
+  GroupType rootWrapperType = parquetType.asGroupType();
+  if (!rootWrapperType.containsField("list") || 
rootWrapperType.getType("list").isPrimitive()) {
+throw new ParquetDecodingException("Expected repeated 'list' group 
inside LIST wrapperr but got: " + rootWrapperType);
+  }
+
+  GroupType listType = rootWrapperType.getType("list").asGroupType();
+  if (!listType.containsField("element")) {
+throw new ParquetDecodingException("Expected 'element' inside repeated 
list group but got: " + listType);
+  }
+
+  Type elementType = listType.getType("element");
+  converter = newMessageConverter(parentBuilder, fieldDescriptor, 
elementType);
+}
+
+@Override
+public Converter getConverter(int fieldIndex) {
+  if (fieldIndex > 0) {
+throw new ParquetDecodingException("Unexpected multiple fields in the 
LIST wrapper");
+  }
+
+  return new GroupConverter() {
+@Override
+public Converter getConverter(int fieldIndex) {
+  return converter;
+}
+
+@Override
+public void start() {
+
+}
+
+

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451381#comment-16451381
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-384106975
 
 
   When can this be expected to be merged to master and released?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450169#comment-16450169
 ] 

ASF GitHub Bot commented on PARQUET-968:


julienledem commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383997195
 
 
   This looks good.
   Thank you for this collaborative effort!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450040#comment-16450040
 ] 

ASF GitHub Bot commented on PARQUET-968:


lukasnalezenec commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383969694
 
 
   Hi, I already did. 
   There is one typo in comment and it is little bit harder to read - i wanted 
to check flow once more. I think we can commit it as it is.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449697#comment-16449697
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383903265
 
 
   Hello @lukasnalezenec, have you had time to have a look? Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440773#comment-16440773
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381955530
 
 
   @lukasnalezenec this is the schema written for a map with the flag set to 
true:
   ```
   optional group nonEmptyMap (MAP) = 5 {
 repeated group key_value {
   required int32 key;
   optional int32 value;
 }
   }
   ```
   
   There is a test suite at 
https://github.com/BenoitHanotte/parquet-968/blob/master/src/test/scala/org/bhnte/parquet968/SparkTest.scala
 to validate the support of empty maps and lists as well as to check that Spark 
is now able to correctly interpret maps as such with the flag set to true


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440765#comment-16440765
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381955530
 
 
   @lukasnalezenec this is the schema written for a map with the flag set to 
true:
   ```
   optional group nonEmptyMap (MAP) = 5 {
 repeated group key_value {
   required int32 key;
   optional int32 value;
 }
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440764#comment-16440764
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381955530
 
 
   @lukasnalezenec this is the schema written for a map with the flag set to 
true:
   ```
   optional group emptyMap (MAP) = 5 {
 repeated group key_value {
   required int32 key;
   optional int32 value;
 }
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440763#comment-16440763
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381955530
 
 
   @lukasnalezenec this is the schema written with the specCompliantFlag set to 
true:
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440573#comment-16440573
 ] 

ASF GitHub Bot commented on PARQUET-968:


lukasnalezenec commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381890716
 
 
   Hi,
   I will look to it and merge it ASAP.  
   Last think i want to check is the map repetition - how it's done ? Is that 
correct for sure?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440534#comment-16440534
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381881399
 
 
   @costimuraru Perfect, thanks for merging my PR into your branch! :)
   Then I believe we have an agreement, all the comments have been addressed 
and this PR has been successfully tested. @julienledem, if your are ok with it, 
we should be ready to merge. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437369#comment-16437369
 ] 

ASF GitHub Bot commented on PARQUET-968:


costimuraru commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381155272
 
 
   Would be great if we could merge this by Apr 29, when this PR will turn one 
year :D


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437294#comment-16437294
 ] 

ASF GitHub Bot commented on PARQUET-968:


costimuraru commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-381134753
 
 
   @BenoitHanotte sounds awesome! I successfully tested this final patch and it 
works great with AWS Athena (Presto).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426863#comment-16426863
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378924606
 
 
   @costimuraru @lukasnalezenec do we have an agreement? Can we merge the 2 
PRs? 
   These changes have been in prod here for some time, and I would like them to 
be merged so that we can start to work on other tickets that would otherwise 
require an expansive rebase after this one is merged.
   
   @virtualluke These changes are backward compatible for reading so you won't 
need to set any flag when reading. Once you have read the data, you will be 
able to write them using the "specs-compliant" schemas using the 
`parquet.proto.writeSpecsCompliant` flag


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426861#comment-16426861
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378924606
 
 
   @costimuraru do we have an agreement? Can we merge the 2 PRs? 
   These changes have been in prod here for some time, and I would like them to 
be merged so that we can start to work on other tickets that would otherwise 
require an expansive rebase after this one is merged.
   
   @virtualluke These changes are backward compatible for reading so you won't 
need to set any flag when reading. Once you have read the data, you will be 
able to write them using the "specs-compliant" schemas using the 
`parquet.proto.writeSpecsCompliant` flag


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426797#comment-16426797
 ] 

ASF GitHub Bot commented on PARQUET-968:


virtualluke commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378911550
 
 
   When this is implemented there will be a flag where I could read parquet 
files which have the 2-level repetition style and convert them to the 3-level 
style so they would be compatible with the current state of parquet parsing 
libraries built with parquet-cpp (thinking pyarrow here) ?  Does this look like 
it will be merged soon?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418262#comment-16418262
 ] 

ASF GitHub Bot commented on PARQUET-968:


rdblue commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-377076348
 
 
   These changes sound good to me once there is consensus from everyone that 
understands the protobuf parts.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415757#comment-16415757
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r177454755
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 ##
 @@ -345,4 +351,121 @@ public void addBinary(Binary binary) {
 }
 
   }
+
+  /**
+   * This class unwraps the additional LIST wrapper and makes it possible to 
read the underlying data and then convert
+   * it to protobuf.
+   * 
+   * Consider the following protobuf schema:
+   * message SimpleList {
+   *   repeated int64 first_array = 1;
+   * }
+   * 
+   * A LIST wrapper is created in parquet for the above mentioned protobuf 
schema:
+   * message SimpleList {
+   *   required group first_array (LIST) = 1 {
+   * repeated int32 element;
+   *   }
+   * }
+   * 
+   * The LIST wrappers are used by 3rd party tools, such as Hive, to read 
parquet arrays. The wrapper contains
+   * one only one field: either a primitive field (like in the example above, 
where we have an array of ints) or
+   * another group (array of messages).
+   */
+  final class ListConverter extends GroupConverter {
+private final Converter converter;
+private final boolean listOfMessage;
+
+public ListConverter(Message.Builder parentBuilder, 
Descriptors.FieldDescriptor fieldDescriptor, Type parquetType) {
+  OriginalType originalType = parquetType.getOriginalType();
+  if (originalType != OriginalType.LIST) {
+throw new ParquetDecodingException("Expected LIST wrapper. Found: " + 
originalType + " instead.");
+  }
+
+  listOfMessage = fieldDescriptor.getJavaType() == JavaType.MESSAGE;
 
 Review comment:
   done in https://github.com/costimuraru/parquet-mr/pull/2


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415758#comment-16415758
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r177454793
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 ##
 @@ -388,7 +388,7 @@ public ListConverter(Message.Builder parentBuilder, 
Descriptors.FieldDescriptor
   if (parquetType.asGroupType().containsField("list")) {
 parquetSchema = parquetType.asGroupType().getType("list");
 if (parquetSchema.asGroupType().containsField("element")) {
-  parquetSchema.asGroupType().getType("element");
+  parquetSchema = parquetSchema.asGroupType().getType("element");
 
 Review comment:
   done in https://github.com/costimuraru/parquet-mr/pull/2


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415756#comment-16415756
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r177454394
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java
 ##
 @@ -247,21 +282,45 @@ final void writeRawValue(Object value) {
 @Override
 final void writeField(Object value) {
   recordConsumer.startField(fieldName, index);
+  recordConsumer.startGroup();
   List list = (List) value;
 
+  recordConsumer.startField("list", 0); // This is the wrapper group for 
the array field
   for (Object listEntry: list) {
+recordConsumer.startGroup();
+
+recordConsumer.startField("element", 0); // This is the mandatory 
inner field
+
+if (!isPrimitive(listEntry)) {
+  recordConsumer.startGroup();
+}
+
 fieldWriter.writeRawValue(listEntry);
+
+if (!isPrimitive(listEntry)) {
+  recordConsumer.endGroup();
+}
+
+recordConsumer.endField("element", 0);
+
+recordConsumer.endGroup();
   }
+  recordConsumer.endField("list", 0);
 
+  recordConsumer.endGroup();
   recordConsumer.endField(fieldName, index);
 }
   }
 
+  private boolean isPrimitive(Object listEntry) {
+return !(listEntry instanceof Message);
 
 Review comment:
   done in https://github.com/costimuraru/parquet-mr/pull/2, I removed the 
isPrimitive method as wrapping with a group can be done in 
`MessageWriter.writeRawValue()`, removing the need to handle non-primitive 
types differently inside the ArrayWriter.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415751#comment-16415751
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r177453689
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 ##
 @@ -345,4 +351,121 @@ public void addBinary(Binary binary) {
 }
 
   }
+
+  /**
+   * This class unwraps the additional LIST wrapper and makes it possible to 
read the underlying data and then convert
+   * it to protobuf.
+   * 
+   * Consider the following protobuf schema:
+   * message SimpleList {
+   *   repeated int64 first_array = 1;
+   * }
+   * 
+   * A LIST wrapper is created in parquet for the above mentioned protobuf 
schema:
+   * message SimpleList {
+   *   required group first_array (LIST) = 1 {
+   * repeated int32 element;
+   *   }
+   * }
+   * 
+   * The LIST wrappers are used by 3rd party tools, such as Hive, to read 
parquet arrays. The wrapper contains
+   * one only one field: either a primitive field (like in the example above, 
where we have an array of ints) or
+   * another group (array of messages).
+   */
+  final class ListConverter extends GroupConverter {
+private final Converter converter;
+private final boolean listOfMessage;
+
+public ListConverter(Message.Builder parentBuilder, 
Descriptors.FieldDescriptor fieldDescriptor, Type parquetType) {
+  OriginalType originalType = parquetType.getOriginalType();
+  if (originalType != OriginalType.LIST) {
+throw new ParquetDecodingException("Expected LIST wrapper. Found: " + 
originalType + " instead.");
+  }
+
+  listOfMessage = fieldDescriptor.getJavaType() == JavaType.MESSAGE;
+
+  Type parquetSchema;
+  if (parquetType.asGroupType().containsField("list")) {
 
 Review comment:
   done in https://github.com/costimuraru/parquet-mr/pull/2


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415749#comment-16415749
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r177453607
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 ##
 @@ -129,10 +131,14 @@ public void add(Object value) {
   };
 }
 
-return newScalarConverter(parent, parentBuilder, fieldDescriptor, 
parquetType);
+OriginalType originalType = parquetType.getOriginalType() == null ? 
OriginalType.UTF8 : parquetType.getOriginalType();
 
 Review comment:
   I believe the reason for this is that if originalType is null, the swicth 
will throw an exception. 
   In https://github.com/costimuraru/parquet-mr/pull/2 I replaced the switch 
with an if.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415744#comment-16415744
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-376552693
 
 
   @costimuraru @lukasnalezenec @qinghui-xu I have updated my PR 
(https://github.com/costimuraru/parquet-mr/pull/2) on top of this one with a 
comment addressing the code review feedback. Could you please have a look?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407735#comment-16407735
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374895431
 
 
   @costimuraru I updated my PR in order to isolate the repetition-level change 
in a separate commit. 
   
   However with lists as required we are likely creating incorrect files: the 
root wrapper is `required` but not written as protobuf's `pb.getAllFields()` 
only returns fields that are set (i.e. not an empty list), thus the field for 
the list root wrapper is never written despite being `required`.
   
   Here is how Spark would show the resulting dataset if lists are required:
   ```
   +-+--+-++++
   |intNotSet|intSet|emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
   +-+--+-++++
   | null| 1|   []|  [1, 1]|null|[1 -> 1, 2 -> 2]|
   +-+--+-++++
   ```
   We can see that empty maps and primitives that are not set are null, however 
lists show up as empty


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407687#comment-16407687
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374883095
 
 
   @virtualluke it seems related: the issue seems to be that your file are 
missing the LIST wrapper around repeated fields (the 3-levels lists), and 
instead using the 2-levels repetition style. 
   The 2-levels style is what parquet-protobuf currently does and this PR's aim 
is to allow it to use the 3-levels way, which is how the parquet specs define 
lists and maps when writting parquet files. 
   The PR you linked to does the opposite: allowing to read files that have 
been written with the 2-levels style.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405740#comment-16405740
 ] 

ASF GitHub Bot commented on PARQUET-968:


virtualluke commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374458355
 
 
   Currently I am handed parquet files generated from parquet-mr and the 
current way LIST is handled makes the parquet files unusable in anything in the 
current python stack I have tried (arrow/parquet-cpp or fastparquet). This PR 
seems like it would address the difference? 
https://issues.apache.org/jira/browse/PARQUET-1122 . Or is it similar but not 
related?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404787#comment-16404787
 ] 

ASF GitHub Bot commented on PARQUET-968:


costimuraru commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374205305
 
 
   Moving the discussion from the fork branch 
(https://github.com/costimuraru/parquet-mr/pull/2/files#r175112235), here.
   
   > costimuraru 3 days ago  Owner
   > @BenoitHanotte, we had a discussion a while back around the LIST wrapper 
being REQUIRED:
   > https://github.com/apache/parquet-mr/pull/411#issuecomment-301194565
   > 
   >   @BenoitHanotte
   > BenoitHanotte 3 days ago
   > I strongly believe the wrappers should be optional for the following 
reasons:
   > 
   > it is consistent with how Maps are handled
   > it is consistent with the fact that parquet-protobuf does not write fields 
if they have the default values (protobuf considers them cleared). this is 
currently the case for primitives and complex messages, I would like 
collections to behave the same way.
   > every field is optional with protobuf 3
   > I haven't seen any arguments for making the wrapper required in the first 
place, was there a specific reason I am not aware of?
   
   @julienledem, regarding your previous comment 
(https://github.com/apache/parquet-mr/pull/411#issuecomment-301194565), is 
there a specific reason for which the LIST wrapper needs to be `REQUIRED` and 
not `OPTIONAL`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404785#comment-16404785
 ] 

ASF GitHub Bot commented on PARQUET-968:


costimuraru commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-374205305
 
 
   > costimuraru 3 days ago  Owner
   > @BenoitHanotte, we had a discussion a while back around the LIST wrapper 
being REQUIRED:
   > https://github.com/apache/parquet-mr/pull/411#issuecomment-301194565
   > 
   >   @BenoitHanotte
   > BenoitHanotte 3 days ago
   > I strongly believe the wrappers should be optional for the following 
reasons:
   > 
   > it is consistent with how Maps are handled
   > it is consistent with the fact that parquet-protobuf does not write fields 
if they have the default values (protobuf considers them cleared). this is 
currently the case for primitives and complex messages, I would like 
collections to behave the same way.
   > every field is optional with protobuf 3
   > I haven't seen any arguments for making the wrapper required in the first 
place, was there a specific reason I am not aware of?
   
   @julienledem, regarding your previous comment 
(https://github.com/apache/parquet-mr/pull/411#issuecomment-301194565), is 
there a specific reason for which the LIST wrapper needs to be `REQUIRED` and 
not `OPTIONAL`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401822#comment-16401822
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-373654480
 
 
   @costimuraru I did a PR in your repo to add a flag that enable writing using 
the specs-compliant schemas: https://github.com/costimuraru/parquet-mr/pull/2 . 
The flag is off by default in order to keep backward compatibility. 
   I believe this is the safest way going forward and will allow to merge this 
PR without breaking backward compatibility. The flag could be set to true in a 
future major release of parquet.
   
   You can find a test suite at https://github.com/BenoitHanotte/parquet-968 
that highlights the differences in how the old schemas and the new ones 
(specs-compliant) are interpreted by Spark and validates the changes in this PR:
   - default behavior (with flag set to false, default behavior, backward 
compatible):
   ```
   +-++++
   |emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
   +-++++
   |   []|  [1, 1]|  []|[[1, 1], [2, 2]]|
   +-++++
   ```
   - specs-compliant schemas:
   ```
   +-++++
   |emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
   +-++++
   | null|  [1, 1]|null|[1 -> 1, 2 -> 2]|
   +-++++
   ```
   
   You can see that maps are interpreted differently (they are interpreted as 
lists of key-value tuples with the non-compliant schemas) as are the default 
values (empty lists and maps).
   
   @lukasnalezenec @qinghui-xu feel free to also have a look as we would like 
to have this PR merged if all the concerns are addressed 
   
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401662#comment-16401662
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-373654480
 
 
   @costimuraru I did a PR in your repo to add a flag that enablse writing 
using the specs-compliant schemas: 
https://github.com/costimuraru/parquet-mr/pull/2 . The flag is off by default 
in order to keep backward compatibility. 
   I believe this is the safest way going forward and will allow to merge this 
PR without breaking backward compatibility. the flag could be set to true in a 
future major release of parquet.
   
   You can find a test suite at https://github.com/BenoitHanotte/parquet-968 
that highlights the differences in how the old schemas and the new ones 
(specs-compliant) are interpreted by Spark and validates the changes in this PR:
   - default behavior (with flag set to false, default behavior, backward 
compatible):
   ```
   +-++++
   |emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
   +-++++
   |   []|  [1, 1]|  []|[[1, 1], [2, 2]]|
   +-++++
   ```
   - specs-compliant schemas:
   ```
   +-++++
   |emptyRepeated|nonEmptyRepeated|emptyMap| nonEmptyMap|
   +-++++
   | null|  [1, 1]|null|[1 -> 1, 2 -> 2]|
   +-++++
   ```
   
   You can see that maps are interpreted differently (they are interpreted as 
lists of key-value tuples with the non-compliant schemas) as are the default 
values (empty lists and maps).
   
   @lukasnalezenec @qinghui-xu feel free to also have a look as we would like 
to have this PR merged if all the concerns are resolved 
   
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397313#comment-16397313
 ] 

ASF GitHub Bot commented on PARQUET-968:


qinghui-xu commented on a change in pull request #411: PARQUET-968 Add 
Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#discussion_r174216544
 
 

 ##
 File path: 
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java
 ##
 @@ -129,10 +131,14 @@ public void add(Object value) {
   };
 }
 
-return newScalarConverter(parent, parentBuilder, fieldDescriptor, 
parquetType);
+OriginalType originalType = parquetType.getOriginalType() == null ? 
OriginalType.UTF8 : parquetType.getOriginalType();
 
 Review comment:
   I guess for the data generated in previous version of parquet-protobuf, it 
is not having the "OriginalType" annotation for repeated fields, thus this 
conditional test to be backward compatible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)