[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314636#comment-17314636
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

wesm merged pull request #171:
URL: https://github.com/apache/parquet-format/pull/171


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Format] Mention that converted types are deprecated
> 
>
> Key: PARQUET-2013
> URL: https://issues.apache.org/jira/browse/PARQUET-2013
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: format-2.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314637#comment-17314637
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

wesm commented on pull request #171:
URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813110387


   Thanks @pitrou!




[GitHub] [parquet-format] wesm commented on pull request #171: PARQUET-2013: Replace back "should" with "must"

2021-04-04 Thread GitBox


wesm commented on pull request #171:
URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813110387


   Thanks @pitrou!






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314620#comment-17314620
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

emkornfield commented on pull request #171:
URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813096900


   LGTM.  Is merging for this repo done through the GitHub UI?




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2021-04-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314618#comment-17314618
 ] 

Dongjoon Hyun commented on PARQUET-1143:


Hi, [~rdblue]. Could you set the Fix Version, please?

> Update Java for format 2.4.0 changes
> 
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-mr
>Affects Versions: 1.9.0, 1.8.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
>






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314585#comment-17314585
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou opened a new pull request #171:
URL: https://github.com/apache/parquet-format/pull/171


   Followup to cabeea7ca4afe22f4768555a017594ce343d88df and the discussion in 
https://github.com/apache/parquet-format/pull/169#discussion_r606836868




[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314579#comment-17314579
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

wesm commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606836868



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   It's a change with forward compatibility implications, so either we 
revert or we discuss/vote, but we can't leave it like this. Impala for example 
only started to get support for LogicalType in 2019 (and this is only one of 
the mainstream independent implementations), so we'd be making a bet about how 
quickly updates are rolling out to all of the corners of the world
   
   
https://github.com/apache/impala/commit/0906e0817ce2301d7e20f355b334861f0232f16f
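To make the forward-compatibility concern concrete, here is a minimal Python sketch. It is not taken from any Parquet implementation: the dictionary-based schema element, both function names, and the mapping table are hypothetical, and the table covers only a few LogicalType/ConvertedType pairs for illustration. It shows why a reader that predates LogicalType loses the annotation when a writer stops setting ConvertedType.

```python
from typing import Optional

# Illustrative subset of LogicalType -> ConvertedType pairs (assumption:
# a real writer would cover every pair defined by the format).
# UUID maps to None because it has no ConvertedType counterpart.
LOGICAL_TO_CONVERTED = {
    "STRING": "UTF8",
    "DATE": "DATE",
    "JSON": "JSON",
    "UUID": None,
}


def make_schema_element(name: str, logical_type: str) -> dict:
    """Hypothetical writer helper: set both annotations when a legacy one exists."""
    element = {"name": name, "logical_type": logical_type}
    converted = LOGICAL_TO_CONVERTED.get(logical_type)
    if converted is not None:
        element["converted_type"] = converted
    return element


def legacy_reader_annotation(element: dict) -> Optional[str]:
    """Hypothetical pre-LogicalType reader: it only looks at converted_type."""
    return element.get("converted_type")


# Written with both fields: the legacy reader still sees an annotation.
print(legacy_reader_annotation(make_schema_element("title", "STRING")))

# Written with LogicalType only: the legacy reader sees no annotation at all.
print(legacy_reader_annotation({"name": "title", "logical_type": "STRING"}))
```

Under "must", every writer behaves like make_schema_element; under "should", the second case becomes a legal file.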






[GitHub] [parquet-format] wesm commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated

2021-04-04 Thread GitBox


wesm commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606836868



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   It's a change with forward compatibility implications, so either we 
revert or we discuss/vote, but we can't leave it like this. Impala for example 
only started to get support for LogicalType in 2019 (and this is only one of 
the mainstream independent implementations), so we'd be making a bet about how 
quickly updates are rolling out to all of the corners of the world
   
   
https://github.com/apache/impala/commit/0906e0817ce2301d7e20f355b334861f0232f16f








[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314576#comment-17314576
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606835027



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   That said, if people feel strongly about it, I can simply revert this 
one-word change, because I don't think there's any point organizing a vote on 
it.






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314575#comment-17314575
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606833694



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   Well, given the state, even today, of inter-implementation 
compatibility, I think anyone using a 2017-era implementation would run into 
lots of issues when reading files produced by other implementations, regardless 
of whether logical types are used.






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314567#comment-17314567
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

wesm commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606832265



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   If you want to make this change, I think it needs to be voted on. 
Knowing the motley state of Parquet implementations in enterprise deployments I 
would be uncomfortable writing files that don't have the field. 






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314563#comment-17314563
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831456



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   That said, feel free to spawn a discussion on the ML.






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314560#comment-17314560
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831346



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   Logical types were introduced 3.5 years ago, so it seemed reasonable to 
relax the requirement. Do you expect some Parquet readers in the wild to still 
not understand them?
   It's easy, for every converted type you support, to also support the 
corresponding logical type. It's only a small amount of logic in the decoding 
path, and it's not performance-critical.
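A rough Python sketch of the "small amount of logic in the decoding path" described above. This is an illustration only, not code from any Parquet reader: the function name, the string-valued annotations, and the mapping table (a small subset of the real correspondence) are all hypothetical.

```python
from typing import Optional

# Illustrative subset of ConvertedType -> LogicalType pairs (assumption:
# a real reader would cover every converted type it supports).
CONVERTED_TO_LOGICAL = {
    "UTF8": "STRING",
    "DATE": "DATE",
    "JSON": "JSON",
}


def resolve_annotation(
    logical_type: Optional[str], converted_type: Optional[str]
) -> Optional[str]:
    """Prefer the LogicalType; fall back to the legacy ConvertedType."""
    if logical_type is not None:
        return logical_type
    if converted_type is not None:
        # Translate the legacy annotation; unknown values yield None.
        return CONVERTED_TO_LOGICAL.get(converted_type)
    return None


# A new-style file, an old-style file, and an unannotated column.
print(resolve_annotation("STRING", None))
print(resolve_annotation(None, "UTF8"))
print(resolve_annotation(None, None))
```

The lookup runs once per column while the schema is decoded, not once per value, which is why it is not performance-critical.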






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314545#comment-17314545
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

wesm commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606826402



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   I also believed that setting ConvertedType was MUST for forward 
compatibility reasons. Can we please discuss on the mailing list?






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314543#comment-17314543
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

emkornfield commented on a change in pull request #169:
URL: https://github.com/apache/parquet-format/pull/169#discussion_r606825680



##
File path: src/main/thrift/parquet.thrift
##
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the

Review comment:
   Sorry for the belated comment, but why the change from must to should?






[jira] [Created] (PARQUET-2014) Local key wrapping with rotation

2021-04-04 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created PARQUET-2014:
-

 Summary: Local key wrapping with rotation
 Key: PARQUET-2014
 URL: https://issues.apache.org/jira/browse/PARQUET-2014
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-mr
Reporter: Gidon Gershinsky
Assignee: Gidon Gershinsky


parquet-mr-1.12.0 has experimental support for local wrapping of encryption 
keys, but it doesn't handle master key versions or key rotation. This Jira will 
add these capabilities.





[jira] [Resolved] (PARQUET-1613) Key rotation tool

2021-04-04 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky resolved PARQUET-1613.
---
Resolution: Done

handled by pr 615

> Key rotation tool
> -
>
> Key: PARQUET-1613
> URL: https://issues.apache.org/jira/browse/PARQUET-1613
> Project: Parquet
>  Issue Type: Sub-task
>Reporter: Gidon Gershinsky
>Assignee: Maya Anderson
>Priority: Major
>
> Rotates the master key, for both single and double wrappers.
> For the latter, enables support for a single KMS call per column, in readers 
> of any data sets.





[jira] [Resolved] (PARQUET-1612) Double wrapped key manager

2021-04-04 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky resolved PARQUET-1612.
---
Resolution: Done

handled by pr 615

> Double wrapped key manager
> --
>
> Key: PARQUET-1612
> URL: https://issues.apache.org/jira/browse/PARQUET-1612
> Project: Parquet
>  Issue Type: Sub-task
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>
> To minimize interaction with KMS, this manager will wrap the encryption keys 
> twice.  Might be combined with key rotation for further optimization.





[jira] [Updated] (PARQUET-1945) Add an option to allow auto conversion from empty fields to NULL

2021-04-04 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1945:

Component/s: (was: parquet-format)
 parquet-mr

> Add an option to allow auto conversion from empty fields to NULL
> 
>
> Key: PARQUET-1945
> URL: https://issues.apache.org/jira/browse/PARQUET-1945
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Zheng Shao
>Priority: Minor
>
> Right now, the Parquet writer throws an exception:
> {{Parquet record is malformed: empty fields are illegal, the field should be 
> ommited completely instead}}
> when an empty field (array, struct or map, I guess?) is written.
> The suggestion here is to add an option "auto_convert_empty_fields_to_null" 
> that converts empty fields to null automatically on write.
> The line of code to change is 
> [here:|https://sourcegraph.com/github.com/apache/parquet-mr/-/blob/parquet-column/src/main/java/org/apache/parquet/io/MessageColumnIO.java#L328]
> {code:java}
> if (emptyField) {
>   throw new ParquetEncodingException("empty fields are illegal, the field should be ommited completely instead");
> }
> {code}
>  
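The conversion the proposed option describes can be sketched in a few lines. This is a plain-Python illustration on nested dicts and lists, not parquet-mr code; the function name `convert_empty_fields_to_null` is hypothetical and only mirrors the option name suggested in the issue.

```python
def convert_empty_fields_to_null(value):
    """Recursively replace empty containers (list or dict) with None.

    Hypothetical helper: it models "empty array/struct/map becomes null"
    on an in-memory record before it is handed to a writer.
    """
    if isinstance(value, dict):
        converted = {k: convert_empty_fields_to_null(v) for k, v in value.items()}
        # An empty struct/map is treated as null.
        return converted if converted else None
    if isinstance(value, list):
        converted = [convert_empty_fields_to_null(v) for v in value]
        # An empty array is treated as null.
        return converted if converted else None
    return value


record = {"id": 1, "tags": [], "meta": {}, "nested": {"items": []}}
cleaned = convert_empty_fields_to_null(record)
```

Note that a struct whose fields all become null is kept as a struct here; whether it should itself collapse to null is a design choice the issue does not settle.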





[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2021-04-04 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314456#comment-17314456
 ] 

Antoine Pitrou commented on PARQUET-1222:
-

I'll note that Parquet C++ now has the following behaviour:

* signed zeros are properly ordered (ARROW-5562)
* NaNs are ignored when computing min/max (PARQUET-1225); if a page or column 
chunk only has NaNs, the statistics are unset
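
The two rules above can be sketched as follows. This is Python, not the Parquet C++ code; `float_min_max` is a hypothetical name, and normalising a zero minimum to -0.0 and a zero maximum to +0.0 is assumed here as one way to order signed zeros, not confirmed by this thread.

```python
import math


def float_min_max(values):
    """Return (min, max) over non-NaN values, or None if there are none.

    None models "statistics are unset" for a page or column chunk that
    contains only NaNs.
    """
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        return None
    lo, hi = min(non_nan), max(non_nan)
    # IEEE 754 compares -0.0 and +0.0 as equal, so min()/max() may return
    # either sign. Assumption: widen the range so it covers both zeros.
    if lo == 0.0:
        lo = -0.0
    if hi == 0.0:
        hi = 0.0
    return lo, hi
```

With this normalisation a reader filtering on either zero still finds the value inside the `[min, max]` range.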


> Specify a well-defined sorting order for float and double types
> ---
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers 
> as follows:
> {code:java}
>*   FLOAT - signed comparison of the represented value
>*   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a 
> partial ordering with strange behaviour in specific corner cases. For 
> example, according to IEEE 754, -0 is neither less nor more than +0 and 
> comparing NaN to anything always returns false. This ordering is not suitable 
> for statistics. Additionally, the Java implementation already uses a 
> different (total) ordering that handles these cases correctly but differently 
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new 
> TotalFloatingPointOrder should be introduced. The default for writing doubles 
> and floats would be the new TotalFloatingPointOrder. This ordering should be 
> effective and easy to implement in all programming languages.
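
The corner cases in the description, and one possible total ordering, can be sketched in Python. The key function below follows the IEEE 754 totalOrder idea of comparing the bit patterns as integers; the name `total_order_key` is hypothetical, and this is not necessarily the ordering the proposed TotalFloatingPointOrder would specify.

```python
import struct

nan = float("nan")

# Partial ordering under ordinary IEEE 754 comparison:
zero_is_less = -0.0 < 0.0          # False: -0 is not less than +0
zero_is_more = -0.0 > 0.0          # False: nor is it greater
nan_compares = nan < 1.0 or nan > 1.0 or nan == 1.0  # False in every case


def total_order_key(x):
    """Map a double to an integer whose natural order is a total order."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # For negative doubles, flip the 63 magnitude bits so that a larger
    # magnitude sorts lower; the sign bit keeps them below all positives.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits


ordered = sorted([1.0, -0.0, float("inf"), 0.0, -1.0, float("-inf")],
                 key=total_order_key)
```

Under this key -0.0 sorts strictly before +0.0, and a NaN with a clear sign bit sorts after +infinity, so every pair of values is comparable.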





[jira] [Updated] (PARQUET-1852) Array Index OutOf Bounds Exception when fall Back Dictionary Encoded Data

2021-04-04 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1852:

Component/s: (was: parquet-format)
 parquet-mr

> Array Index OutOf Bounds Exception when fall Back Dictionary Encoded Data
> -
>
> Key: PARQUET-1852
> URL: https://issues.apache.org/jira/browse/PARQUET-1852
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: jiangbo
>Priority: Major
>
> java.lang.ArrayIndexOutOfBoundsException: 39782
>     at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.fallBackDictionaryEncodedData(DictionaryValuesWriter.java:284)
>     at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.fallBackAllValuesTo(DictionaryValuesWriter.java:123)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.fallBack(FallbackValuesWriter.java:147)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.checkFallback(FallbackValuesWriter.java:141)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:163)
>     at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:201)
>     at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:467)
>     at org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:119)
>     at org.apache.parquet.example.data.simple.BinaryValue.writeValue(BinaryValue.java:45)
>     at org.apache.parquet.example.data.simple.SimpleGroup.writeValue(SimpleGroup.java:229)
>     at org.apache.parquet.example.data.GroupWriter.writeGroup(GroupWriter.java:51)
>     at org.apache.parquet.example.data.GroupWriter.write(GroupWriter.java:37)
>     at org.apache.parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:79)
>     at org.apache.parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:36)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>     at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)





[jira] [Updated] (PARQUET-1881) How to enable sorted array flag while writing a column

2021-04-04 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1881:

Priority: Trivial  (was: Blocker)

> How to enable sorted array flag while writing a column
> --
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, 
> parquet-thrift
>Reporter: Khasim Shaik
>Priority: Trivial
>  Labels: newbie
>
> I want to understand how we can enable the "sortedArray" flag in the 
> metadata while writing a row group or column.
> I am exploring parquet.thrift to understand more about the metadata, 
> and I noticed a metadata field that is related to the struct below in 
> parquet.thrift.
> I am wondering how to set these fields from Parquet while writing a column or 
> row group:
> struct SortingColumn {
>   /** The column index (in this row group) **/
>   1: required i32 column_idx
>   /** If true, indicates this column is sorted in descending order. **/
>   2: required bool descending
>   /** If true, nulls will come before non-null values, otherwise,
>* nulls go at the end. */
>   3: required bool nulls_first
> }
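
What the three fields of the struct describe can be sketched in Python. This only models the sort order the metadata declares; it does not show how any Parquet writer sets the field, which this thread does not cover. The dataclass mirrors the Thrift struct, while `compare_rows` and `sort_rows` are hypothetical helpers.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class SortingColumn:
    column_idx: int    # column index within the row group
    descending: bool   # True: sorted in descending order
    nulls_first: bool  # True: nulls before non-null values, else at the end


def compare_rows(a, b, sorting_columns):
    """Three-way comparison of two rows under a list of SortingColumn."""
    for sc in sorting_columns:
        x, y = a[sc.column_idx], b[sc.column_idx]
        if x is None and y is None:
            continue
        if x is None or y is None:
            # Null placement follows nulls_first only, per the struct comment.
            null_side = -1 if sc.nulls_first else 1
            return null_side if x is None else -null_side
        if x != y:
            order = -1 if x < y else 1
            return -order if sc.descending else order
    return 0


def sort_rows(rows, sorting_columns):
    """Sort rows so that they satisfy the declared sorting columns."""
    return sorted(rows, key=cmp_to_key(
        lambda a, b: compare_rows(a, b, sorting_columns)))


rows = [(2, "b"), (None, "x"), (1, "a"), (3, "c")]
desc_nulls_last = sort_rows(rows, [SortingColumn(0, True, False)])
asc_nulls_first = sort_rows(rows, [SortingColumn(0, False, True)])
```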





[jira] [Commented] (PARQUET-1881) How to enable sorted array flag while writing a column

2021-04-04 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314451#comment-17314451
 ] 

Antoine Pitrou commented on PARQUET-1881:
-

Belated answer, but Parquet C++ doesn't use the SortingColumn information.

> How to enable sorted array flag while writing a column
> --
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, 
> parquet-thrift
>Reporter: Khasim Shaik
>Priority: Blocker
>  Labels: newbie
>





[jira] [Resolved] (PARQUET-1881) How to enable sorted array flag while writing a column

2021-04-04 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved PARQUET-1881.
-
Resolution: Not A Bug

Closing since this is a user question; there is nothing to track here.

> How to enable sorted array flag while writing a column
> --
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, 
> parquet-thrift
>Reporter: Khasim Shaik
>Priority: Trivial
>  Labels: newbie
>





[jira] [Resolved] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved PARQUET-2013.
-
Resolution: Fixed

Fixed by GH PR [https://github.com/apache/parquet-format/pull/169]

> [Format] Mention that converted types are deprecated
> 
>
> Key: PARQUET-2013
> URL: https://issues.apache.org/jira/browse/PARQUET-2013
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: format-2.9.0
>
>






[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated

2021-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314449#comment-17314449
 ] 

ASF GitHub Bot commented on PARQUET-2013:
-

pitrou merged pull request #169:
URL: https://github.com/apache/parquet-format/pull/169


   




> [Format] Mention that converted types are deprecated
> 
>
> Key: PARQUET-2013
> URL: https://issues.apache.org/jira/browse/PARQUET-2013
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: format-2.9.0
>
>






[GitHub] [parquet-format] pitrou merged pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated

2021-04-04 Thread GitBox


pitrou merged pull request #169:
URL: https://github.com/apache/parquet-format/pull/169


   

