[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314636#comment-17314636 ] ASF GitHub Bot commented on PARQUET-2013: - wesm merged pull request #171: URL: https://github.com/apache/parquet-format/pull/171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314637#comment-17314637 ] ASF GitHub Bot commented on PARQUET-2013: - wesm commented on pull request #171: URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813110387 Thanks @pitrou! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] wesm commented on pull request #171: PARQUET-2013: Replace back "should" with "must"
wesm commented on pull request #171: URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813110387 Thanks @pitrou! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] wesm merged pull request #171: PARQUET-2013: Replace back "should" with "must"
wesm merged pull request #171: URL: https://github.com/apache/parquet-format/pull/171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314620#comment-17314620 ] ASF GitHub Bot commented on PARQUET-2013: - emkornfield commented on pull request #171: URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813096900 LGTM. Is merging for this repo done through the GitHub UI? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] emkornfield commented on pull request #171: PARQUET-2013: Replace back "should" with "must"
emkornfield commented on pull request #171: URL: https://github.com/apache/parquet-format/pull/171#issuecomment-813096900 LGTM. Is merging for this repo done through the GitHub UI? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314618#comment-17314618 ] Dongjoon Hyun commented on PARQUET-1143: Hi, [~rdblue]. Could you set the Fix Version, please? > Update Java for format 2.4.0 changes > > > Key: PARQUET-1143 > URL: https://issues.apache.org/jira/browse/PARQUET-1143 > Project: Parquet > Issue Type: Task > Components: parquet-mr >Affects Versions: 1.9.0, 1.8.2 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314585#comment-17314585 ] ASF GitHub Bot commented on PARQUET-2013: - pitrou opened a new pull request #171: URL: https://github.com/apache/parquet-format/pull/171 Followup to cabeea7ca4afe22f4768555a017594ce343d88df and the discussion in https://github.com/apache/parquet-format/pull/169#discussion_r606836868 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] pitrou opened a new pull request #171: PARQUET-2013: Replace back "should" with "must"
pitrou opened a new pull request #171: URL: https://github.com/apache/parquet-format/pull/171 Followup to cabeea7ca4afe22f4768555a017594ce343d88df and the discussion in https://github.com/apache/parquet-format/pull/169#discussion_r606836868 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314579#comment-17314579 ] ASF GitHub Bot commented on PARQUET-2013: - wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606836868 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: It's a change with forward compatibility implications, so either we revert or we discuss/vote, but we can't leave it like this. Impala for example only started to get support for LogicalType in 2019 (and this is only one of the mainstream independent implementations), so we'd be making a bet about how quickly updates are rolling out to all of the corners of the world https://github.com/apache/impala/commit/0906e0817ce2301d7e20f355b334861f0232f16f -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] wesm commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606836868 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: It's a change with forward compatibility implications, so either we revert or we discuss/vote, but we can't leave it like this. Impala for example only started to get support for LogicalType in 2019 (and this is only one of the mainstream independent implementations), so we'd be making a bet about how quickly updates are rolling out to all of the corners of the world https://github.com/apache/impala/commit/0906e0817ce2301d7e20f355b334861f0232f16f -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
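The writer-side requirement under discussion (a writer that sets a LogicalType also sets the corresponding ConvertedType, so older readers still see an annotation) can be sketched roughly as follows. This is a hedged illustration, not code from any Parquet implementation: the `annotate` helper and the dict-based schema element are hypothetical, the keys only mirror the `logicalType` and `converted_type` field names of `SchemaElement` in parquet.thrift, and the table is a small subset limited to parameter-free types.

```python
# Illustrative subset: LogicalType name -> legacy ConvertedType name.
# None means the logical type has no ConvertedType equivalent.
LOGICAL_TO_CONVERTED = {
    "STRING": "UTF8",
    "ENUM": "ENUM",
    "DATE": "DATE",
    "JSON": "JSON",
    "BSON": "BSON",
    "UUID": None,
}


def annotate(element: dict, logical: str) -> dict:
    """Return a copy of a (hypothetical) schema element carrying both annotations."""
    out = dict(element)
    out["logicalType"] = logical
    converted = LOGICAL_TO_CONVERTED.get(logical)
    if converted is not None:
        # Also write the legacy field so readers without LogicalType support
        # can still interpret the column.
        out["converted_type"] = converted
    return out


print(annotate({"name": "title"}, "STRING"))
print(annotate({"name": "id"}, "UUID"))
```

Under the "must" wording, the first call is the required behaviour; the second shows a type for which there is nothing legacy to write.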
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314576#comment-17314576 ] ASF GitHub Bot commented on PARQUET-2013: - pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606835027 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: That said, if people feel strongly about it, I can simply revert this one-word change, because I don't think there's any point organizing a vote on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] pitrou commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606835027 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: That said, if people feel strongly about it, I can simply revert this one-word change, because I don't think there's any point organizing a vote on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314575#comment-17314575 ] ASF GitHub Bot commented on PARQUET-2013: - pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606833694 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Well, given the state, even today, of inter-implementation compatibility, I think anyone using a 2017-era implementation would run into lots of issues when reading files produced by other implementations, regardless of whether logical types are used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] pitrou commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606833694 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Well, given the state, even today, of inter-implementation compatibility, I think anyone using a 2017-era implementation would run into lots of issues when reading files produced by other implementations, regardless of whether logical types are used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314567#comment-17314567 ] ASF GitHub Bot commented on PARQUET-2013: - wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606832265 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: If you want to make this change, I think it needs to be voted on. Knowing the motley state of Parquet implementations in enterprise deployments I would be uncomfortable writing files that don't have the field. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] wesm commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606832265 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: If you want to make this change, I think it needs to be voted on. Knowing the motley state of Parquet implementations in enterprise deployments I would be uncomfortable writing files that don't have the field. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314563#comment-17314563 ] ASF GitHub Bot commented on PARQUET-2013: - pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831456 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: That said, feel free to spawn a discussion on the ML. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] pitrou commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831456 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: That said, feel free to spawn a discussion on the ML. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314560#comment-17314560 ] ASF GitHub Bot commented on PARQUET-2013: - pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831346 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Logical types were introduced 3.5 years ago, so it seemed reasonable to relax the requirement. Do you expect some Parquet readers in the wild to still not understand them? It's easy, for every converted type you support, to also support the corresponding logical type. It's only a small amount of logic in the decoding path, and it's not performance-critical. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] pitrou commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
pitrou commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606831346 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Logical types were introduced 3.5 years ago, so it seemed reasonable to relax the requirement. Do you expect some Parquet readers in the wild to still not understand them? It's easy, for every converted type you support, to also support the corresponding logical type. It's only a small amount of logic in the decoding path, and it's not performance-critical. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
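The reader-side logic described in this comment (accept the logical type wherever the matching converted type is already supported) might look roughly like the sketch below. It is an illustration under stated assumptions rather than code from any Parquet reader: `resolve_annotation` and the dict-based schema element are hypothetical, the keys only mirror the `logicalType` and `converted_type` field names of `SchemaElement` in parquet.thrift, and the table covers just a few parameter-free types.

```python
from typing import Optional

# Illustrative subset: legacy ConvertedType name -> equivalent LogicalType name.
CONVERTED_TO_LOGICAL = {
    "UTF8": "STRING",
    "ENUM": "ENUM",
    "DATE": "DATE",
    "JSON": "JSON",
    "BSON": "BSON",
}


def resolve_annotation(element: dict) -> Optional[str]:
    """Pick the annotation for a (hypothetical) schema element.

    The logical type wins when present; otherwise fall back to the
    legacy converted type, translated to its logical equivalent.
    """
    logical = element.get("logicalType")
    if logical is not None:
        return logical
    return CONVERTED_TO_LOGICAL.get(element.get("converted_type"))


print(resolve_annotation({"logicalType": "STRING", "converted_type": "UTF8"}))
print(resolve_annotation({"converted_type": "UTF8"}))
print(resolve_annotation({}))
```

The lookup runs once per column when the schema is decoded, not once per value, which is consistent with the remark that it is not performance-critical.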
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314545#comment-17314545 ] ASF GitHub Bot commented on PARQUET-2013: - wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606826402 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: I also believed that setting ConvertedType was MUST for forward compatibility reasons. Can we please discuss on the mailing list? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] wesm commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
wesm commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606826402 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: I also believed that setting ConvertedType was MUST for forward compatibility reasons. Can we please discuss on the mailing list? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314543#comment-17314543 ] ASF GitHub Bot commented on PARQUET-2013: - emkornfield commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606825680 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Sorry for the belated comment, but why the change from must to should? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Mention that converted types are deprecated > > > Key: PARQUET-2013 > URL: https://issues.apache.org/jira/browse/PARQUET-2013 > Project: Parquet > Issue Type: Task > Components: parquet-format >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: format-2.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [parquet-format] emkornfield commented on a change in pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
emkornfield commented on a change in pull request #169: URL: https://github.com/apache/parquet-format/pull/169#discussion_r606825680 ## File path: src/main/thrift/parquet.thrift ## @@ -316,7 +317,7 @@ struct BsonType { * LogicalType annotations to replace ConvertedType. * * To maintain compatibility, implementations using LogicalType for a - * SchemaElement must also set the corresponding ConvertedType from the + * SchemaElement should also set the corresponding ConvertedType from the Review comment: Sorry for the belated comment, but why the change from must to should? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (PARQUET-2014) Local key wrapping with rotation
Gidon Gershinsky created PARQUET-2014: - Summary: Local key wrapping with rotation Key: PARQUET-2014 URL: https://issues.apache.org/jira/browse/PARQUET-2014 Project: Parquet Issue Type: New Feature Components: parquet-mr Reporter: Gidon Gershinsky Assignee: Gidon Gershinsky parquet-mr-1.12.0 has experimental support for local wrapping of encryption keys, but it doesn't handle master key versions or key rotation. This Jira will add these capabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (PARQUET-1613) Key rotation tool
[ https://issues.apache.org/jira/browse/PARQUET-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gidon Gershinsky resolved PARQUET-1613. --- Resolution: Done handled by pr 615 > Key rotation tool > - > > Key: PARQUET-1613 > URL: https://issues.apache.org/jira/browse/PARQUET-1613 > Project: Parquet > Issue Type: Sub-task >Reporter: Gidon Gershinsky >Assignee: Maya Anderson >Priority: Major > > Rotates the master key, for both single and double wrappers. > For the latter, enables support for a single KMS call per column, in readers > of any data sets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (PARQUET-1612) Double wrapped key manager
[ https://issues.apache.org/jira/browse/PARQUET-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gidon Gershinsky resolved PARQUET-1612. --- Resolution: Done handled by pr 615 > Double wrapped key manager > -- > > Key: PARQUET-1612 > URL: https://issues.apache.org/jira/browse/PARQUET-1612 > Project: Parquet > Issue Type: Sub-task >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > > To minimize interaction with KMS, this manager will wrap the encryption keys > twice. Might be combined with key rotation for further optimization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (PARQUET-1945) Add an option to allow auto conversion from empty fields to NULL
[ https://issues.apache.org/jira/browse/PARQUET-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated PARQUET-1945:
    Component/s: (was: parquet-format)
                 parquet-mr

> Add an option to allow auto conversion from empty fields to NULL
>
> Key: PARQUET-1945
> URL: https://issues.apache.org/jira/browse/PARQUET-1945
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Zheng Shao
> Priority: Minor
>
> Right now, the Parquet writer throws an exception:
> {{Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead}}
> when an empty field (array or struct or map I guess?) is written.
> The suggestion here is to add an option "auto_convert_empty_fields_to_null" that converts empty fields to null automatically on write.
> The LOC to change is [here:|https://sourcegraph.com/github.com/apache/parquet-mr/-/blob/parquet-column/src/main/java/org/apache/parquet/io/MessageColumnIO.java#L328]
> {quote}
> if (emptyField) {
>   throw new ParquetEncodingException("empty fields are illegal, the field should be ommited completely instead");
> }
> {quote}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
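The behaviour requested in this issue could be approximated on the caller's side, before records reach the writer, along the lines of the sketch below. It is only a sketch under assumptions: `convert_empty_fields_to_null` is a hypothetical helper operating on plain Python dicts and lists, not a parquet-mr API, and it takes "empty field" to mean an empty array, struct, or map, which the issue itself leaves open.

```python
def convert_empty_fields_to_null(value):
    """Recursively replace empty containers (list, tuple, dict) with None."""
    if isinstance(value, dict):
        converted = {k: convert_empty_fields_to_null(v) for k, v in value.items()}
        # An empty struct/map becomes null; a non-empty one is kept.
        return converted or None
    if isinstance(value, (list, tuple)):
        converted = [convert_empty_fields_to_null(v) for v in value]
        # An empty array becomes null; a non-empty one is kept.
        return converted or None
    # Scalars (including empty strings) are passed through unchanged.
    return value


record = {"name": "x", "tags": [], "meta": {}, "nested": {"items": []}}
print(convert_empty_fields_to_null(record))
```

Note that a struct whose fields are all null is kept as a struct here; whether the proposed option should also collapse that case is not stated in the issue.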
[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314456#comment-17314456 ]

Antoine Pitrou commented on PARQUET-1222:
-----------------------------------------

I'll note that Parquet C++ now has the following behaviour:
* signed zeros are properly ordered (ARROW-5562)
* NaNs are ignored when computing min/max (PARQUET-1225); if a page or column chunk only has NaNs, the statistics are unset

> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Zoltan Ivanfi
> Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers
> as follows:
> {code:java}
>    * FLOAT - signed comparison of the represented value
>    * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a
> partial ordering with strange behaviour in specific corner cases. For
> example, according to IEEE 754, -0 is neither less nor more than +0 and
> comparing NaN to anything always returns false. This ordering is not suitable
> for statistics. Additionally, the Java implementation already uses a
> different (total) ordering that handles these cases correctly but differently
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new
> TotalFloatingPointOrder should be introduced. The default for writing doubles
> and floats would be the new TotalFloatingPointOrder. This ordering should be
> effective and easy to implement in all programming languages.
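As a rough illustration of the two points in the comment above, the sketch below computes min/max statistics that skip NaNs and order -0 before +0. It is not the Parquet C++ implementation and not a definition of the proposed TotalFloatingPointOrder; the names `total_order_key` and `min_max_ignoring_nan` are hypothetical, and the bit-pattern trick is just one common way to obtain a total order on IEEE 754 doubles.

```python
import math
import struct
from typing import Iterable, Optional, Tuple


def total_order_key(x: float) -> int:
    """Map a double to an unsigned 64-bit integer whose natural order is total.

    Negative values have all bits flipped, so more negative values map to
    smaller integers; non-negative values get the sign bit set, so they map
    above every negative value. As a result -0.0 sorts just below +0.0.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & (1 << 63):
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)


def min_max_ignoring_nan(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max) over the non-NaN values, or None if there are none.

    Returning None models "the statistics are unset" for a page or column
    chunk that contains only NaNs (or nothing at all).
    """
    usable = [v for v in values if not math.isnan(v)]
    if not usable:
        return None
    return (min(usable, key=total_order_key), max(usable, key=total_order_key))


stats = min_max_ignoring_nan([0.0, float("nan"), -0.0, 1.5])
only_nan = min_max_ignoring_nan([float("nan"), float("nan")])
```

With a plain `min`/`max` and no key, Python treats `0.0` and `-0.0` as equal and the result depends on input order, which is the kind of ambiguity the issue describes.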
[jira] [Updated] (PARQUET-1852) Array Index OutOf Bounds Exception when fall Back Dictionary Encoded Data
[ https://issues.apache.org/jira/browse/PARQUET-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated PARQUET-1852:
------------------------------------
    Component/s:     (was: parquet-format)
                 parquet-mr

> Array Index OutOf Bounds Exception when fall Back Dictionary Encoded Data
> -------------------------------------------------------------------------
>
> Key: PARQUET-1852
> URL: https://issues.apache.org/jira/browse/PARQUET-1852
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: jiangbo
> Priority: Major
>
> java.lang.ArrayIndexOutOfBoundsException: 39782
>     at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.fallBackDictionaryEncodedData(DictionaryValuesWriter.java:284)
>     at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.fallBackAllValuesTo(DictionaryValuesWriter.java:123)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.fallBack(FallbackValuesWriter.java:147)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.checkFallback(FallbackValuesWriter.java:141)
>     at org.apache.parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:163)
>     at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:201)
>     at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:467)
>     at org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:119)
>     at org.apache.parquet.example.data.simple.BinaryValue.writeValue(BinaryValue.java:45)
>     at org.apache.parquet.example.data.simple.SimpleGroup.writeValue(SimpleGroup.java:229)
>     at org.apache.parquet.example.data.GroupWriter.writeGroup(GroupWriter.java:51)
>     at org.apache.parquet.example.data.GroupWriter.write(GroupWriter.java:37)
>     at org.apache.parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:79)
>     at org.apache.parquet.hadoop.example.GroupWriteSupport.write(GroupWriteSupport.java:36)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>     at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
[jira] [Updated] (PARQUET-1881) How to enable sorted array flag while writing a column
[ https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated PARQUET-1881:
------------------------------------
    Priority: Trivial  (was: Blocker)

> How to enable sorted array flag while writing a column
> ------------------------------------------------------
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
> Issue Type: Task
> Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, parquet-thrift
> Reporter: Khasim Shaik
> Priority: Trivial
> Labels: newbie
>
> I want to understand how can we enable the flag "sortedArray" information in
> metadata while writing a row group or column
> I am exploring parquet.thrift to understand more about metadata,
> I observed a field in metadata which is related to below struct in
> parquet.thrift
> I am wondering how to set these fields from parquet while writing a column or
> rowgroup
> struct SortingColumn {
>   /** The column index (in this row group) **/
>   1: required i32 column_idx
>   /** If true, indicates this column is sorted in descending order. **/
>   2: required bool descending
>   /** If true, nulls will come before non-null values, otherwise,
>    * nulls go at the end. */
>   3: required bool nulls_first
> }
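To make the meaning of the three SortingColumn fields concrete, here is a small sketch that sorts in-memory rows the way such metadata would describe them. It only illustrates the semantics stated in the struct's comments and does not show how any Parquet writer sets this metadata; the Python `SortingColumn` dataclass and the `sort_rows` helper are hypothetical stand-ins, with `None` modelling a null value.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class SortingColumn:
    """Plain-Python mirror of the Thrift struct quoted in the issue."""
    column_idx: int    # column index within the row group
    descending: bool   # True: sorted in descending order
    nulls_first: bool  # True: nulls before non-null values, else at the end


def _compare_cells(a: Any, b: Any, spec: SortingColumn) -> int:
    """Three-way comparison of two cells under one SortingColumn."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1 if spec.nulls_first else 1
    if b is None:
        return 1 if spec.nulls_first else -1
    if a == b:
        return 0
    result = -1 if a < b else 1
    # The descending flag reverses only the order of non-null values.
    return -result if spec.descending else result


def sort_rows(rows: Sequence[Tuple], specs: Sequence[SortingColumn]) -> List[Tuple]:
    """Sort rows by each SortingColumn in turn (first entry is most significant)."""
    def compare(left: Tuple, right: Tuple) -> int:
        for spec in specs:
            outcome = _compare_cells(left[spec.column_idx], right[spec.column_idx], spec)
            if outcome != 0:
                return outcome
        return 0
    return sorted(rows, key=cmp_to_key(compare))


rows = [(2, "b"), (None, "a"), (1, "c"), (2, "a")]
specs = [SortingColumn(0, descending=True, nulls_first=False),
         SortingColumn(1, descending=False, nulls_first=False)]
ordered = sort_rows(rows, specs)
```

Note that in this sketch `nulls_first` is applied independently of `descending`, which is how the struct's comments read.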
[jira] [Commented] (PARQUET-1881) How to enable sorted array flag while writing a column
[ https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314451#comment-17314451 ]

Antoine Pitrou commented on PARQUET-1881:
-----------------------------------------

Belated answer, but Parquet C++ doesn't use the SortingColumn information.

> How to enable sorted array flag while writing a column
> ------------------------------------------------------
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
> Issue Type: Task
> Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, parquet-thrift
> Reporter: Khasim Shaik
> Priority: Blocker
> Labels: newbie
[jira] [Resolved] (PARQUET-1881) How to enable sorted array flag while writing a column
[ https://issues.apache.org/jira/browse/PARQUET-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved PARQUET-1881.
-------------------------------------
    Resolution: Not A Bug

Closing since this is a user question; there is nothing to track here.

> How to enable sorted array flag while writing a column
> ------------------------------------------------------
>
> Key: PARQUET-1881
> URL: https://issues.apache.org/jira/browse/PARQUET-1881
> Project: Parquet
> Issue Type: Task
> Components: parquet-avro, parquet-cpp, parquet-format, parquet-mr, parquet-thrift
> Reporter: Khasim Shaik
> Priority: Trivial
> Labels: newbie
[jira] [Resolved] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved PARQUET-2013.
-------------------------------------
    Resolution: Fixed

Fixed by GH PR [https://github.com/apache/parquet-format/pull/169]

> [Format] Mention that converted types are deprecated
> ----------------------------------------------------
>
> Key: PARQUET-2013
> URL: https://issues.apache.org/jira/browse/PARQUET-2013
> Project: Parquet
> Issue Type: Task
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Fix For: format-2.9.0
[jira] [Commented] (PARQUET-2013) [Format] Mention that converted types are deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314449#comment-17314449 ]

ASF GitHub Bot commented on PARQUET-2013:
-----------------------------------------

pitrou merged pull request #169:
URL: https://github.com/apache/parquet-format/pull/169

> [Format] Mention that converted types are deprecated
> ----------------------------------------------------
>
> Key: PARQUET-2013
> URL: https://issues.apache.org/jira/browse/PARQUET-2013
> Project: Parquet
> Issue Type: Task
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Fix For: format-2.9.0
[GitHub] [parquet-format] pitrou merged pull request #169: PARQUET-2013: [Format] Mention that ConvertedType is deprecated
pitrou merged pull request #169:
URL: https://github.com/apache/parquet-format/pull/169