[jira] [Commented] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-12 Thread Vlad Rozov (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510414#comment-16510414
 ] 

Vlad Rozov commented on PARQUET-1322:
-

Because statistics for DECIMAL encoded as INT32/INT64 were available prior to 
the PARQUET-686 fix, I would still consider this a bug, but I agree that it is 
not critical. I'll try to put together a fix.

> Statistics is not available for DECIMAL types
> -
>
> Key: PARQUET-1322
> URL: https://issues.apache.org/jira/browse/PARQUET-1322
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> According to the Parquet format specification, columns annotated as DECIMAL 
> should use the SIGNED comparator, and statistics should be available. The 
> sort order returned by 
> {{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL 
> is {{SortOrder.UNKNOWN}}, which contradicts the specification and makes 
> statistics for DECIMAL types unavailable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-12 Thread Vlad Rozov (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated PARQUET-1322:

Priority: Minor  (was: Critical)

> Statistics is not available for DECIMAL types
> -
>
> Key: PARQUET-1322
> URL: https://issues.apache.org/jira/browse/PARQUET-1322
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> According to the Parquet format specification, columns annotated as DECIMAL 
> should use the SIGNED comparator, and statistics should be available. The 
> sort order returned by 
> {{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL 
> is {{SortOrder.UNKNOWN}}, which contradicts the specification and makes 
> statistics for DECIMAL types unavailable.





[jira] [Commented] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-11 Thread Vlad Rozov (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508156#comment-16508156
 ] 

Vlad Rozov commented on PARQUET-1322:
-

Should statistics be available for DECIMAL if it is encoded as INT32/INT64? For 
FIXED_LEN_BYTE_ARRAY and BINARY, was BigInteger/BigDecimal used for comparison 
when the old statistics were written, or were the values compared as byte[]?
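
The difference between the two comparison strategies asked about above can be 
sketched in Java. This is only an illustration, not the parquet-mr 
implementation: the class and method names ({{DecimalOrderDemo}}, 
{{compareUnsignedBytes}}, {{compareSigned}}, {{toFixedLen}}) are hypothetical, 
and the sketch assumes the unscaled DECIMAL value is stored as big-endian 
two's complement bytes.

```java
import java.math.BigInteger;

// Hypothetical demo class; none of these names come from parquet-mr.
final class DecimalOrderDemo {

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Signed comparison: BigInteger(byte[]) reads big-endian two's complement.
    static int compareSigned(byte[] a, byte[] b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    // Encodes an unscaled value as fixed-width big-endian two's complement.
    static byte[] toFixedLen(long unscaled, int width) {
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) unscaled; // keep the low 8 bits
            unscaled >>= 8;           // arithmetic shift preserves the sign
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] minusOne = toFixedLen(-1, 4); // FF FF FF FF
        byte[] plusOne = toFixedLen(1, 4);   // 00 00 00 01

        // Signed order: -1 sorts before 1 (negative result).
        System.out.println(compareSigned(minusOne, plusOne));
        // Unsigned byte order: FF.. sorts after 00.. (positive result).
        System.out.println(compareUnsignedBytes(minusOne, plusOne));
    }
}
```

If the old min/max had been computed with the byte-wise order, a column that 
mixes negative and positive values would carry a min/max pair that is wrong 
under the signed order, which is why the question matters for the byte-array 
encodings but not for INT32/INT64.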

> Statistics is not available for DECIMAL types
> -
>
> Key: PARQUET-1322
> URL: https://issues.apache.org/jira/browse/PARQUET-1322
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Critical
>
> According to the Parquet format specification, columns annotated as DECIMAL 
> should use the SIGNED comparator, and statistics should be available. The 
> sort order returned by 
> {{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL 
> is {{SortOrder.UNKNOWN}}, which contradicts the specification and makes 
> statistics for DECIMAL types unavailable.





[jira] [Commented] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-11 Thread Vlad Rozov (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508120#comment-16508120
 ] 

Vlad Rozov commented on PARQUET-1322:
-

I agree that it is not worth fixing the write code path for the old DECIMAL 
statistics if the new min/max is available. IMO, it is still necessary to fix 
the code path that reads the old DECIMAL statistics values when the new values 
are not available, assuming that the old values are correct.
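
The read-path fallback described above could look roughly like the following 
Java sketch. It is a simplification under stated assumptions, not the 
parquet-mr code: {{StatsFallbackDemo}}, {{RawStats}}, {{readMinMax}} and the 
local {{SortOrder}} enum are hypothetical stand-ins, and the sketch assumes 
the old values are trusted only when the column's sort order is signed.

```java
import java.util.Optional;

// Hypothetical demo class; the types below only mimic the idea.
final class StatsFallbackDemo {

    enum SortOrder { SIGNED, UNSIGNED, UNKNOWN }

    // Hypothetical holder for the raw statistics of one column chunk.
    static final class RawStats {
        final byte[] oldMin; // old (pre-PARQUET-686) min, may be null
        final byte[] oldMax; // old (pre-PARQUET-686) max, may be null
        final byte[] newMin; // new min, may be null
        final byte[] newMax; // new max, may be null

        RawStats(byte[] oldMin, byte[] oldMax, byte[] newMin, byte[] newMax) {
            this.oldMin = oldMin;
            this.oldMax = oldMax;
            this.newMin = newMin;
            this.newMax = newMax;
        }
    }

    // Returns {min, max}, or empty when no usable statistics exist.
    static Optional<byte[][]> readMinMax(RawStats stats, SortOrder order) {
        // Prefer the new values whenever both are present.
        if (stats.newMin != null && stats.newMax != null) {
            return Optional.of(new byte[][] {stats.newMin, stats.newMax});
        }
        // Assumption: old values are usable only under a signed sort order.
        if (order == SortOrder.SIGNED
                && stats.oldMin != null && stats.oldMax != null) {
            return Optional.of(new byte[][] {stats.oldMin, stats.oldMax});
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RawStats oldOnly = new RawStats(new byte[] {1}, new byte[] {9}, null, null);
        // Old values are returned when the sort order is reported as SIGNED.
        System.out.println(readMinMax(oldOnly, SortOrder.SIGNED).isPresent());  // true
        // They are dropped when DECIMAL is reported as UNKNOWN.
        System.out.println(readMinMax(oldOnly, SortOrder.UNKNOWN).isPresent()); // false
    }
}
```

Under this sketch, the only change needed to recover the old statistics for 
DECIMAL stored as INT32/INT64 is the sort order reported for the column.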

> Statistics is not available for DECIMAL types
> -
>
> Key: PARQUET-1322
> URL: https://issues.apache.org/jira/browse/PARQUET-1322
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Critical
>
> According to the Parquet format specification, columns annotated as DECIMAL 
> should use the SIGNED comparator, and statistics should be available. The 
> sort order returned by 
> {{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL 
> is {{SortOrder.UNKNOWN}}, which contradicts the specification and makes 
> statistics for DECIMAL types unavailable.





[jira] [Updated] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-10 Thread Vlad Rozov (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated PARQUET-1322:

Affects Version/s: 1.9.0
   1.10.0

> Statistics is not available for DECIMAL types
> -
>
> Key: PARQUET-1322
> URL: https://issues.apache.org/jira/browse/PARQUET-1322
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Vlad Rozov
>Priority: Critical
>
> According to the Parquet format specification, columns annotated as DECIMAL 
> should use the SIGNED comparator, and statistics should be available. The 
> sort order returned by 
> {{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL 
> is {{SortOrder.UNKNOWN}}, which contradicts the specification and makes 
> statistics for DECIMAL types unavailable.





[jira] [Created] (PARQUET-1322) Statistics is not available for DECIMAL types

2018-06-10 Thread Vlad Rozov (JIRA)
Vlad Rozov created PARQUET-1322:
---

 Summary: Statistics is not available for DECIMAL types
 Key: PARQUET-1322
 URL: https://issues.apache.org/jira/browse/PARQUET-1322
 Project: Parquet
  Issue Type: Bug
Reporter: Vlad Rozov


According to the Parquet format specification, columns annotated as DECIMAL 
should use the SIGNED comparator, and statistics should be available. The sort 
order returned by 
{{org.apache.parquet.format.converter.ParquetMetadataConverter}} for DECIMAL is 
{{SortOrder.UNKNOWN}}, which contradicts the specification and makes statistics 
for DECIMAL types unavailable.





[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-21 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483216#comment-16483216
 ] 

Vlad Rozov commented on PARQUET-1295:
-

Parquet libraries do not follow proper semantic versioning because an outside 
developer can't reliably tell whether she is using an API that is stable and 
subject to semantic versioning, or an internal API for which an upgrade to a 
newer version may require significant code changes. Take a look at 
{{org.apache.parquet.column.values.ValuesReader}}. Nothing in the package name 
or in the class or method annotations suggests that it is an internal class 
that is not subject to semantic versioning. That information is hidden 
somewhere in the pom file. The class even has Java documentation, which implies 
that it is not "internal".

The same applies to classes and methods added since 1.7.0. How can an external 
developer know that a new method added after 1.7.0 to a class that existed 
before 1.7.0 is not subject to semantic versioning, and so avoid using it?

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.





[jira] [Created] (PARQUET-1302) ParquetProperties should not use static ValuesWriterFactory

2018-05-17 Thread Vlad Rozov (JIRA)
Vlad Rozov created PARQUET-1302:
---

 Summary: ParquetProperties should not use static ValuesWriterFactory
 Key: PARQUET-1302
 URL: https://issues.apache.org/jira/browse/PARQUET-1302
 Project: Parquet
  Issue Type: Bug
Reporter: Vlad Rozov


By default, {{ParquetProperties}} is initialized with a static 
{{ValuesWriterFactory}}, but during {{ParquetProperties}} construction that 
{{ValuesWriterFactory}} is initialized with the {{ParquetProperties}} instance 
being constructed. This means that if I construct two {{ParquetProperties}} 
with different properties (for example, allocators), both will reference the 
same static {{ValuesWriterFactory}}, which will have been initialized with the 
latest {{ParquetProperties}}. The same problem applies to 
{{DefaultValuesWriterFactory}}.
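
The sharing problem described above can be reproduced with a minimal Java 
sketch. The classes here ({{StaticFactoryDemo}}, {{WriterFactory}}, 
{{Props}}) are hypothetical stand-ins that only mimic the pattern; they are 
not the parquet-mr classes, and the allocator is reduced to a string name for 
brevity.

```java
// Hypothetical demo class; it only mimics the shared-static pattern.
final class StaticFactoryDemo {

    // Stand-in for a factory that is configured after construction.
    static final class WriterFactory {
        private String allocatorName;

        void initialize(Props props) {
            // Each call overwrites the state left by the previous call.
            this.allocatorName = props.allocatorName;
        }

        String allocatorInUse() {
            return allocatorName;
        }
    }

    // Stand-in for a properties object that defaults to a static factory.
    static final class Props {
        private static final WriterFactory DEFAULT_FACTORY = new WriterFactory();

        final String allocatorName;
        final WriterFactory factory;

        Props(String allocatorName) {
            this.allocatorName = allocatorName;
            this.factory = DEFAULT_FACTORY; // shared by every instance
            this.factory.initialize(this);  // re-initializes the shared object
        }
    }

    public static void main(String[] args) {
        Props first = new Props("heap");
        Props second = new Props("direct");

        // Both instances hold the same factory object.
        System.out.println(first.factory == second.factory); // true
        // The first instance now sees the second instance's allocator.
        System.out.println(first.factory.allocatorInUse());  // direct
    }
}
```

Creating a new factory per properties instance, instead of reusing the static 
one, would avoid the overwrite in this sketch.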





[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-17 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479138#comment-16479138
 ] 

Vlad Rozov commented on PARQUET-1295:
-

I don't see anything other than the exclusion of 
{{org.apache.parquet.column.*}} from semantic versioning that suggests that it 
is not public. It must be part of semantic versioning, IMO.

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.





[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-10 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471293#comment-16471293
 ] 

Vlad Rozov commented on PARQUET-1295:
-

My guess is that semantic versioning is enforced for the API that existed in 
1.7.0, but it is not enforced for any new API introduced in 1.8.x or 1.9.x.

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.





[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-10 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1647#comment-1647
 ] 

Vlad Rozov commented on PARQUET-1295:
-

Just as an example, there are semver-incompatible changes to 
{{org.apache.parquet.column.values.ValuesReader}}.

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.





[jira] [Created] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-09 Thread Vlad Rozov (JIRA)
Vlad Rozov created PARQUET-1295:
---

 Summary: Parquet libraries do not follow proper semantic versioning
 Key: PARQUET-1295
 URL: https://issues.apache.org/jira/browse/PARQUET-1295
 Project: Parquet
  Issue Type: Bug
Reporter: Vlad Rozov


There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
minor version change is supposed to be backward compatible with 1.9.0 and 1.8.0.


