[jira] [Created] (PARQUET-529) Avoid evoking job.toString() in ParquetLoader
Liwei Lin created PARQUET-529:

Summary: Avoid evoking job.toString() in ParquetLoader
Key: PARQUET-529
URL: https://issues.apache.org/jira/browse/PARQUET-529
Project: Parquet
Issue Type: Bug
Components: parquet-pig
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

When run in a Hadoop 2 environment with the log level set to _DEBUG_, _ParquetLoader_ invokes _job.toString()_ in several methods, which can cause the whole application to stop with:

{quote}
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
  at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:283)
  at org.apache.hadoop.mapreduce.Job.toString(Job.java:452)
  at java.lang.String.valueOf(String.java:2847)
  at java.lang.StringBuilder.append(StringBuilder.java:128)
  at org.apache.parquet.pig.ParquetLoader.getSchema(ParquetLoader.java:260)
  at org.apache.parquet.pig.TestParquetLoader.testSchema(TestParquetLoader.java:54)
  ...
{quote}

The reason is that in the Hadoop 2.x branch, _org.apache.hadoop.mapreduce.Job.toString()_ has added an _ensureState(JobState.RUNNING)_ check; see map-reduce: Job.java#452. In contrast, the Hadoop 1.x branch does not contain such a check, so _ParquetLoader_ works well there.

This ticket simply avoids invoking _job.toString()_ in _ParquetLoader_.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
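The failure mode described in PARQUET-529 can be reproduced in miniature. The sketch below is only an illustration: `FakeJob` is a hypothetical stand-in for `org.apache.hadoop.mapreduce.Job`, and logging the job name instead of the job object is an assumed workaround, not necessarily what the actual patch does.

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Job; not the real class.
class FakeJob {
    enum State { DEFINE, RUNNING }

    private final String name;
    private State state = State.DEFINE;

    FakeJob(String name) { this.name = name; }

    String getJobName() { return name; }

    void start() { state = State.RUNNING; }

    // Mimics the Hadoop 2.x behaviour described in the ticket:
    // toString() checks the job state before building the string.
    @Override
    public String toString() {
        if (state != State.RUNNING) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of RUNNING");
        }
        return "Job: " + name;
    }
}

class SafeJobLogging {
    // Builds the debug message from a field that is readable in any state,
    // rather than concatenating the job object itself (which calls toString()).
    static String describe(FakeJob job) {
        return "LoadFunc.getSchema for job " + job.getJobName();
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob("example");
        System.out.println(describe(job)); // safe while the job is in DEFINE
        try {
            System.out.println("job: " + job); // implicit toString() throws
        } catch (IllegalStateException e) {
            System.out.println("toString() failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that string concatenation calls `toString()` implicitly, so even an innocent-looking debug message can trigger the state check.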
[jira] [Created] (PARQUET-528) Fix flush() for RecordConsumer and implementations
Liwei Lin created PARQUET-528:

Summary: Fix flush() for RecordConsumer and implementations
Key: PARQUET-528
URL: https://issues.apache.org/jira/browse/PARQUET-528
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

_+flush()+_ was added to _+RecordConsumer+_ and _+MessageColumnIO+_ to help implement nulls caching. However, other _+RecordConsumer+_ implementations should also implement _+flush()+_ properly. For instance, _+RecordConsumerLoggingWrapper+_ and _+ValidatingRecordConsumer+_ should call _+delegate.flush()+_ in their _+flush()+_ methods; otherwise data might be mistakenly truncated.

This ticket:
- makes _+flush()+_ abstract in _+RecordConsumer+_
- implements _+flush()+_ properly for all _+RecordConsumer+_ subclasses, specifically:
-- _+RecordConsumerLoggingWrapper+_
-- _+ValidatingRecordConsumer+_
-- _+ConverterConsumer+_
-- _+ExpectationValidatingRecordConsumer+_
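A minimal sketch of why a wrapper must forward `flush()`, assuming a heavily simplified consumer interface. `SimpleConsumer`, `BufferingConsumer` and `LoggingWrapper` are hypothetical names invented for this illustration; they are not the real parquet-mr classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for RecordConsumer.
// flush() is abstract so that no subclass can silently skip it.
abstract class SimpleConsumer {
    abstract void addValue(int value);
    abstract void flush();
}

// A consumer that caches values until flush() is called.
class BufferingConsumer extends SimpleConsumer {
    private final List<Integer> pending = new ArrayList<>();
    final List<Integer> written = new ArrayList<>();

    @Override
    void addValue(int value) { pending.add(value); }

    @Override
    void flush() {
        written.addAll(pending);
        pending.clear();
    }
}

// A wrapper in the spirit of a logging or validating consumer.
class LoggingWrapper extends SimpleConsumer {
    private final SimpleConsumer delegate;

    LoggingWrapper(SimpleConsumer delegate) { this.delegate = delegate; }

    @Override
    void addValue(int value) { delegate.addValue(value); }

    // Without this forwarding call, values cached by the delegate
    // would never be written out.
    @Override
    void flush() { delegate.flush(); }
}

class FlushDemo {
    public static void main(String[] args) {
        BufferingConsumer sink = new BufferingConsumer();
        SimpleConsumer consumer = new LoggingWrapper(sink);
        consumer.addValue(1);
        consumer.addValue(2);
        consumer.flush();
        System.out.println(sink.written);
    }
}
```

Making the method abstract, as the ticket proposes, turns a forgotten `flush()` into a compile error instead of silent data loss.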
[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127535#comment-15127535 ]

Liwei Lin commented on PARQUET-401:

I looked through the code base and found hundreds of Log.xxx() usages across 90+ classes. It should take 3 days to replace all of them.
Do we want to get this into 1.9.0? I think we'd better not delay the release.

> Deprecate Log and move to SLF4J Logger
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.1
> Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J,
> which can handle formatting to avoid the cost of building log messages that
> won't be used. I think we should deprecate the org.apache.parquet.Log class
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current
> uses of it with SLF4J.
[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127515#comment-15127515 ]

Liwei Lin commented on PARQUET-401:

Hi [~julienledem], [~rdblue], and [~liancheng]:

Now that [Parquet-305|https://issues.apache.org/jira/browse/PARQUET-305] has been merged, maybe we should consider replacing all Log.java usages with slf4j? If no one has started on this yet, I'd like to do it.

I will remove the +if (Log.DEBUG)+ condition and replace the original +LOG.debug("msg is" + msg)+ with the slf4j parameterized form +LOG.debug("msg is {}", msg)+, leaving it to slf4j to decide whether the given log level is enabled.

> Deprecate Log and move to SLF4J Logger
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.1
> Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J,
> which can handle formatting to avoid the cost of building log messages that
> won't be used. I think we should deprecate the org.apache.parquet.Log class
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current
> uses of it with SLF4J.
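The benefit of the parameterized form can be sketched without the SLF4J dependency. `TinyLogger` below is a hypothetical stand-in that only imitates the relevant behaviour (the argument is formatted only when the level is enabled); it is not the SLF4J API, and `Expensive` is an invented helper that counts `toString()` calls.

```java
// Hypothetical stand-in that imitates parameterized logging;
// the real SLF4J Logger is not used here.
class TinyLogger {
    private final boolean debugEnabled;
    final StringBuilder out = new StringBuilder();

    TinyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // The argument is converted to a string only if debug is enabled.
    void debug(String format, Object arg) {
        if (!debugEnabled) {
            return;
        }
        out.append(format.replace("{}", String.valueOf(arg))).append('\n');
    }
}

// Invented helper: records how often its toString() is invoked.
class Expensive {
    int toStringCalls = 0;

    @Override
    public String toString() {
        toStringCalls++;
        return "expensive";
    }
}

class LoggingDemo {
    public static void main(String[] args) {
        Expensive msg = new Expensive();

        TinyLogger quiet = new TinyLogger(false);
        quiet.debug("msg is {}", msg);          // no formatting cost
        System.out.println("calls with debug off: " + msg.toStringCalls);

        TinyLogger verbose = new TinyLogger(true);
        verbose.debug("msg is {}", msg);        // formatted once
        System.out.print(verbose.out);
    }
}
```

With string concatenation, `toString()` would run before the logger is even called, which is why the old code needed the explicit `if (Log.DEBUG)` guard.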
[jira] [Created] (PARQUET-495) Fix mismatches in Types class comments
Liwei Lin created PARQUET-495:

Summary: Fix mismatches in Types class comments
Key: PARQUET-495
URL: https://issues.apache.org/jira/browse/PARQUET-495
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Priority: Trivial
Fix For: 1.9.0

To produce:

required group User \{
  required int64 id;
  *optional* binary email (UTF8);
\}

we should do:

Types.requiredGroup()
  .required(INT64).named("id")
  .-*required* (BINARY).as(UTF8).named("email")-
  .*optional* (BINARY).as(UTF8).named("email")
  .named("User")
[jira] [Created] (PARQUET-484) Add precision, scale compatibility check for DecimalMetadata
Liwei Lin created PARQUET-484:

Summary: Add precision, scale compatibility check for DecimalMetadata
Key: PARQUET-484
URL: https://issues.apache.org/jira/browse/PARQUET-484
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin

We have some constraints on the Decimal type's precision and scale, as documented in https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement the constraints, specifically:

Throw an Exception when any of the following does not hold:
- precision > 0 && scale >= 0
- precision >= scale
- for int32, precision <= 9
- for int64, precision <= 18

Log a warning when:
- for int64, precision < 10
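The checks listed in the original PARQUET-484 description can be sketched as a single validation method. This is an illustration under assumptions: `DecimalChecks`, `Physical` and `validate` are hypothetical names, and returning a boolean for the warning case is a choice made here for testability, not something the ticket specifies.

```java
// Hypothetical sketch of the precision/scale checks; not the parquet-mr API.
class DecimalChecks {
    enum Physical { INT32, INT64 }

    // Throws if a hard constraint is violated.
    // Returns true when the value should trigger the int64 warning
    // (the decimal would also fit in an int32).
    static boolean validate(Physical type, int precision, int scale) {
        if (precision <= 0 || scale < 0) {
            throw new IllegalArgumentException(
                "precision must be > 0 and scale must be >= 0");
        }
        if (precision < scale) {
            throw new IllegalArgumentException("precision must be >= scale");
        }
        int maxPrecision = (type == Physical.INT32) ? 9 : 18;
        if (precision > maxPrecision) {
            throw new IllegalArgumentException(
                type + " supports precision <= " + maxPrecision);
        }
        return type == Physical.INT64 && precision < 10;
    }

    public static void main(String[] args) {
        System.out.println(validate(Physical.INT64, 12, 2)); // no warning
        System.out.println(validate(Physical.INT64, 5, 2));  // warning case
        try {
            validate(Physical.INT32, 10, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```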
[jira] [Updated] (PARQUET-484) Warn when Decimal is stored as INT64 while could be stored as INT32
[ https://issues.apache.org/jira/browse/PARQUET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-484:

Description:
As documented in https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement the warning part.

was:
We have some constraints on the Decimal type's precision and scale, as documented in https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement the constraints, specifically:

Throw an Exception when any of the following does not hold:
- precision > 0 && scale >= 0
- precision >= scale
- for int32, precision <= 9
- for int64, precision <= 18

Log a warning when:
- for int64, precision < 10

> Warn when Decimal is stored as INT64 while could be stored as INT32
>
> Key: PARQUET-484
> URL: https://issues.apache.org/jira/browse/PARQUET-484
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
>
> As documented in
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning
> This JIRA proposes to implement the warning part.
[jira] [Updated] (PARQUET-484) Warn when Decimal is stored as INT64 while could be stored as INT32
[ https://issues.apache.org/jira/browse/PARQUET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-484:

Summary: Warn when Decimal is stored as INT64 while could be stored as INT32 (was: Add precision, scale compatibility check for DecimalMetadata)

> Warn when Decimal is stored as INT64 while could be stored as INT32
>
> Key: PARQUET-484
> URL: https://issues.apache.org/jira/browse/PARQUET-484
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
>
> We have some constraints on the Decimal type's precision and scale, as
> documented in
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning
> This JIRA proposes to implement the constraints, specifically:
> Throw an Exception when any of the following does not hold:
> - precision > 0 && scale >= 0
> - precision >= scale
> - for int32, precision <= 9
> - for int64, precision <= 18
> Log a warning when:
> - for int64, precision < 10
[jira] [Created] (PARQUET-431) Make ParquetOutputFormat.memoryManager volatile
Liwei Lin created PARQUET-431:

Summary: Make ParquetOutputFormat.memoryManager volatile
Key: PARQUET-431
URL: https://issues.apache.org/jira/browse/PARQUET-431
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

Currently ParquetOutputFormat.getRecordWriter() contains an unsynchronized lazy initialization of the non-volatile static field *memoryManager*.

Because the compiler or processor may reorder instructions, threads are not guaranteed to see a completely initialized object when ParquetOutputFormat.getRecordWriter() is called by multiple threads.

This ticket proposes to make *memoryManager* volatile to correct the problem.
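A minimal sketch of safe lazy initialization of a static field, assuming a hypothetical `Manager` class in place of the real memory manager. Note that the ticket only proposes adding `volatile`; the synchronized double-checked pattern shown here is an addition for illustration, so that at most one instance is ever created.

```java
// Hypothetical sketch; Manager stands in for the real memory manager class.
class MemoryManagerHolder {
    static class Manager {
        final long poolSize;
        Manager(long poolSize) { this.poolSize = poolSize; }
    }

    // volatile guarantees that a thread reading a non-null reference
    // also sees the fully constructed object behind it.
    private static volatile Manager memoryManager;

    static Manager get(long poolSize) {
        Manager local = memoryManager;          // single volatile read
        if (local == null) {
            synchronized (MemoryManagerHolder.class) {
                local = memoryManager;          // re-check under the lock
                if (local == null) {
                    local = new Manager(poolSize);
                    memoryManager = local;      // safe publication
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Manager first = get(1024);
        Manager second = get(2048);
        System.out.println(first == second);    // both calls share one instance
    }
}
```

Without `volatile`, the write to the field and the writes inside the constructor may be observed out of order by another thread, which is the problem the ticket describes.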
[jira] [Created] (PARQUET-429) Make predicates collect referred columns
Liwei Lin created PARQUET-429:

Summary: Make predicates collect referred columns
Key: PARQUET-429
URL: https://issues.apache.org/jira/browse/PARQUET-429
Project: Parquet
Issue Type: New Feature
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

Sometimes we need to collect the columns referred to by the predicates to, say, do validation.

This issue proposes to enable all 3 FilterCompats to collect the referred columns.
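Collecting the columns a predicate refers to amounts to walking the predicate tree. The sketch below assumes a tiny hypothetical predicate model (`Pred`, `Eq`, `And`, `Not`, `ColumnCollector`); the real parquet-mr filter classes are different and are not used here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical predicate model, invented for this illustration.
interface Pred {
    void collectColumns(Set<String> out);
}

// Leaf predicate: column = value.
class Eq implements Pred {
    private final String column;
    private final Object value;

    Eq(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    @Override
    public void collectColumns(Set<String> out) { out.add(column); }
}

// Logical AND of two predicates.
class And implements Pred {
    private final Pred left;
    private final Pred right;

    And(Pred left, Pred right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public void collectColumns(Set<String> out) {
        left.collectColumns(out);
        right.collectColumns(out);
    }
}

// Logical negation.
class Not implements Pred {
    private final Pred inner;

    Not(Pred inner) { this.inner = inner; }

    @Override
    public void collectColumns(Set<String> out) { inner.collectColumns(out); }
}

class ColumnCollector {
    // Returns every column name mentioned anywhere in the predicate tree.
    static Set<String> columnsOf(Pred predicate) {
        Set<String> columns = new LinkedHashSet<>();
        predicate.collectColumns(columns);
        return columns;
    }

    public static void main(String[] args) {
        Pred p = new And(new Eq("a", 1), new Not(new Eq("b", 2)));
        System.out.println(columnsOf(p));
    }
}
```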
[jira] [Commented] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097553#comment-15097553 ]

Liwei Lin commented on PARQUET-426:

I will issue a PR for this ticket as soon as PR#310 for PARQUET-429 is merged.

> Throw Exception when predicate contains columns not specified in prejection,
> to prevent filtering out data improperly
>
> Key: PARQUET-426
> URL: https://issues.apache.org/jira/browse/PARQUET-426
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> As reported in PARQUET-425, data will be filtered out improperly in
> certain cases.
> Until PARQUET-425 is fixed, let's throw an Exception to warn the calling
> application that a workaround is needed.
[jira] [Updated] (PARQUET-429) Enables predicates collecting their referred columns
[ https://issues.apache.org/jira/browse/PARQUET-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-429:

Summary: Enables predicates collecting their referred columns (was: Make predicates collect referred columns)

> Enables predicates collecting their referred columns
>
> Key: PARQUET-429
> URL: https://issues.apache.org/jira/browse/PARQUET-429
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> Sometimes we need to collect the columns referred to by the predicates to, say,
> do validation.
> This issue proposes to enable all 3 FilterCompats to collect the referred
> columns.
[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-425:

Description:
As reported in PARQUET-427, data will be filtered out improperly in the case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1
we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

This issue proposes to fix this.

was:
As reported in PARQUET-427, data will be filtered out improperly in the case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1
we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

> Fix the bug when predicate contains columns not specified in prejection, to
> prevent filtering out data improperly
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> As reported in PARQUET-427, data will be filtered out improperly in the
> case where:
> 1. predicates - requested schema ≠ ∅
> 2. predicates - file schema = ∅
> To give an example:
> data:
> |a|b|
> |1|1|
> |2|2|
> |3|3|
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> |a|
> |1|
> but we end up getting nothing, which is wrong.
> This issue proposes to fix this.
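The two set conditions in this description can be checked mechanically. The following is a small sketch that treats each schema and the predicate as a set of column names; `ProjectionCheck` and `isAffected` are hypothetical names, not part of parquet-mr.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the two conditions in PARQUET-425.
class ProjectionCheck {
    // True when some predicate column is missing from the requested schema
    // (predicates - requested schema ≠ ∅) while every predicate column
    // exists in the file schema (predicates - file schema = ∅).
    static boolean isAffected(Set<String> predicateColumns,
                              Set<String> requestedSchema,
                              Set<String> fileSchema) {
        Set<String> notRequested = new HashSet<>(predicateColumns);
        notRequested.removeAll(requestedSchema);

        Set<String> notInFile = new HashSet<>(predicateColumns);
        notInFile.removeAll(fileSchema);

        return !notRequested.isEmpty() && notInFile.isEmpty();
    }

    public static void main(String[] args) {
        // The example from the description: file schema a,b;
        // requested schema a; predicate b = 1.
        Set<String> file = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> requested = new HashSet<>(Arrays.asList("a"));
        Set<String> predicate = new HashSet<>(Arrays.asList("b"));
        System.out.println(isAffected(predicate, requested, file));
    }
}
```

A check of this shape is also what a guard in the spirit of PARQUET-426 would need in order to decide when to throw, though how the real code obtains the three column sets is not covered here.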
[jira] [Created] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
Liwei Lin created PARQUET-425:

Summary: Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
Key: PARQUET-425
URL: https://issues.apache.org/jira/browse/PARQUET-425
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

As reported in PARQUET-427, data will be filtered out improperly in the case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
a: int  b: int
1       1
2       2
3       3
file schema: a,b, requested schema: a, and predicate: b = 1
we should get:
a: int
1
but we end up getting nothing, which is wrong.
[jira] [Updated] (PARQUET-427) Push predicates into the whole read path
[ https://issues.apache.org/jira/browse/PARQUET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-427:

Attachment: Parquet-427 v0.1.pdf

[~julienledem][~rdblue] would you take a look at it, please? Any comments are welcome.
Also, who else should we @ ? :-)

> Push predicates into the whole read path
>
> Key: PARQUET-427
> URL: https://issues.apache.org/jira/browse/PARQUET-427
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> Attachments: Parquet-427 v0.1.pdf
>
> Regarding primitive types, there are 3 important primitive type sets:
> - the file schema primitive type set
> - the requested schema primitive type set
> - the predicates primitive type set
> Currently the file schema primitive type set and the requested schema
> primitive type set have been pushed into the whole read path, but not the
> predicates primitive type set.
> This causes problems such as:
> - PARQUET-389, SPARK-11103, SPARK-11434
> - PARQUET-425, PARQUET-426
> - PARQUET-295
> This issue proposes to push the predicates primitive type set into the whole
> read path as well. When this is resolved, hopefully the issues listed above
> will be resolved too.
> Attached is a change proposal.
[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated PARQUET-425: -- Description: As is reported in Parquet-427, data will be filtered out improperly under the case where: 1. requested schema ∩ predicates ≠ ∅ 2. (requested schema ∪ predicates) - file schema = ∅ To give an example: data: |a|b| |1|1| |2|2| |3|3| file schema: a,b, requested schema: a, and predicate: b = 1 we should get: |a| |1| but we'll end up get nothing, which is wrong. was: As is reported in Parquet-427, data will be filtered out improperly under the case where: 1. requested schema ∩ predicates ≠ ∅ 2. (requested schema ∪ predicates) - file schema = ∅ To give an example: data: +---+ | b| +---+ +---+ file schema: a,b, requested schema: a, and predicate: b = 1 we should get: a: int 1 but we'll end up get nothing, which is wrong. > Fix the bug when predicate contains columns not specified in prejection, to > prevent filtering out data improperly > - > > Key: PARQUET-425 > URL: https://issues.apache.org/jira/browse/PARQUET-425 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.0, 1.8.1 >Reporter: Liwei Lin >Assignee: Liwei Lin > Fix For: 1.9.0 > > > As is reported in Parquet-427, data will be filtered out improperly under the > case where: > 1. requested schema ∩ predicates ≠ ∅ > 2. (requested schema ∪ predicates) - file schema = ∅ > To give an example: > data: > |a|b| > |1|1| > |2|2| > |3|3| > file schema: a,b, requested schema: a, and predicate: b = 1 > we should get: > |a| > |1| > but we'll end up get nothing, which is wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-425:

Description:
As reported in PARQUET-427, data will be filtered out improperly in the case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1
we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

was:
As reported in PARQUET-427, data will be filtered out improperly in the case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1
we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

> Fix the bug when predicate contains columns not specified in prejection, to
> prevent filtering out data improperly
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> As reported in PARQUET-427, data will be filtered out improperly in the
> case where:
> 1. predicates - requested schema ≠ ∅
> 2. predicates - file schema = ∅
> To give an example:
> data:
> |a|b|
> |1|1|
> |2|2|
> |3|3|
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> |a|
> |1|
> but we end up getting nothing, which is wrong.
[jira] [Updated] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-426:

External issue URL: https://issues.apache.org/jira/browse/PARQUET-425
External issue ID: (was: PARQUET-425)

> Throw Exception when predicate contains columns not specified in prejection,
> to prevent filtering out data improperly
>
> Key: PARQUET-426
> URL: https://issues.apache.org/jira/browse/PARQUET-426
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Assignee: Liwei Lin
> Fix For: 1.9.0
>
> As reported in PARQUET-425, data will be filtered out improperly in
> certain cases.
> Until PARQUET-425 is fixed, let's throw an Exception to warn the calling
> application that a workaround is needed.
[jira] [Created] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly
Liwei Lin created PARQUET-426:

Summary: Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly
Key: PARQUET-426
URL: https://issues.apache.org/jira/browse/PARQUET-426
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Fix For: 1.9.0

As reported in PARQUET-425, data will be filtered out improperly in certain cases.

Until PARQUET-425 is fixed, let's throw an Exception to warn the calling application that a workaround is needed.
[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
[ https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated PARQUET-425: -- Description: As is reported in Parquet-427, data will be filtered out improperly under the case where: 1. requested schema ∩ predicates ≠ ∅ 2. (requested schema ∪ predicates) - file schema = ∅ To give an example: data: +---+ | b| +---+ +---+ file schema: a,b, requested schema: a, and predicate: b = 1 we should get: a: int 1 but we'll end up get nothing, which is wrong. was: As is reported in Parquet-427, data will be filtered out improperly under the case where: 1. requested schema ∩ predicates ≠ ∅ 2. (requested schema ∪ predicates) - file schema = ∅ To give an example: data: a: int b: int 1 1 2 2 3 3 file schema: a,b, requested schema: a, and predicate: b = 1 we should get: a: int 1 but we'll end up get nothing, which is wrong. > Fix the bug when predicate contains columns not specified in prejection, to > prevent filtering out data improperly > - > > Key: PARQUET-425 > URL: https://issues.apache.org/jira/browse/PARQUET-425 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.0, 1.8.1 >Reporter: Liwei Lin >Assignee: Liwei Lin > Fix For: 1.9.0 > > > As is reported in Parquet-427, data will be filtered out improperly under the > case where: > 1. requested schema ∩ predicates ≠ ∅ > 2. (requested schema ∪ predicates) - file schema = ∅ > To give an example: > data: > +---+ > | b| > +---+ > +---+ > file schema: a,b, requested schema: a, and predicate: b = 1 > we should get: > a: int > 1 > but we'll end up get nothing, which is wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-256) Deprecate ConversionPatterns
[ https://issues.apache.org/jira/browse/PARQUET-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095420#comment-15095420 ]

Liwei Lin commented on PARQUET-256:

[~rdblue] could you share some updates on this, please?

> Deprecate ConversionPatterns
>
> Key: PARQUET-256
> URL: https://issues.apache.org/jira/browse/PARQUET-256
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0
> Reporter: Cheng Lian
>
> Methods in {{ConversionPatterns}} don't conform to the standard LIST and MAP
> schema, and should be deprecated. We can either suggest that users use the
> {{Types}} builder methods or create new wrapper methods for LIST and MAP.
[jira] [Updated] (PARQUET-421) Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
[ https://issues.apache.org/jira/browse/PARQUET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liwei Lin updated PARQUET-421:

Summary: Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop (was: Fix mismatch of javadoc names and method parameters)

> Fix mismatch of javadoc names and method parameters in module encoding,
> column, and hadoop
>
> Key: PARQUET-421
> URL: https://issues.apache.org/jira/browse/PARQUET-421
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Liwei Lin
> Priority: Minor
> Fix For: 1.9.0
>
> Code changes now and then, but some of the corresponding doc comments have
> not been updated. This issue fixes only the doc comments that should have
> been changed. It should be safe, since no code is touched.
[jira] [Created] (PARQUET-421) Fix mismatch of javadoc names and method parameters
Liwei Lin created PARQUET-421:

Summary: Fix mismatch of javadoc names and method parameters
Key: PARQUET-421
URL: https://issues.apache.org/jira/browse/PARQUET-421
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Priority: Minor
Fix For: 1.9.0

Code changes now and then, but some of the corresponding doc comments have not been updated.

This issue fixes only the doc comments that should have been changed. It should be safe, since no code is touched.
[jira] [Created] (PARQUET-422) Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
Liwei Lin created PARQUET-422:

Summary: Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
Key: PARQUET-422
URL: https://issues.apache.org/jira/browse/PARQUET-422
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Priority: Minor
Fix For: 1.9.0

In org.apache.parquet.schema.MessageTypeParser, in addGroupType() and addPrimitiveType(), the initial value of the parameter t is ignored, and t is overwritten inside the method.

This often indicates a mistaken belief that the write to the parameter will be conveyed back to the caller.

This bug was found by FindBugs™.
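The pattern FindBugs flags here can be shown with a generic example. This sketch does not reproduce the MessageTypeParser code; `ParamOverwrite`, `advanceBuggy` and `advance` are hypothetical names, and returning the new value is just one possible way to make the data flow explicit.

```java
import java.util.Arrays;
import java.util.Iterator;

// Generic illustration of the flagged pattern; not the MessageTypeParser code.
class ParamOverwrite {
    // Flagged shape: the incoming value of 'token' is never read, and the
    // assignment only changes the local copy of the reference, so the
    // caller's variable is unaffected.
    static void advanceBuggy(Iterator<String> tokens, String token) {
        token = tokens.next();
    }

    // One possible alternative: drop the parameter and return the new value.
    static String advance(Iterator<String> tokens) {
        return tokens.next();
    }

    public static void main(String[] args) {
        Iterator<String> tokens = Arrays.asList("a", "b").iterator();
        String current = "start";
        advanceBuggy(tokens, current);
        System.out.println(current);   // still the caller's original value
        current = advance(tokens);
        System.out.println(current);   // now updated by the caller itself
    }
}
```

Java passes object references by value, which is why the reassignment inside `advanceBuggy` never reaches the caller.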