[jira] [Created] (PARQUET-529) Avoid evoking job.toString() in ParquetLoader

2016-02-14 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-529:
-

 Summary: Avoid evoking job.toString() in ParquetLoader
 Key: PARQUET-529
 URL: https://issues.apache.org/jira/browse/PARQUET-529
 Project: Parquet
  Issue Type: Bug
  Components: parquet-pig
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


When run in a Hadoop 2 environment with the log level set to _DEBUG_, 
_ParquetLoader_ invokes _job.toString()_ in several methods, which may cause 
the whole application to stop with:

{quote}
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING

at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:283)
at org.apache.hadoop.mapreduce.Job.toString(Job.java:452)
at java.lang.String.valueOf(String.java:2847)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at org.apache.parquet.pig.ParquetLoader.getSchema(ParquetLoader.java:260)
at org.apache.parquet.pig.TestParquetLoader.testSchema(TestParquetLoader.java:54)
...
{quote}

This happens because in the Hadoop 2.x branch, 
_org.apache.hadoop.mapreduce.Job.toString()_ performs an 
_ensureState(JobState.RUNNING)_ check (see map-reduce: Job.java#452). The 
Hadoop 1.x branch has no such check, so _ParquetLoader_ works fine there.

This ticket simply avoids invoking _job.toString()_ in _ParquetLoader_.
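The failure mode can be reproduced without Hadoop. In the minimal sketch below, FakeJob is a hypothetical stand-in whose toString() mimics the Hadoop 2.x ensureState(JobState.RUNNING) check; the describe* helpers and the message text are illustrative only, not ParquetLoader's actual code. The key point is that string concatenation calls toString() implicitly, so any log message built as "..." + job can throw while the job is still in the DEFINE state.

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Job: like the Hadoop 2.x
// class, toString() throws unless the job is RUNNING.
class FakeJob {
    enum JobState { DEFINE, RUNNING }

    private final String name;
    private JobState state = JobState.DEFINE;

    FakeJob(String name) { this.name = name; }

    // State-independent accessor in this sketch.
    String getJobName() { return name; }

    void submit() { state = JobState.RUNNING; }

    @Override
    public String toString() {
        if (state != JobState.RUNNING) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + JobState.RUNNING);
        }
        return "Job: " + name + " (" + state + ")";
    }
}

class JobLogging {
    // Risky: concatenation implicitly calls job.toString().
    static String describeUnsafe(String location, FakeJob job) {
        return "getSchema(" + location + ", " + job + ")";
    }

    // Safer: include only information that does not depend on the job state.
    static String describeSafe(String location, FakeJob job) {
        return "getSchema(" + location + ", job " + job.getJobName() + ")";
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob("pig-load");
        System.out.println(describeSafe("/tmp/data.parquet", job));
        try {
            System.out.println(describeUnsafe("/tmp/data.parquet", job));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Dropping the job from the message entirely, as this ticket does, is the simplest variant of the same idea.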



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PARQUET-528) Fix flush() for RecordConsumer and implementations

2016-02-14 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-528:
-

 Summary: Fix flush() for RecordConsumer and implementations
 Key: PARQUET-528
 URL: https://issues.apache.org/jira/browse/PARQUET-528
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


_+flush()+_ was added to _+RecordConsumer+_ and _+MessageColumnIO+_ to help 
implement null caching.

However, other _+RecordConsumer+_ implementations should also implement 
_+flush()+_ properly. For instance, _+RecordConsumerLoggingWrapper+_ and 
_+ValidatingRecordConsumer+_ should call _+delegate.flush()+_ in their 
_+flush()+_ methods; otherwise data may be silently truncated.

This ticket:

- makes _+flush()+_ abstract in _+RecordConsumer+_
- implements _+flush()+_ properly for all _+RecordConsumer+_ subclasses, 
specifically:
-- _+RecordConsumerLoggingWrapper+_
-- _+ValidatingRecordConsumer+_
-- _+ConverterConsumer+_
-- _+ExpectationValidatingRecordConsumer+_
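Why a wrapper must forward _+flush()+_ can be shown with a heavily simplified model. In this sketch MiniRecordConsumer, BufferingConsumer, and LoggingWrapper are hypothetical classes (the real _+RecordConsumer+_ has many more abstract methods); the buffering consumer only writes values when flushed, standing in for a consumer that caches nulls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for RecordConsumer.
abstract class MiniRecordConsumer {
    abstract void addInteger(int value);

    // Abstract, so every subclass must decide what flushing means for it.
    abstract void flush();
}

// Buffers values and only writes them out on flush().
class BufferingConsumer extends MiniRecordConsumer {
    private final List<Integer> pending = new ArrayList<>();
    final List<Integer> written = new ArrayList<>();

    @Override
    void addInteger(int value) { pending.add(value); }

    @Override
    void flush() {
        written.addAll(pending);
        pending.clear();
    }
}

// Wrapper in the style of RecordConsumerLoggingWrapper: if flush() did not
// forward to the delegate, the buffered values would never be written.
class LoggingWrapper extends MiniRecordConsumer {
    private final MiniRecordConsumer delegate;

    LoggingWrapper(MiniRecordConsumer delegate) { this.delegate = delegate; }

    @Override
    void addInteger(int value) {
        System.out.println("addInteger(" + value + ")");
        delegate.addInteger(value);
    }

    @Override
    void flush() {
        System.out.println("flush()");
        delegate.flush();
    }
}

class FlushDemo {
    public static void main(String[] args) {
        BufferingConsumer sink = new BufferingConsumer();
        MiniRecordConsumer consumer = new LoggingWrapper(sink);
        consumer.addInteger(1);
        consumer.addInteger(2);
        consumer.flush();
        System.out.println(sink.written); // [1, 2]
    }
}
```

Making _+flush()+_ abstract turns a forgotten override into a compile error instead of silent data loss.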





[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127535#comment-15127535
 ] 

Liwei Lin commented on PARQUET-401:
---

I looked through the code base and found hundreds of Log.xxx() usages in 90+ 
classes. Replacing all of them should take about 3 days. Do we want to get this 
into 1.9.0? I'd rather not let it delay the release.

> Deprecate Log and move to SLF4J Logger
> --
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.





[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127515#comment-15127515
 ] 

Liwei Lin commented on PARQUET-401:
---

Hi [~julienledem], [~rdblue], and [~liancheng]:
Now that [PARQUET-305|https://issues.apache.org/jira/browse/PARQUET-305] has 
been merged, should we consider replacing all Log.java usages with SLF4J? 
If no one has started on it yet, I'd like to do this.

I will remove the +if (Log.DEBUG)+ conditions and replace the original 
+LOG.debug("msg is" + msg)+ with the SLF4J parameterized form +LOG.debug("msg 
is {}", msg)+, leaving it to SLF4J to decide whether the given log level is 
enabled.
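The benefit of the parameterized form is that the argument is only converted to a string when the level is enabled. The sketch below demonstrates this with MiniLogger, a hypothetical one-method stand-in for org.slf4j.Logger (so it runs without the SLF4J jar); its "{}" substitution only approximates SLF4J's formatter. With SLF4J itself the call site would be the one shown in the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for org.slf4j.Logger. With SLF4J the code would be:
//   private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);
//   LOG.debug("msg is {}", msg);
class MiniLogger {
    private final boolean debugEnabled;
    String lastMessage; // last formatted message, kept for inspection

    MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    void debug(String format, Object arg) {
        if (!debugEnabled) {
            return; // arg.toString() is never called; nothing is formatted
        }
        // Replace the first "{}" with the argument, roughly as SLF4J does.
        int idx = format.indexOf("{}");
        lastMessage = idx < 0
            ? format
            : format.substring(0, idx) + arg + format.substring(idx + 2);
    }
}

// A value whose string conversion we want to avoid when DEBUG is off.
class ExpensiveValue {
    static final AtomicInteger toStringCalls = new AtomicInteger();

    @Override
    public String toString() {
        toStringCalls.incrementAndGet();
        return "expensive";
    }
}

class ParameterizedLoggingDemo {
    public static void main(String[] args) {
        ExpensiveValue msg = new ExpensiveValue();
        new MiniLogger(false).debug("msg is {}", msg);
        System.out.println(ExpensiveValue.toStringCalls.get()); // 0
        MiniLogger on = new MiniLogger(true);
        on.debug("msg is {}", msg);
        System.out.println(on.lastMessage); // msg is expensive
    }
}
```

By contrast, +LOG.debug("msg is" + msg)+ always builds the string before the call, which is why the old code needed the +if (Log.DEBUG)+ guard.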

> Deprecate Log and move to SLF4J Logger
> --
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.





[jira] [Created] (PARQUET-495) Fix mismatches in Types class comments

2016-01-31 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-495:
-

 Summary: Fix mismatches in Types class comments
 Key: PARQUET-495
 URL: https://issues.apache.org/jira/browse/PARQUET-495
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
Priority: Trivial
 Fix For: 1.9.0


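The example in the Types class comment does not match the schema it describes. 
To produce:
required group User {
  required int64 id;
  *optional* binary email (UTF8);
}

we should write the following (the struck-through *required* call in the 
current comment should be *optional*):
Types.requiredGroup()
  .required(INT64).named("id")
  .-*required* (BINARY).as(UTF8).named("email")-
  .*optional* (BINARY).as(UTF8).named("email")
  .named("User")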





[jira] [Created] (PARQUET-484) Add precision, scale compatibility check for DecimalMetadata

2016-01-30 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-484:
-

 Summary: Add precision, scale compatibility check for 
DecimalMetadata
 Key: PARQUET-484
 URL: https://issues.apache.org/jira/browse/PARQUET-484
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin


The Decimal type has some constraints on precision and scale, as 
documented in 
https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:

int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement these constraints, specifically:

Throw an exception when any of the following does not hold:
- precision > 0 && scale >= 0
- precision >= scale
- for int32, precision <= 9
- for int64, precision <= 18

Log a warning when:
- for int64, precision < 10
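The rules above map fairly directly onto code. This is a hypothetical sketch: the DecimalChecks class, its PhysicalType enum, and returning the warning as a string instead of logging it are assumptions made to keep it self-contained; it is not parquet-mr's API.

```java
// Hypothetical sketch of the proposed precision/scale checks.
class DecimalChecks {
    enum PhysicalType { INT32, INT64 }

    /**
     * Throws IllegalArgumentException if precision/scale are invalid for the
     * physical type; otherwise returns a warning message, or null if none.
     */
    static String check(PhysicalType type, int precision, int scale) {
        if (!(precision > 0 && scale >= 0)) {
            throw new IllegalArgumentException("DECIMAL(" + precision + "," + scale
                + "): precision must be > 0 and scale must be >= 0");
        }
        if (precision < scale) {
            throw new IllegalArgumentException("DECIMAL(" + precision + "," + scale
                + "): scale must not exceed precision");
        }
        if (type == PhysicalType.INT32 && precision > 9) {
            throw new IllegalArgumentException("INT32 holds at most 9 digits, got " + precision);
        }
        if (type == PhysicalType.INT64 && precision > 18) {
            throw new IllegalArgumentException("INT64 holds at most 18 digits, got " + precision);
        }
        if (type == PhysicalType.INT64 && precision < 10) {
            return "DECIMAL(" + precision + "," + scale + ") stored as INT64 could be stored as INT32";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check(PhysicalType.INT32, 9, 2)); // null
        System.out.println(check(PhysicalType.INT64, 5, 2)); // prints the warning
        try {
            check(PhysicalType.INT32, 10, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```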






[jira] [Updated] (PARQUET-484) Warn when Decimal is stored as INT64 while could be stored as INT32

2016-01-30 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-484:
--
Description: 
As is documented in 
https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:

int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement the warning part.

  was:
The Decimal type has some constraints on precision and scale, as 
documented in 
https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:

int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This JIRA proposes to implement these constraints, specifically:

Throw an exception when any of the following does not hold:
- precision > 0 && scale >= 0
- precision >= scale
- for int32, precision <= 9
- for int64, precision <= 18

Log a warning when:
- for int64, precision < 10



> Warn when Decimal is stored as INT64 while could be stored as INT32
> ---
>
> Key: PARQUET-484
> URL: https://issues.apache.org/jira/browse/PARQUET-484
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>
> As is documented in 
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning
> This JIRA proposes to implement the warning part.





[jira] [Updated] (PARQUET-484) Warn when Decimal is stored as INT64 while could be stored as INT32

2016-01-30 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-484:
--
Summary: Warn when Decimal is stored as INT64 while could be stored as 
INT32  (was: Add precision, scale compatibility check for DecimalMetadata)

> Warn when Decimal is stored as INT64 while could be stored as INT32
> ---
>
> Key: PARQUET-484
> URL: https://issues.apache.org/jira/browse/PARQUET-484
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>
> The Decimal type has some constraints on precision and scale, as 
> documented in 
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md:
> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning
> This JIRA proposes to implement these constraints, specifically:
> Throw an exception when any of the following does not hold:
> - precision > 0 && scale >= 0
> - precision >= scale
> - for int32, precision <= 9
> - for int64, precision <= 18
> Log a warning when:
> - for int64, precision < 10





[jira] [Created] (PARQUET-431) Make ParquetOutputFormat.memoryManager volatile

2016-01-17 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-431:
-

 Summary: Make ParquetOutputFormat.memoryManager volatile
 Key: PARQUET-431
 URL: https://issues.apache.org/jira/browse/PARQUET-431
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


Currently ParquetOutputFormat.getRecordWriter() contains an unsynchronized lazy 
initialization of the non-volatile static field *memoryManager*.

Because the compiler or processor may reorder instructions, when 
ParquetOutputFormat.getRecordWriter() is called from multiple threads, a thread 
is not guaranteed to see a completely initialized object.

This ticket proposes to make *memoryManager* volatile to correct the problem.
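For reference, here is the classic safe lazy-initialization idiom in Java, sketched with hypothetical SharedManager and LazyHolder classes rather than the real MemoryManager and ParquetOutputFormat. The volatile field guarantees that a thread reading a non-null reference sees the fully constructed object; volatile alone does not prevent two threads from each creating an instance, so the sketch pairs it with a synchronized double check. This is not necessarily how the actual patch is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for MemoryManager.
class SharedManager {
}

class LazyHolder {
    // volatile: the write in get() happens-before any later read that sees it,
    // so readers never observe a partially constructed SharedManager.
    private static volatile SharedManager manager;

    static SharedManager get() {
        SharedManager local = manager;          // one volatile read on the fast path
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = manager;                // re-check under the lock
                if (local == null) {
                    local = new SharedManager();
                    manager = local;            // publish only after construction
                }
            }
        }
        return local;
    }
}

class LazyInitDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Callable<SharedManager>> tasks = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            tasks.add(LazyHolder::get);
        }
        SharedManager first = null;
        boolean allSame = true;
        for (Future<SharedManager> f : pool.invokeAll(tasks)) {
            SharedManager m = f.get();
            if (first == null) {
                first = m;
            }
            allSame &= (m == first);
        }
        pool.shutdown();
        System.out.println("all threads saw the same instance: " + allSame);
    }
}
```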





[jira] [Created] (PARQUET-429) Make predicates collect referred columns

2016-01-13 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-429:
-

 Summary: Make predicates collect referred columns
 Key: PARQUET-429
 URL: https://issues.apache.org/jira/browse/PARQUET-429
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


Sometimes we need to collect the columns referred to by the predicates, for 
example to do validation.

This issue proposes enabling all three FilterCompat filter types to collect the 
columns they refer to.
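To illustrate what collecting referred columns means, here is a hypothetical, simplified predicate tree in which every node reports the columns it refers to. parquet-mr's own FilterPredicate classes and visitor interface are not reproduced; the Pred, EqPred, AndPred, OrPred, NotPred, and ReferredColumns names are made up for the sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical predicate node: adds every column it refers to into `out`.
interface Pred {
    void collectColumns(Set<String> out);
}

final class EqPred implements Pred {
    final String column;
    final Object value;

    EqPred(String column, Object value) { this.column = column; this.value = value; }

    public void collectColumns(Set<String> out) { out.add(column); }
}

final class AndPred implements Pred {
    final Pred left, right;

    AndPred(Pred left, Pred right) { this.left = left; this.right = right; }

    public void collectColumns(Set<String> out) {
        left.collectColumns(out);
        right.collectColumns(out);
    }
}

final class OrPred implements Pred {
    final Pred left, right;

    OrPred(Pred left, Pred right) { this.left = left; this.right = right; }

    public void collectColumns(Set<String> out) {
        left.collectColumns(out);
        right.collectColumns(out);
    }
}

final class NotPred implements Pred {
    final Pred inner;

    NotPred(Pred inner) { this.inner = inner; }

    public void collectColumns(Set<String> out) { inner.collectColumns(out); }
}

class ReferredColumns {
    // Columns in first-seen order, without duplicates.
    static Set<String> of(Pred p) {
        Set<String> cols = new LinkedHashSet<>();
        p.collectColumns(cols);
        return cols;
    }

    public static void main(String[] args) {
        Pred p = new AndPred(new EqPred("a", 1),
                             new NotPred(new OrPred(new EqPred("b", 2), new EqPred("a", 3))));
        System.out.println(of(p)); // [a, b]
    }
}
```

The collected set can then be compared against the requested schema, which is the kind of validation mentioned above.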





[jira] [Commented] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-13 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097553#comment-15097553
 ] 

Liwei Lin commented on PARQUET-426:
---

I will open a PR for this ticket as soon as PR #310 for PARQUET-429 is merged.

> Throw Exception when predicate contains columns not specified in prejection, 
> to prevent filtering out data improperly
> -
>
> Key: PARQUET-426
> URL: https://issues.apache.org/jira/browse/PARQUET-426
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-425, data is filtered out improperly in 
> certain cases.
> Until PARQUET-425 is fixed, let's throw an exception to tell the calling 
> application that a workaround is needed.





[jira] [Updated] (PARQUET-429) Enables predicates collecting their referred columns

2016-01-13 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-429:
--
Summary: Enables predicates collecting their referred columns  (was: Make 
predicates collect referred columns)

> Enables predicates collecting their referred columns
> 
>
> Key: PARQUET-429
> URL: https://issues.apache.org/jira/browse/PARQUET-429
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> Sometimes we need to collect the columns referred to by the predicates, for 
> example to do validation.
> This issue proposes enabling all three FilterCompat filter types to collect 
> the columns they refer to.





[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-425:
--
Description: 
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

This issue proposes to fix this.

  was:
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
|a|
|1|
but we end up getting nothing, which is wrong.


> Fix the bug when predicate contains columns not specified in prejection, to 
> prevent filtering out data improperly
> -
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-427, data is filtered out improperly in the 
> case where:
> 1. predicates - requested schema ≠ ∅
> 2. predicates - file schema = ∅
> To give an example:
> data:
> |a|b|
> |1|1|
> |2|2|
> |3|3|
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> |a|
> |1|
> but we end up getting nothing, which is wrong.
> This issue proposes to fix this.





[jira] [Created] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-425:
-

 Summary: Fix the bug when predicate contains columns not specified 
in prejection, to prevent filtering out data improperly
 Key: PARQUET-425
 URL: https://issues.apache.org/jira/browse/PARQUET-425
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
a: int  b: int
1   1
2   2
3   3
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
a: int
1
but we end up getting nothing, which is wrong.
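The example can be modeled in a few lines to make the symptom concrete. This is a hypothetical in-memory sketch (rows as maps, the predicate as a java.util.function.Predicate), not parquet-mr's read path. It assumes that a column left out of the projection is seen as null when the predicate is evaluated, which is enough to make b = 1 reject every row; reading the requested columns plus the predicate columns, filtering, and only then projecting gives the expected result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model of the example above.
class ProjectionFilterDemo {
    // The example file: columns a and b, rows (1,1), (2,2), (3,3).
    static List<Map<String, Integer>> fileRows() {
        List<Map<String, Integer>> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, Integer> row = new HashMap<>();
            row.put("a", i);
            row.put("b", i);
            rows.add(row);
        }
        return rows;
    }

    // Keeps only the given columns of a row.
    static Map<String, Integer> project(Map<String, Integer> row, Set<String> columns) {
        Map<String, Integer> out = new HashMap<>();
        for (String c : columns) {
            if (row.containsKey(c)) {
                out.put(c, row.get(c));
            }
        }
        return out;
    }

    // Buggy order: only requested columns are read, so b is null when the
    // predicate runs and "b = 1" rejects every row.
    static List<Map<String, Integer>> projectThenFilter(
            Set<String> requested, Predicate<Map<String, Integer>> pred) {
        List<Map<String, Integer>> out = new ArrayList<>();
        for (Map<String, Integer> row : fileRows()) {
            Map<String, Integer> read = project(row, requested);
            if (pred.test(read)) {
                out.add(read);
            }
        }
        return out;
    }

    // Expected behaviour: read the union of requested and predicate columns,
    // filter, then drop the columns only the predicate needed.
    static List<Map<String, Integer>> filterThenProject(
            Set<String> requested, Set<String> predicateColumns,
            Predicate<Map<String, Integer>> pred) {
        Set<String> toRead = new HashSet<>(requested);
        toRead.addAll(predicateColumns);
        List<Map<String, Integer>> out = new ArrayList<>();
        for (Map<String, Integer> row : fileRows()) {
            Map<String, Integer> read = project(row, toRead);
            if (pred.test(read)) {
                out.add(project(read, requested));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> requested = Collections.singleton("a");
        Predicate<Map<String, Integer>> bEquals1 = row -> Integer.valueOf(1).equals(row.get("b"));
        System.out.println(projectThenFilter(requested, bEquals1)); // []
        System.out.println(filterThenProject(requested, Collections.singleton("b"), bEquals1)); // [{a=1}]
    }
}
```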





[jira] [Updated] (PARQUET-427) Push predicates into the whole read path

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-427:
--
Attachment: Parquet-427 v0.1.pdf

[~julienledem] [~rdblue], would you please take a look? Any comments are 
welcome. Also, who else should we @-mention? :-)

> Push predicates into the whole read path
> 
>
> Key: PARQUET-427
> URL: https://issues.apache.org/jira/browse/PARQUET-427
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
> Attachments: Parquet-427 v0.1.pdf
>
>
> Regarding primitive types, there are three important primitive type sets:
> - the file schema primitive type set
> - the requested schema primitive type set
> - the predicates primitive type set
> Currently the file schema primitive type set and the requested schema 
> primitive type set have been pushed into the whole read path, but the 
> predicates primitive type set has not.
> This causes problems such as:
> - PARQUET-389, SPARK-11103, SPARK-11434
> - PARQUET-425, PARQUET-426
> - PARQUET-295
> This issue proposes to push the predicates primitive type set into the whole 
> read path as well. Once this is resolved, the issues listed above will 
> hopefully be resolved too.
> Attached is a change proposal.





[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-425:
--
Description: 
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

  was:
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
a: int  b: int
1   1
2   2
3   3
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
a: int
1
but we end up getting nothing, which is wrong.


> Fix the bug when predicate contains columns not specified in prejection, to 
> prevent filtering out data improperly
> -
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-427, data is filtered out improperly in the 
> case where:
> 1. requested schema ∩ predicates ≠ ∅
> 2. (requested schema ∪ predicates) - file schema = ∅
> To give an example:
> data:
> |a|b|
> |1|1|
> |2|2|
> |3|3|
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> |a|
> |1|
> but we end up getting nothing, which is wrong.





[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-425:
--
Description: 
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. predicates - requested schema ≠ ∅
2. predicates - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
|a|
|1|
but we end up getting nothing, which is wrong.

  was:
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
|a|b|
|1|1|
|2|2|
|3|3|
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
|a|
|1|
but we end up getting nothing, which is wrong.


> Fix the bug when predicate contains columns not specified in prejection, to 
> prevent filtering out data improperly
> -
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-427, data is filtered out improperly in the 
> case where:
> 1. predicates - requested schema ≠ ∅
> 2. predicates - file schema = ∅
> To give an example:
> data:
> |a|b|
> |1|1|
> |2|2|
> |3|3|
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> |a|
> |1|
> but we end up getting nothing, which is wrong.





[jira] [Updated] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-426:
--
External issue URL: https://issues.apache.org/jira/browse/PARQUET-425
 External issue ID:   (was: PARQUET-425)

> Throw Exception when predicate contains columns not specified in prejection, 
> to prevent filtering out data improperly
> -
>
> Key: PARQUET-426
> URL: https://issues.apache.org/jira/browse/PARQUET-426
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-425, data is filtered out improperly in 
> certain cases.
> Until PARQUET-425 is fixed, let's throw an exception to tell the calling 
> application that a workaround is needed.





[jira] [Created] (PARQUET-426) Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-426:
-

 Summary: Throw Exception when predicate contains columns not 
specified in prejection, to prevent filtering out data improperly
 Key: PARQUET-426
 URL: https://issues.apache.org/jira/browse/PARQUET-426
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Assignee: Liwei Lin
 Fix For: 1.9.0


As reported in PARQUET-425, data is filtered out improperly in 
certain cases.

Until PARQUET-425 is fixed, let's throw an exception to tell the calling 
application that a workaround is needed.
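A guard of the kind proposed here could look roughly like the following. It is a hypothetical sketch: the class and method names and the use of plain column-name sets are assumptions, and a real check would compare the column paths referenced by the filter predicate against the requested schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical guard: refuse to filter when the predicate refers to columns
// that are not part of the requested projection.
class ProjectionGuard {
    static void checkPredicateColumns(Set<String> predicateColumns, Set<String> requestedColumns) {
        Set<String> missing = new TreeSet<>(predicateColumns); // sorted for a stable message
        missing.removeAll(requestedColumns);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "Filter predicate refers to columns not in the requested schema: " + missing
                    + "; add them to the projection or filter after reading");
        }
    }

    public static void main(String[] args) {
        Set<String> requested = new HashSet<>(Arrays.asList("a"));
        checkPredicateColumns(new HashSet<>(Arrays.asList("a")), requested); // passes
        try {
            checkPredicateColumns(new HashSet<>(Arrays.asList("b")), requested);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly trades convenience for correctness: the caller gets an error instead of a silently empty result.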





[jira] [Updated] (PARQUET-425) Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly

2016-01-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-425:
--
Description: 
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
a: int  b: int
1   1
2   2
3   3
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
a: int
1
but we end up getting nothing, which is wrong.

  was:
As reported in PARQUET-427, data is filtered out improperly in the 
case where:
1. requested schema ∩ predicates ≠ ∅
2. (requested schema ∪ predicates) - file schema = ∅

To give an example:
data:
a: int  b: int
1   1
2   2
3   3
file schema: a,b, requested schema: a, and predicate: b = 1

we should get:
a: int
1
but we end up getting nothing, which is wrong.


> Fix the bug when predicate contains columns not specified in prejection, to 
> prevent filtering out data improperly
> -
>
> Key: PARQUET-425
> URL: https://issues.apache.org/jira/browse/PARQUET-425
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> As reported in PARQUET-427, data is filtered out improperly in the 
> case where:
> 1. requested schema ∩ predicates ≠ ∅
> 2. (requested schema ∪ predicates) - file schema = ∅
> To give an example:
> data:
> a: int  b: int
> 1   1
> 2   2
> 3   3
> file schema: a,b, requested schema: a, and predicate: b = 1
> we should get:
> a: int
> 1
> but we end up getting nothing, which is wrong.





[jira] [Commented] (PARQUET-256) Deprecate ConversionPatterns

2016-01-12 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095420#comment-15095420
 ] 

Liwei Lin commented on PARQUET-256:
---

[~rdblue] could you share some updates for this, please?

> Deprecate ConversionPatterns
> 
>
> Key: PARQUET-256
> URL: https://issues.apache.org/jira/browse/PARQUET-256
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.6.0
>Reporter: Cheng Lian
>
> Methods in {{ConversionPatterns}} don't conform to the standard LIST and MAP 
> schemas, and should be deprecated. We can either suggest that users use 
> {{Types}} builder methods or create new wrapper methods for LIST and MAP.





[jira] [Updated] (PARQUET-421) Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop

2016-01-11 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated PARQUET-421:
--
Summary: Fix mismatch of javadoc names and method parameters in module 
encoding, column, and hadoop  (was: Fix mismatch of javadoc names and method 
parameters)

> Fix mismatch of javadoc names and method parameters in module encoding, 
> column, and hadoop
> --
>
> Key: PARQUET-421
> URL: https://issues.apache.org/jira/browse/PARQUET-421
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Priority: Minor
> Fix For: 1.9.0
>
>
> Code changes from time to time, but some corresponding doc comments were not updated.
> This issue fixes only the doc comments that should have been changed. It 
> should be safe, since no code is touched.





[jira] [Created] (PARQUET-421) Fix mismatch of javadoc names and method parameters

2016-01-11 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-421:
-

 Summary: Fix mismatch of javadoc names and method parameters
 Key: PARQUET-421
 URL: https://issues.apache.org/jira/browse/PARQUET-421
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Priority: Minor
 Fix For: 1.9.0


Code changes from time to time, but some corresponding doc comments were not updated.

This issue fixes only the doc comments that should have been changed. It should 
be safe, since no code is touched.





[jira] [Created] (PARQUET-422) Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter

2016-01-11 Thread Liwei Lin (JIRA)
Liwei Lin created PARQUET-422:
-

 Summary: Fix a potential bug in MessageTypeParser where we ignore 
and overwrite the initial value of a method parameter
 Key: PARQUET-422
 URL: https://issues.apache.org/jira/browse/PARQUET-422
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.8.0, 1.8.1
Reporter: Liwei Lin
Priority: Minor
 Fix For: 1.9.0


In org.apache.parquet.schema.MessageTypeParser, in both addGroupType() and 
addPrimitiveType(), the initial value of the parameter t is ignored and t is 
overwritten here.

This often indicates a mistaken belief that the write to the parameter will be 
conveyed back to the caller.

This bug was found by FindBugs™.
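The pattern FindBugs flags can be seen in isolation below. This is a hypothetical example, not the actual MessageTypeParser code: Java passes object references by value, so reassigning a parameter only changes the method's local copy, and the caller's variable still points at the original object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a parameter that is ignored and then overwritten.
class ParameterOverwrite {
    // Buggy: the incoming `result` is ignored, and the new list is assigned to
    // the local parameter only, so the caller never sees the parsed tokens.
    static void parseBuggy(String input, List<String> result) {
        result = new ArrayList<>();
        for (String token : input.split(",")) {
            result.add(token.trim());
        }
    }

    // One fix: build and return the value instead of writing to the parameter.
    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.split(",")) {
            result.add(token.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        parseBuggy("a, b", target);
        System.out.println(target);        // []
        System.out.println(parse("a, b")); // [a, b]
    }
}
```

Another fix is to mutate the object the caller passed in rather than reassigning the parameter; which is appropriate depends on what the caller expects.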


