[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2019-02-08 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-462017703
 
 
   @fhueske 
   Thanks for the review. I removed all of the unused fields, the main 
function, and the test cases. For better code coverage, I also added test cases 
for projected selection for each subclass of ParquetInputFormat.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2019-02-18 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-464962219
 
 
   @fhueske 
   I just finished rebasing onto the latest upstream master. The compile error 
probably came from the generated Avro classes that were committed in the last 
several diffs. I have removed them as well. Thanks for your effort in reviewing 
this PR.
   




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-12-10 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-445726623
 
 
   @fhueske 
   Thanks for pointing out so many missing parts. As you suggested, I added the 
SqlTypeInfo conversion and now enforce the List and Map schema conventions in 
the diff. Please review it when you have time.
   





[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2019-01-06 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-451798255
 
 
   @fhueske I refined the test cases. Would you please do a last round of review?




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-11-23 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-441301540
 
 
   @fhueske 
   I resolved all of the comments except the one about the timestamp rewrite. 
It is needed for the time field of window functions. Would you prefer to use a 
timestamp UDF in SQL directly in this case?




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-11-26 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-441920259
 
 
   @fhueske 
   I removed the timestamp override and also updated the failure recovery test 
case to test recovery while reading a file with 10 row groups. Please review it 
at your earliest convenience.




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737413
 
 
   @fhueske 
   Thanks for your patient review. It was very helpful in making the PR more 
readable and correct. I have resolved your comments. Please do one more round 
of review at your convenience.




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-09 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-428274826
 
 
   @fhueske 
   Thanks for reviewing this PR. I couldn't agree more about offering a similar 
experience for both input formats (Parquet and ORC). I will address your 
comments in the code tonight.
   
   Best Regards
   Peter Huang




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-10 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-428462374
 
 
   @fhueske 
   I resolved most of your comments. The major blocker is probably the 
splittable processing of Parquet files. It will probably require a big change 
to the PR and more test cases to cover it. Given that this is already a big PR, 
how about I work on that improvement in a separate PR?




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-12 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-429234463
 
 
   @fhueske 
   Thanks for the review. I resolved all of the comments except the unit tests 
for the checkpointing logic.
   1) On the question of "instead of always reading as Row and from there 
converting to the other types?"
   
   In Parquet's interface, a converter is needed for each type of result. A 
record can be converted to a Row by recursively putting its children at 
particular indices, but a Map has to do this by key. To reduce code 
duplication, I use Row as the intermediate representation, so the type 
conversion can live in the subclasses of ParquetInputFormat.
   
   2) I will add a unit test for the checkpoint logic tomorrow night. 
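
The row-as-intermediate idea from point 1 can be illustrated with a minimal sketch. It does not use Flink or Parquet classes: a plain Object[] stands in for the Row, and the names RowToMapSketch and toMap are hypothetical, not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RowToMapSketch {

    /**
     * Converts a positional record (an Object[] standing in for a Row) into
     * a Map keyed by field name. The i-th name labels the i-th value.
     */
    public static Map<String, Object> toMap(String[] fieldNames, Object[] row) {
        if (fieldNames.length != row.length) {
            throw new IllegalArgumentException("schema and row arity differ");
        }
        // LinkedHashMap keeps the schema order of the fields.
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            result.put(fieldNames[i], row[i]);
        }
        return result;
    }
}
```

With this shape, only one set of positional converters has to fill the intermediate record; each output type (Map, POJO, and so on) then needs just a small conversion step of its own.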
   
   
   




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-13 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-429517418
 
 
   @fhueske 
   I added a unit test for the failure recovery logic. Please review it again 
once the Travis check turns green.




[GitHub] HuangZhenQiu commented on issue #6483: [Flink-7243][flink-formats] Add parquet input format

2018-08-02 Thread GitBox
HuangZhenQiu commented on issue #6483: [Flink-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-410152524
 
 
   @suez1224  Would you please have a look at this PR?




[GitHub] HuangZhenQiu commented on issue #6483: [Flink-7243][flink-formats] Add parquet input format

2018-08-19 Thread GitBox
HuangZhenQiu commented on issue #6483: [Flink-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-414184561
 
 
   @walterddr @suez1224 
   I addressed Rong's comments. Please continue with the review at your 
convenience.




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-08-28 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-416702085
 
 
   @docete @fhueske 
   To make a clean interface for a filterable Parquet input format, I would 
need to add a lot of code to this PR. Considering the size of the PR, I would 
like to put all of the filter pushdown into the ParquetTableSource PR.




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-05 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-418951266
 
 
   @lvhuyen 
   Thanks for using the Parquet input format and giving me feedback. The 
timestamp type is a logical type in Parquet. It is stored internally as the 
primitive type int64, so it should be read out as a long. From the error, it 
looks like the timestamp is read as a String and then set on a field of type 
BigInteger. Would you please paste the Parquet schema for the file? Thanks
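
As a minimal sketch of what "read out as long" implies for the caller: the int64 value still has to be interpreted according to the unit of the logical type annotation (for example TIMESTAMP_MILLIS or TIMESTAMP_MICROS). The class and method names below are hypothetical and use only the JDK, not the Parquet or Flink APIs.

```java
import java.time.Instant;

final class TimestampSketch {

    /** Interprets an int64 value as milliseconds since the Unix epoch. */
    public static Instant fromMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    /** Interprets an int64 value as microseconds since the Unix epoch. */
    public static Instant fromMicros(long micros) {
        // floorDiv/floorMod keep the nanosecond part non-negative
        // for timestamps before 1970.
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }
}
```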
   
   
   
   





[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-07 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-419613742
 
 
   @lvhuyen 
   Thanks for digging out the root cause. I guess I should pass the logical 
type into RowPrimitiveConverter so that different types of data stored as 
Binary can be handled differently. I am working on a fix for it with more test 
cases. Thanks.
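
A rough sketch of the idea, under stated assumptions: the same raw bytes are decoded differently depending on the logical annotation handed to the converter. The Annotation enum and the convert method are hypothetical stand-ins and are not the RowPrimitiveConverter from the PR; the DECIMAL branch assumes the unscaled value is stored as big-endian two's-complement bytes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

final class BinaryConversionSketch {

    /** Hypothetical subset of annotations that can sit on a binary column. */
    enum Annotation { NONE, UTF8, DECIMAL }

    /**
     * Decodes raw binary content according to its annotation.
     * The scale argument is only used for DECIMAL.
     */
    public static Object convert(byte[] raw, Annotation annotation, int scale) {
        switch (annotation) {
            case UTF8:
                return new String(raw, StandardCharsets.UTF_8);
            case DECIMAL:
                // Bytes hold the unscaled value, big-endian two's complement.
                return new BigDecimal(new BigInteger(raw), scale);
            default:
                // No annotation: hand the bytes back unchanged.
                return raw;
        }
    }
}
```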




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-11 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-420505649
 
 
   @lvhuyen 
   Thanks for the quick patch. I think the data conversion should be handled in 
RowConverter. I will ship a fix tonight. As for the issue with the array type 
in ParquetMapInputFormat, I will take a look later.





[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-12 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-420872269
 
 
   @lvhuyen 
   The last fix should resolve the problem you hit in PoJoInputFormat. Would 
you please try it out? As for primitive array handling, why does the problem 
only happen in ParquetMapInputFormat?




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-16 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-421894204
 
 
   @lvhuyen 
   I figured out the Array handling issue. It is a List backward-compatibility 
issue. In the internal testing at my company, only one type of list schema 
needed to be handled. Thanks for digging it out.  
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists
   
   I created a fix. Please have a look.
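
For illustration, here is a minimal sketch of the backward-compatibility check, as I understand the rules in the linked document: it decides whether the repeated field inside a LIST-annotated group is itself the element (legacy layouts) or a wrapper around the element (the standard three-level layout). The Field class and the method name are hypothetical and do not come from the PR or from the Parquet library.

```java
import java.util.Arrays;
import java.util.List;

final class ListCompatSketch {

    /** Hypothetical minimal model of a schema field. */
    static final class Field {
        final String name;
        final boolean isGroup;
        final List<Field> children;

        Field(String name, boolean isGroup, Field... children) {
            this.name = name;
            this.isGroup = isGroup;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Returns true if the repeated field is the list element itself,
     * false if it wraps a single element field (standard layout).
     */
    public static boolean repeatedFieldIsElement(String listName, Field repeated) {
        if (!repeated.isGroup) {
            return true; // repeated primitive: it is the element
        }
        if (repeated.children.size() > 1) {
            return true; // group with several fields: it is the element
        }
        // Single-field group: legacy writers named it "array" or "<list>_tuple".
        return repeated.name.equals("array")
            || repeated.name.equals(listName + "_tuple");
    }
}
```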

