[ 
https://issues.apache.org/jira/browse/PARQUET-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259333#comment-17259333
 ] 

ASF GitHub Bot commented on PARQUET-1951:
-----------------------------------------

satishkotha commented on pull request #847:
URL: https://github.com/apache/parquet-mr/pull/847#issuecomment-755018753


   > @satishkotha, thanks for explaining your use case. I understand that you 
need to inject your own implementation to make it work. What I would like to 
achieve is to keep parquet-tools easy to use as a command.
   > 
   > I am not strongly opposed to allowing runtime extension of parquet-tools 
functionality, but it has to be well documented and easy to use from the 
command line. Please keep in mind that parquet-tools might be shipped as a 
command to an environment (e.g. via brew), and the user might not have the 
knowledge or even the privilege to modify the classpath.
   > 
   > I would suggest adding two separate arguments for merge. One selects a 
strategy that is available in parquet-mr; for this one we should list the 
available options with their descriptions. The other would let the user supply 
their own implementation. For this one, the documentation needs to state which 
interface must be implemented and that the implementation must be added to the 
classpath of parquet-tools. (It would be hard to give a step-by-step guide for 
a beginner, as you would not know anything about the environment and paths, but 
I think it is enough to keep this option for experts.)
   
   These are great points for usability. I updated MergeCommand to take an 
additional option and updated the documentation. Please take a look. I'm happy 
to add more documentation if needed.
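
   For readers following the thread, here is a rough sketch of how the two 
arguments discussed above could be resolved. It is not the code in the pull 
request: MergeStrategy, StrictStrategy, ConcatStrategy, StrategyResolver and 
the short names "strict" and "concat" are all hypothetical and chosen only for 
illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical strategy interface; the one in the pull request may look different.
interface MergeStrategy {
  String description();
}

final class StrictStrategy implements MergeStrategy {
  public String description() { return "fail if a key has conflicting values"; }
}

final class ConcatStrategy implements MergeStrategy {
  public String description() { return "join conflicting values with a comma"; }
}

final class StrategyResolver {
  // Built-in strategies selectable by short name (names are illustrative).
  static final Map<String, Supplier<MergeStrategy>> BUILT_IN = new LinkedHashMap<>();
  static {
    BUILT_IN.put("strict", StrictStrategy::new);
    BUILT_IN.put("concat", ConcatStrategy::new);
  }

  // builtInName and customClassName model the two separate command line arguments;
  // either may be null when the user did not pass it.
  static MergeStrategy resolve(String builtInName, String customClassName) {
    if (builtInName != null && customClassName != null) {
      throw new IllegalArgumentException("choose a built-in strategy or a custom class, not both");
    }
    if (customClassName != null) {
      try {
        // The custom class must be on the classpath and have a no-argument constructor.
        return Class.forName(customClassName)
            .asSubclass(MergeStrategy.class)
            .getDeclaredConstructor()
            .newInstance();
      } catch (ReflectiveOperationException | ClassCastException e) {
        throw new IllegalArgumentException("cannot load strategy class " + customClassName, e);
      }
    }
    String name = (builtInName == null) ? "strict" : builtInName;
    Supplier<MergeStrategy> supplier = BUILT_IN.get(name);
    if (supplier == null) {
      throw new IllegalArgumentException(
          "unknown strategy '" + name + "', available: " + BUILT_IN.keySet());
    }
    return supplier.get();
  }

  // One line per built-in strategy, suitable for the command's help text.
  static String help() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Supplier<MergeStrategy>> e : BUILT_IN.entrySet()) {
      sb.append(e.getKey()).append(": ").append(e.getValue().get().description()).append('\n');
    }
    return sb.toString();
  }
}

final class ResolverDemo {
  public static void main(String[] args) {
    System.out.print(StrategyResolver.help());
    System.out.println(StrategyResolver.resolve(null, null).description());
  }
}
```

   Keeping the built-in names in one map means the help text and the 
validation error list the same options, so a user who cannot modify the 
classpath still sees everything available to them.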


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow different strategies to combine key values when merging parquet files
> ---------------------------------------------------------------------------
>
>                 Key: PARQUET-1951
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1951
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: satish
>            Priority: Minor
>
> I work on the Apache Hudi project. We store some additional metadata in 
> parquet files (the key range in the file, for example), so the metadata 
> differs across the parquet files that we want to merge. 
> Here is what I'm thinking:
> 1) The merge command takes an additional command line option: --strategy 
> <StrategyClassName>. 
> 2) We introduce a new strategy class in parquet-hadoop that keeps the same 
> behavior as today.  
> We can extend that class and provide our custom implementation.
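
The proposal above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the parquet-hadoop API: the class 
names StrictKeyValueStrategy and KeyRangeStrategy, the merge/resolve method 
shapes, and the metadata keys "range.min" and "range.max" are hypothetical. 
"Same behavior as today" is assumed here to mean failing when a key has 
different values in different files.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical base strategy: rejects any key whose value differs across files.
class StrictKeyValueStrategy {
  // valuesByKey maps each metadata key to the distinct values seen in the input files.
  Map<String, String> merge(Map<String, Set<String>> valuesByKey) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> e : valuesByKey.entrySet()) {
      merged.put(e.getKey(), resolve(e.getKey(), e.getValue()));
    }
    return merged;
  }

  // Hook that subclasses override to handle specific keys differently.
  protected String resolve(String key, Set<String> values) {
    if (values.size() != 1) {
      throw new IllegalStateException("key " + key + " has conflicting values: " + values);
    }
    return values.iterator().next();
  }
}

// Hypothetical custom strategy: widens a key range instead of failing on it.
class KeyRangeStrategy extends StrictKeyValueStrategy {
  @Override
  protected String resolve(String key, Set<String> values) {
    // Plain string ordering is used here; real keys may need their own comparator.
    if (key.equals("range.min")) {
      return Collections.min(values);
    }
    if (key.equals("range.max")) {
      return Collections.max(values);
    }
    return super.resolve(key, values); // all other keys keep the strict behavior
  }
}

final class MergeDemo {
  public static void main(String[] args) {
    Map<String, Set<String>> valuesByKey = new LinkedHashMap<>();
    valuesByKey.put("writer", new TreeSet<>(Arrays.asList("hudi")));
    valuesByKey.put("range.min", new TreeSet<>(Arrays.asList("key100", "key300")));
    valuesByKey.put("range.max", new TreeSet<>(Arrays.asList("key299", "key499")));
    System.out.println(new KeyRangeStrategy().merge(valuesByKey));
  }
}
```

Overriding a single per-key hook keeps the default behavior for every key the 
custom strategy does not know about, which matches the idea of extending the 
default class rather than replacing it.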



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
