[jira] [Commented] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

ASF GitHub Bot (Jira) Fri, 17 Nov 2023 07:16:34 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787257#comment-17787257
 ]


ASF GitHub Bot commented on PARQUET-2380:
-----------------------------------------

amousavigourabi commented on code in PR #1195:
URL: https://github.com/apache/parquet-mr/pull/1195#discussion_r1397467309


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopOutputFile.java:
##########
@@ -54,11 +54,19 @@ private static boolean supportsBlockSize(FileSystem fs) {
   private final Configuration conf;
 
   public static HadoopOutputFile fromPath(Path path, Configuration conf)
-      throws IOException {
+    throws IOException {
     FileSystem fs = path.getFileSystem(conf);
     return new HadoopOutputFile(fs, fs.makeQualified(path), conf);
   }
 
+  public static HadoopOutputFile fromPathUnchecked(Path path, Configuration conf) {

Review Comment:
   Some existing methods take Hadoop's Path and do not declare 
`throws IOException`, and we cannot get rid of them yet. Where we want those 
methods to transition to the InputFile/OutputFile interfaces in the 
background, this factory is a DRYer alternative to repeatedly catching 
`IOException` and rethrowing it as a `RuntimeException` (which is needed to 
avoid adding incompatible throws clauses). As an added bonus, it also allows 
for concisely converting collections of Paths to 
HadoopOutputFiles/HadoopInputFiles using Java Stream higher-order functions 
such as map.
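
A minimal sketch of the pattern described in this comment, using stand-in 
types so that it compiles without Hadoop on the classpath. `SketchOutputFile`, 
`toOutputFiles`, and the choice of `UncheckedIOException` as the wrapping 
exception are assumptions made for illustration; the body of 
`fromPathUnchecked` in the PR is not shown in this hunk and may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UncheckedFactorySketch {

  // Hypothetical stand-in for HadoopOutputFile; not the real Parquet class.
  static final class SketchOutputFile {
    private final String qualifiedPath;

    private SketchOutputFile(String qualifiedPath) {
      this.qualifiedPath = qualifiedPath;
    }

    // Checked factory, mirroring the shape of fromPath(Path, Configuration).
    static SketchOutputFile fromPath(String path) throws IOException {
      if (path == null || path.isEmpty()) {
        throw new IOException("cannot resolve empty path");
      }
      return new SketchOutputFile("file:" + path);
    }

    // Unchecked variant: the try/catch lives in one place, so callers
    // without a throws clause do not have to repeat it.
    static SketchOutputFile fromPathUnchecked(String path) {
      try {
        return fromPath(path);
      } catch (IOException e) {
        // UncheckedIOException is a RuntimeException subclass (assumed choice).
        throw new UncheckedIOException(e);
      }
    }

    String getPath() {
      return qualifiedPath;
    }
  }

  // A method reference to the unchecked factory fits Stream.map, which
  // does not accept functions that throw checked exceptions.
  static List<SketchOutputFile> toOutputFiles(List<String> paths) {
    return paths.stream()
        .map(SketchOutputFile::fromPathUnchecked)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<SketchOutputFile> files =
        toOutputFiles(Arrays.asList("/tmp/a.parquet", "/tmp/b.parquet"));
    files.forEach(f -> System.out.println(f.getPath()));
  }
}
```

With only the checked `fromPath`, the lambda passed to `map` would need its 
own try/catch block at every call site.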





> Decouple RewriteOptions from Hadoop classes
> -------------------------------------------
>
>                 Key: PARQUET-2380
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2380
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Major
>
> ParquetRewriter's RewriteOptions makes use of Hadoop's Path and Configuration, 
> where it could instead allow users to specify these using the Parquet 
> interfaces as well. This would allow rewriting to be properly decoupled from 
> Hadoop at a later stage.
> This is part of a larger effort to decouple Parquet from Hadoop libraries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
