[ https://issues.apache.org/jira/browse/NIFI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991012#comment-15991012 ]

ASF GitHub Bot commented on NIFI-3724:
--------------------------------------

Github user alopresto commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1712#discussion_r114147981
  
    --- Diff: 
nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
 ---
    @@ -67,40 +61,14 @@
      */
     @RequiresInstanceClassLoading(cloneAncestorResources = true)
     public abstract class AbstractHadoopProcessor extends AbstractProcessor {
    -    /**
    -     * Compression Type Enum
    -     */
    -    public enum CompressionType {
    -        NONE,
    -        DEFAULT,
    -        BZIP,
    -        GZIP,
    -        LZ4,
    -        SNAPPY,
    -        AUTOMATIC;
    -
    -        @Override
    -        public String toString() {
    -            switch (this) {
    -                case NONE: return "NONE";
    -                case DEFAULT: return DefaultCodec.class.getName();
    -                case BZIP: return BZip2Codec.class.getName();
    -                case GZIP: return GzipCodec.class.getName();
    -                case LZ4: return Lz4Codec.class.getName();
    -                case SNAPPY: return SnappyCodec.class.getName();
    -                case AUTOMATIC: return "Automatically Detected";
    -            }
    -            return null;
    -        }
    -    }
     
         // properties
         public static final PropertyDescriptor HADOOP_CONFIGURATION_RESOURCES 
= new PropertyDescriptor.Builder()
                 .name("Hadoop Configuration Resources")
                 .description("A file or comma separated list of files which 
contains the Hadoop file system configuration. Without this, Hadoop "
                         + "will search the classpath for a 'core-site.xml' and 
'hdfs-site.xml' file or will revert to a default configuration.")
                 .required(false)
    -            .addValidator(createMultipleFilesExistValidator())
    +            .addValidator(HadoopValidators.MULTIPLE_FILE_EXISTS_VALIDATOR)
    --- End diff --
    
    Minor comment -- until I read the source code for this, my interpretation 
was that this validator ensured that *multiple files existed* -- i.e. one file 
provided would fail. Perhaps we can rename this 
`ONE_OR_MORE_FILES_EXIST_VALIDATOR`? Not a giant issue but potentially 
confusing. 
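For context, the semantics under discussion can be sketched in plain Java: the validator passes when *every* file in a comma-separated list exists, so a single file is perfectly valid — which is what makes the name `MULTIPLE_FILE_EXISTS_VALIDATOR` misleading. All names below are illustrative stand-ins, not NiFi's actual implementation:

```java
import java.io.File;

// Hedged sketch of the "one or more files exist" check discussed above.
// The real logic lives in NiFi's HadoopValidators; this only mirrors its intent.
public class OneOrMoreFilesExistCheck {

    /** Returns true when every entry in a comma-separated list is an existing file. */
    public static boolean allFilesExist(String commaSeparated) {
        if (commaSeparated == null || commaSeparated.trim().isEmpty()) {
            return false;
        }
        for (String entry : commaSeparated.split(",")) {
            final File f = new File(entry.trim());
            if (!f.exists() || !f.isFile()) {
                return false;
            }
        }
        // A single existing file validates, hence the proposed rename
        // to ONE_OR_MORE_FILES_EXIST_VALIDATOR.
        return true;
    }
}
```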


> Add Put/Fetch Parquet Processors
> --------------------------------
>
>                 Key: NIFI-3724
>                 URL: https://issues.apache.org/jira/browse/NIFI-3724
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> Now that we have the record reader/writer services currently in master, it 
> would be nice to have reader and writers for Parquet. Since Parquet's API is 
> based on the Hadoop Path object, and not InputStreams/OutputStreams, we can't 
> really implement direct conversions to and from Parquet in the middle of a 
> flow, but we can perform the conversion by taking any record format
> and writing to a Path as Parquet, or reading Parquet from a Path and writing 
> it out as another record format.
> We should add a PutParquet that uses a record reader and writes records to a 
> Path as Parquet, and a FetchParquet that reads Parquet from a path and writes 
> out records to a flow file using a record writer.
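The constraint described above (Parquet's API is bound to a Hadoop Path rather than to streams) can be sketched abstractly. Every name below is a hypothetical stand-in showing the shape of the proposed PutParquet flow — records come from a reader over the flow file's content stream and go to a Path-bound writer — not NiFi's or Parquet's actual API:

```java
import java.io.InputStream;
import java.util.List;

// Hypothetical sketch of the PutParquet idea: record-oriented input,
// a writer that is conceptually tied to a destination Path.
public class PutParquetSketch {

    interface RecordReader {          // stand-in for NiFi's record reader service
        List<String> readRecords(InputStream in) throws Exception;
    }

    interface PathBasedWriter {       // stand-in for a Parquet writer opened on a Path
        void write(String record) throws Exception;
        void close() throws Exception;
    }

    /** Reads every record from the incoming stream and writes it via the Path-bound writer. */
    static void putRecords(InputStream flowFileContent, RecordReader reader,
                           PathBasedWriter writer) throws Exception {
        for (String record : reader.readRecords(flowFileContent)) {
            writer.write(record);
        }
        writer.close();
    }
}
```

FetchParquet would be the mirror image: read from a Path-bound reader and hand records to a record writer over the flow file's output stream.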



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
