Re: CompressContent hadoop-snappy

2019-11-26 Thread Noe Detore
In my use case, I am compressing data and then storing it in S3.
Unfortunately, hadoop-snappy is not able to decompress files written by
snappy-java, so using snappy-java-compressed files in Hive is not possible.
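The two libraries agree on the snappy algorithm itself but not on the stream container: Hadoop's SnappyCodec writes bare length-prefixed blocks, while snappy-java's SnappyOutputStream adds its own magic header and chunk layout, so neither can read the other's files. A rough sketch of the Hadoop-style block container (zlib stands in for the snappy algorithm here, since snappy is not in the Python stdlib, and block size is an illustrative choice):

```python
import struct
import zlib  # stand-in compressor; the real codec uses raw snappy blocks

BLOCK_SIZE = 256 * 1024  # illustrative; Hadoop's default is configurable

def hadoop_frame(data: bytes) -> bytes:
    """Wrap each block the way Hadoop's BlockCompressorStream does:
    [4-byte BE uncompressed length][4-byte BE compressed length][payload]."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        payload = zlib.compress(block)
        out += struct.pack(">II", len(block), len(payload))
        out += payload
    return bytes(out)

def hadoop_unframe(framed: bytes) -> bytes:
    """Inverse of hadoop_frame: walk the length-prefixed blocks."""
    out, pos = bytearray(), 0
    while pos < len(framed):
        _, comp_len = struct.unpack_from(">II", framed, pos)
        pos += 8
        out += zlib.decompress(framed[pos:pos + comp_len])
        pos += comp_len
    return bytes(out)
```

A snappy-java stream fails under a reader like this immediately: its first bytes are a magic header, which a length-prefixed reader misinterprets as an implausibly large block length.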

It would be nice to have the option to select hadoop-snappy in
CompressContent and just add the native libs to the JVM, similar to PutHDFS.
I will also look into SnappyHadoopCompatibleOutputStream.

I will make the effort to contribute back if I go this route.
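For existing objects in S3, there is a quick way to tell which flavor a file is: snappy-java's SnappyOutputStream starts every stream with an 8-byte magic header (per the snappy-java source), while the Hadoop codec writes no magic at all, just a big-endian block length. A small sketch:

```python
# Magic written by snappy-java's SnappyOutputStream (per the snappy-java
# source); Hadoop's SnappyCodec output carries no magic header at all.
SNAPPY_JAVA_MAGIC = b"\x82SNAPPY\x00"

def written_by_snappy_java(header: bytes) -> bool:
    """True if the first bytes look like a snappy-java stream; those are
    the files Hive (via hadoop-snappy) will refuse to read."""
    return header.startswith(SNAPPY_JAVA_MAGIC)
```

Reading just the first 8 bytes of an S3 object is enough to run this check before pointing Hive at a bucket.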

Thank you
Noe

On Tue, Nov 26, 2019 at 12:54 PM Bryan Bende  wrote:

> Not sure if this is relevant, but snappy-java has a specific
> SnappyHadoopCompatibleOutputStream so CompressContent could offer a
> third snappy option like "snappy-hadoop" which used that.
>
> Shawn is correct though that we wouldn't want to introduce Hadoop libs
> into CompressContent.
>
> [1]
> https://github.com/xerial/snappy-java/blob/73c67c70303e509be1642af5e302411d39434249/src/main/java/org/xerial/snappy/SnappyHadoopCompatibleOutputStream.java
>
> On Tue, Nov 26, 2019 at 11:51 AM Shawn Weeks wrote:
> >
> > It uses snappy-java to get around the native class path issues that
> would exist otherwise. What’s wrong with snappy-java?
> >
> >
> >
> > Thanks
> >
> > Shawn
> >
> >
> >
> > From: Noe Detore 
> > Reply-To: "users@nifi.apache.org" 
> > Date: Monday, November 25, 2019 at 2:16 PM
> > To: "users@nifi.apache.org" 
> > Subject: CompressContent hadoop-snappy
> >
> >
> >
> > Hello
> >
> >
> >
> > CompressContent ver 1.9 uses snappy-java. Is there an easy way to change
> > it to hadoop-snappy? Or does a custom processor need to be created?
> >
> >
> >
> > thank you
> >
> > Noe
>


Re: Reading Parquet Files from S3 using Apache Nifi

2019-11-26 Thread Bryan Bende
Hi,

In NiFi 1.10.0 there is a Parquet record reader, so your flow could be
ListS3 -> FetchS3Object -> ConvertRecord (Parquet reader, JSON writer) ->
whatever else.
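If the ConvertRecord step complains about its input, it can help to confirm the fetched object really is Parquet before blaming the reader. Per the Parquet format spec, a file begins and ends with the 4-byte magic "PAR1"; a minimal stdlib check along these lines:

```python
PARQUET_MAGIC = b"PAR1"

def looks_like_parquet(data: bytes) -> bool:
    """Parquet files start and end with the magic bytes 'PAR1'
    (the footer metadata length sits just before the trailing magic)."""
    return (len(data) >= 12
            and data.startswith(PARQUET_MAGIC)
            and data.endswith(PARQUET_MAGIC))
```

Running this over the first flow files fetched from S3 quickly separates "wrong reader configuration" from "not actually a Parquet object".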

-Bryan

On Tue, Nov 26, 2019 at 3:55 PM Chowdhury, Rifat wrote:
>
> Hi,
>
> Is there an easy way to read Parquet data from S3 directly using NiFi?
>
> I am using ListS3 -> FetchParquet and it's not working. Or would you
> recommend trying the following: ListS3 -> FetchS3 -> PutFile (it would put
> it in one of the nodes) -> FetchParquet (grab from where we put it)?
>
> My goal is to retrieve some of the columns of that parquet data, maybe
> convert it to JSON, and then do further data processing using NiFi. Any
> suggestions would be greatly appreciated.
>
> Best Regards, Rifat
>

