Re: CompressContent hadoop-snappy
In my use case, I am compressing data and then storing it in S3. Unfortunately, hadoop-snappy is not able to uncompress snappy-java output, so using snappy-java files in Hive is not possible. It would be nice to have the option to select hadoop-snappy from CompressContent and just add the native libs to the JVM, similar to PutHDFS. I will also look into SnappyHadoopCompatibleOutputStream. I will make the effort to contribute back if I go this route.

Thank you
Noe

On Tue, Nov 26, 2019 at 12:54 PM Bryan Bende wrote:
> Not sure if this is relevant, but snappy-java has a specific
> SnappyHadoopCompatibleOutputStream, so CompressContent could offer a
> third snappy option like "snappy-hadoop" which used that.
>
> Shawn is correct, though, that we wouldn't want to introduce Hadoop libs
> into CompressContent.
>
> [1] https://github.com/xerial/snappy-java/blob/73c67c70303e509be1642af5e302411d39434249/src/main/java/org/xerial/snappy/SnappyHadoopCompatibleOutputStream.java
>
> On Tue, Nov 26, 2019 at 11:51 AM Shawn Weeks wrote:
> >
> > It uses snappy-java to get around the native class path issues that
> > would exist otherwise. What’s wrong with snappy-java?
> >
> > Thanks
> > Shawn
> >
> > From: Noe Detore
> > Reply-To: "users@nifi.apache.org"
> > Date: Monday, November 25, 2019 at 2:16 PM
> > To: "users@nifi.apache.org"
> > Subject: CompressContent hadoop-snappy
> >
> > Hello
> >
> > CompressContent ver 1.9 uses snappy-java. Is there an easy way to change
> > it to hadoop-snappy? Or does a custom processor need to be created?
> >
> > thank you
> > Noe
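For reference, a minimal sketch of writing Hadoop-compatible snappy output with snappy-java's SnappyHadoopCompatibleOutputStream. The file name and payload here are made up for illustration, and the snappy-java jar is assumed to be on the classpath:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.xerial.snappy.SnappyHadoopCompatibleOutputStream;

public class SnappyHadoopDemo {
    public static void main(String[] args) throws Exception {
        // SnappyHadoopCompatibleOutputStream writes the block framing that
        // Hadoop's SnappyCodec expects, so hadoop-snappy consumers such as
        // Hive should be able to decompress the result.
        try (OutputStream out = new SnappyHadoopCompatibleOutputStream(
                new FileOutputStream("data.snappy"))) {
            out.write("example payload".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A "snappy-hadoop" option in CompressContent would presumably wrap the flow file's output stream this way instead of with the regular SnappyOutputStream, without pulling any Hadoop libraries into the processor.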
Re: Reading Parquet Files from S3 using Apache Nifi
Hi,

In NiFi 1.10.0 there is a Parquet record reader, so your flow could be:

ListS3 -> FetchS3 -> ConvertRecord (Parquet reader, JSON writer) -> whatever else.

-Bryan

On Tue, Nov 26, 2019 at 3:55 PM Chowdhury, Rifat wrote:
>
> Hi,
>
> Is there an easy way to read Parquet data from S3 directly using NiFi?
>
> I am using ListS3 -> FetchParquet and it's not working. Or would you
> recommend trying the following: ListS3 -> FetchS3 -> PutFile (it would put
> it in one of the nodes) -> FetchParquet (grab from where we put it)?
>
> My goal is to retrieve some of the columns of that Parquet data, maybe
> convert it to JSON, and then do further data processing using NiFi. Any
> suggestions would be greatly appreciated.
>
> Best Regards,
> Rifat
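Outside of NiFi, the Parquet-to-JSON step that the reader/writer pair performs can be approximated with parquet-avro. This is an untested sketch: the local file name is hypothetical, and the parquet-avro and hadoop-client dependencies are assumed to be on the classpath:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ParquetToJsonDemo {
    public static void main(String[] args) throws Exception {
        // Read each row as an Avro GenericRecord; its toString() renders a
        // JSON-like view of the row, roughly what ConvertRecord emits with a
        // JSON record writer.
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(
                        new Path("data.parquet")).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}
```

In the NiFi flow itself none of this code is needed; the ParquetReader controller service handles it, and dropping columns can be done with the record writer's schema or a QueryRecord processor.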
Reading Parquet Files from S3 using Apache Nifi
Hi,

Is there an easy way to read Parquet data from S3 directly using NiFi?

I am using ListS3 -> FetchParquet and it's not working. Or would you recommend trying the following: ListS3 -> FetchS3 -> PutFile (it would put it in one of the nodes) -> FetchParquet (grab from where we put it)?

My goal is to retrieve some of the columns of that Parquet data, maybe convert it to JSON, and then do further data processing using NiFi. Any suggestions would be greatly appreciated.

Best Regards,
Rifat
Re: CompressContent hadoop-snappy
Not sure if this is relevant, but snappy-java has a specific SnappyHadoopCompatibleOutputStream, so CompressContent could offer a third snappy option like "snappy-hadoop" which used that.

Shawn is correct, though, that we wouldn't want to introduce Hadoop libs into CompressContent.

[1] https://github.com/xerial/snappy-java/blob/73c67c70303e509be1642af5e302411d39434249/src/main/java/org/xerial/snappy/SnappyHadoopCompatibleOutputStream.java

On Tue, Nov 26, 2019 at 11:51 AM Shawn Weeks wrote:
>
> It uses snappy-java to get around the native class path issues that would
> exist otherwise. What’s wrong with snappy-java?
>
> Thanks
> Shawn
>
> From: Noe Detore
> Reply-To: "users@nifi.apache.org"
> Date: Monday, November 25, 2019 at 2:16 PM
> To: "users@nifi.apache.org"
> Subject: CompressContent hadoop-snappy
>
> Hello
>
> CompressContent ver 1.9 uses snappy-java. Is there an easy way to change it
> to hadoop-snappy? Or does a custom processor need to be created?
>
> thank you
> Noe
Re: CompressContent hadoop-snappy
It uses snappy-java to get around the native class path issues that would exist otherwise. What’s wrong with snappy-java?

Thanks
Shawn

From: Noe Detore
Reply-To: "users@nifi.apache.org"
Date: Monday, November 25, 2019 at 2:16 PM
To: "users@nifi.apache.org"
Subject: CompressContent hadoop-snappy

Hello

CompressContent ver 1.9 uses snappy-java. Is there an easy way to change it to hadoop-snappy? Or does a custom processor need to be created?

thank you
Noe