Re: Append to Parquet

2017-12-01 Thread Giovanni Lanzani

On 1 Dec 2017, at 3:44, VinShar wrote:

Yes, this was my understanding also, but then I found that Spark's DataFrame
does have a method which appends to Parquet (df.write.parquet(destName,
mode="append")). Below is an article that throws some light on this. I was
wondering if there is a way to achieve the same through NiFi.

http://aseigneurin.github.io/2017/03/14/incrementally-loaded-parquet-files.html


You should not believe everything bloggers write :)

In the blog they are writing to the `permit-inspections.parquet` 
**folder**. It’s not a parquet file.


The parquet files are contained in the folder. The append mode you are 
referring to simply writes new parquet files in the folder, without 
touching the existing ones.


If they had used the `overwrite` option, the existing folder would have 
been emptied first.
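
A quick PySpark sketch of what that means on disk (the path and the 
numbers here are just for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dest = "/tmp/permit-inspections.parquet"  # a directory, not a single file

# first write: creates the folder with one or more part-*.parquet files
spark.range(5).write.parquet(dest, mode="overwrite")

# "append" just adds new part-*.parquet files next to the existing ones
spark.range(5, 10).write.parquet(dest, mode="append")

# "overwrite" would instead have emptied the folder before writing

spark.read.parquet(dest).count()  # 10: readers see the union of all part files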


Cheers,

Giovanni

Re: Append to Parquet

2017-11-30 Thread VinShar
Yes, this was my understanding also, but then I found that Spark's DataFrame
does have a method which appends to Parquet (df.write.parquet(destName,
mode="append")). Below is an article that throws some light on this. I was
wondering if there is a way to achieve the same through NiFi.

http://aseigneurin.github.io/2017/03/14/incrementally-loaded-parquet-files.html

I have a workaround in mind for this: save the data I want to append to
Parquet in a file (say, in Avro format), then execute a script through
ExecuteProcess that launches a Spark job to read the Avro file, append it to
the existing Parquet data, and then delete the Avro file. I am looking for a
simpler way than this.
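
For reference, the Spark job launched by ExecuteProcess could be roughly
something like this (paths and names below are placeholders):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("append-avro-to-parquet")
         .getOrCreate())

# read the staged Avro data (needs the spark-avro package on the classpath)
staged = spark.read.format("com.databricks.spark.avro").load("/staging/new_batch.avro")

# "append" adds new part files to the existing Parquet folder
staged.write.parquet("/warehouse/existing_table.parquet", mode="append")

spark.stop()

# the Avro staging file would then be deleted by the wrapper script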



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Append to Parquet

2017-11-30 Thread Bryan Bende
Hello,

As far as I know there is not an option in Parquet to append, due to
the way its internal format works.

The ParquetFileWriter has a Mode which only offers CREATE and OVERWRITE:

https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java#L105-L107

-Bryan


On Thu, Nov 30, 2017 at 5:12 PM, VinShar  wrote:
> Hi,
>
> Is there any way to use PutParquet to append to an existing parquet file? I
> know that I can create a Kite DataSet and write Parquet files to it, but I am
> looking for an alternative to Spark's DataFrame.write.parquet(destination,
> mode="overwrite")
>
> Regards,
> Vinay
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/