Re: Saving Parquet files to S3
Hi Ankur,

I also tried setting a property to write Parquet files of 256 MB. I am using PySpark; below is how I set the properties, but it's not working for me. How did you set the property?

    spark_context._jsc.hadoopConfiguration().setInt("dfs.blocksize", 268435456)
    spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", 268435)

Thanks,
Bijay

On Fri, Jun 10, 2016 at 5:24 AM, Ankur Jain <ankur.j...@yash.com> wrote:
> Thanks maropu.. It worked…
>
> *From:* Takeshi Yamamuro [mailto:linguin@gmail.com]
> *Sent:* 10 June 2016 11:47 AM
> *To:* Ankur Jain
> *Cc:* user@spark.apache.org
> *Subject:* Re: Saving Parquet files to S3
>
> Hi,
>
> You'd be better off setting `parquet.block.size`.
>
> // maropu
>
> On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
>
> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that
> each contains around 1 GB of data.
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
> Hello Team,
>
> I want to write Parquet files to AWS S3, but I want each file to be
> about 1 GB in size.
>
> Can someone please guide me on how I can achieve this?
>
> I am using AWS EMR with Spark 1.6.1.
>
> Thanks,
> Ankur
>
> Information transmitted by this e-mail is proprietary to YASH Technologies
> and/or its Customers and is intended for use only by the individual or
> entity to which it is addressed, and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. If
> you are not the intended recipient or it appears that this mail has been
> forwarded to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly prohibited. In
> such cases, please notify us immediately at i...@yash.com and delete this
> mail from your records.
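A note on units: both `dfs.blocksize` and `parquet.block.size` take a size in bytes, and the snippet above sets them to different magnitudes (268435456 vs. 268435, the latter being only about 262 KB per Parquet row group). A minimal sketch of keeping the two values consistent, assuming the same `hadoopConfiguration()` calls shown above and a hypothetical helper name `target_properties`:

```python
# Both properties are byte counts; 256 MB = 268435456 bytes.
TARGET_BYTES = 256 * 1024 * 1024

def target_properties(target_bytes):
    """Hypothetical helper: the two block-size properties, kept consistent."""
    return {
        "dfs.blocksize": target_bytes,       # HDFS block size
        "parquet.block.size": target_bytes,  # Parquet row-group size
    }

# Applying them (requires a live SparkContext named `spark_context`):
# for key, value in target_properties(TARGET_BYTES).items():
#     spark_context._jsc.hadoopConfiguration().setInt(key, value)
```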
RE: Saving Parquet files to S3
Thanks maropu.. It worked…

From: Takeshi Yamamuro [mailto:linguin@gmail.com]
Sent: 10 June 2016 11:47 AM
To: Ankur Jain
Cc: user@spark.apache.org
Subject: Re: Saving Parquet files to S3

Hi,

You'd be better off setting `parquet.block.size`.

// maropu

On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that
> each contains around 1 GB of data.
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
> Hello Team,
>
> I want to write Parquet files to AWS S3, but I want each file to be
> about 1 GB in size.
>
> Can someone please guide me on how I can achieve this?
>
> I am using AWS EMR with Spark 1.6.1.
>
> Thanks,
> Ankur

--
---
Takeshi Yamamuro
Re: Saving Parquet files to S3
Hi,

You'd be better off setting `parquet.block.size`.

// maropu

On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann wrote:
> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that
> each contains around 1 GB of data.
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain wrote:
>
>> Hello Team,
>>
>> I want to write Parquet files to AWS S3, but I want each file to be
>> about 1 GB in size.
>>
>> Can someone please guide me on how I can achieve this?
>>
>> I am using AWS EMR with Spark 1.6.1.
>>
>> Thanks,
>> Ankur

--
---
Takeshi Yamamuro
Re: Saving Parquet files to S3
I don't believe there's any way to output files of a specific size. What you can do is partition your data into a number of partitions such that each contains around 1 GB of data.

On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain wrote:
> Hello Team,
>
> I want to write Parquet files to AWS S3, but I want each file to be
> about 1 GB in size.
>
> Can someone please guide me on how I can achieve this?
>
> I am using AWS EMR with Spark 1.6.1.
>
> Thanks,
> Ankur
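The repartitioning suggestion above reduces to simple arithmetic: divide the dataset's total size by the target file size to pick a partition count, then repartition before writing. A minimal sketch, assuming the input size is known up front (`df` is a hypothetical DataFrame, and `partitions_for` is a name introduced here for illustration):

```python
import math

def partitions_for(total_bytes, target_bytes=1024 ** 3):
    """Number of partitions so each holds roughly `target_bytes` of data."""
    return max(1, math.ceil(total_bytes / target_bytes))

# A ~10.5 GB dataset split into ~1 GB partitions:
n = partitions_for(int(10.5 * 1024 ** 3))  # 11 partitions

# In PySpark (sketch; requires a live session and DataFrame):
# df.repartition(n).write.parquet("s3://bucket/path/")
```

Note this is only an estimate: the in-memory size Spark shuffles and the compressed size Parquet writes to S3 can differ substantially, so the resulting files will land near, not exactly at, 1 GB.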