Re: Saving Parquet files to S3

2016-06-10 Thread Bijay Kumar Pathak
Hi Ankur,

I also tried setting a property to write Parquet files of 256 MB. I am
using PySpark; below is how I set the property, but it's not working for me.
How did you set the property?


spark_context._jsc.hadoopConfiguration().setInt("dfs.blocksize", 268435456)
spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", 268435)
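
For what it's worth, 268435 bytes is only about 262 KB, while 256 MB is
268435456 bytes, so the second value above may simply be too small to have
the intended effect. A minimal PySpark sketch, assuming the intent was 256 MB
for both settings:

# Sketch only: assumes Spark 1.6 with PySpark and an existing spark_context.
SIZE_256MB = 256 * 1024 * 1024  # 268435456 bytes

hadoop_conf = spark_context._jsc.hadoopConfiguration()
hadoop_conf.setInt("dfs.blocksize", SIZE_256MB)       # HDFS block size (no effect on S3 objects)
hadoop_conf.setInt("parquet.block.size", SIZE_256MB)  # Parquet row-group size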

Thanks,
Bijay

On Fri, Jun 10, 2016 at 5:24 AM, Ankur Jain <ankur.j...@yash.com> wrote:

> Thanks, maropu. It worked…
>
>
>
> From: Takeshi Yamamuro [mailto:linguin@gmail.com]
> Sent: 10 June 2016 11:47 AM
> To: Ankur Jain
> Cc: user@spark.apache.org
> Subject: Re: Saving Parquet files to S3
>
>
>
> Hi,
>
>
>
> You'd be better off setting `parquet.block.size`.
>
>
>
> // maropu
>
>
>
> On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
>
> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that the
> amount of data they each contain is around 1 GB.
>
>
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
> Hello Team,
>
>
>
> I want to write Parquet files to AWS S3, but I want each file to be about
> 1 GB in size.
>
> Can someone please guide me on how I can achieve this?
>
>
>
> I am using AWS EMR with Spark 1.6.1.
>
>
>
> Thanks,
>
> Ankur
>
> --
>
> ---
> Takeshi Yamamuro
>


RE: Saving Parquet files to S3

2016-06-10 Thread Ankur Jain
Thanks, maropu. It worked…

From: Takeshi Yamamuro [mailto:linguin@gmail.com]
Sent: 10 June 2016 11:47 AM
To: Ankur Jain
Cc: user@spark.apache.org
Subject: Re: Saving Parquet files to S3

Hi,

You'd be better off setting `parquet.block.size`.

// maropu

On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
I don't believe there's any way to output files of a specific size. What you can
do is partition your data into a number of partitions such that the amount of
data they each contain is around 1 GB.

On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
Hello Team,

I want to write Parquet files to AWS S3, but I want each file to be about 1
GB in size.
Can someone please guide me on how I can achieve this?

I am using AWS EMR with Spark 1.6.1.

Thanks,
Ankur




--
---
Takeshi Yamamuro


Re: Saving Parquet files to S3

2016-06-10 Thread Takeshi Yamamuro
Hi,

You'd be better off setting `parquet.block.size`.

// maropu
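
For context, `parquet.block.size` sets the Parquet row-group size, which the
writer buffers in memory before flushing to the file; it shapes the layout
inside each output file rather than hard-capping the file's total size. A
minimal PySpark sketch, assuming Spark 1.6, an existing SparkContext `sc`,
and a hypothetical DataFrame `df` and S3 path:

# Sketch only: df and the output path are placeholders.
ONE_GB = 1024 * 1024 * 1024  # 1073741824 bytes

# The Parquet writer reads parquet.block.size from the Hadoop configuration.
# Note: large row groups increase the writer's memory footprint.
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", ONE_GB)

# Spark writes one file per partition; row groups within it flush at ~1 GB.
df.write.parquet("s3://your-bucket/output/")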

On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann  wrote:

> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that the
> amount of data they each contain is around 1 GB.
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain  wrote:
>
>> Hello Team,
>>
>>
>>
>> I want to write Parquet files to AWS S3, but I want each file to be about
>> 1 GB in size.
>>
>> Can someone please guide me on how I can achieve this?
>>
>>
>>
>> I am using AWS EMR with Spark 1.6.1.
>>
>>
>>
>> Thanks,
>>
>> Ankur
>
>


-- 
---
Takeshi Yamamuro


Re: Saving Parquet files to S3

2016-06-09 Thread Daniel Siegmann
I don't believe there's any way to output files of a specific size. What you
can do is partition your data into a number of partitions such that the
amount of data they each contain is around 1 GB.
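
Put differently: estimate how many ~1 GB chunks the dataset holds and
repartition to that count before writing, since Spark writes one file per
partition. A rough PySpark sketch, in which the DataFrame `df`, the size
estimate, and the S3 path are all hypothetical:

# Sketch only: total_bytes must come from your own knowledge of the data.
TARGET_FILE_BYTES = 1024 * 1024 * 1024  # aim for ~1 GB per output file
total_bytes = 50 * 1024 * 1024 * 1024   # e.g. a 50 GB dataset (placeholder)

num_partitions = max(1, total_bytes // TARGET_FILE_BYTES)

# One output file per partition; Parquet compression will shrink them on disk.
df.repartition(int(num_partitions)).write.parquet("s3://your-bucket/output/")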

On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain  wrote:

> Hello Team,
>
>
>
> I want to write Parquet files to AWS S3, but I want each file to be about
> 1 GB in size.
>
> Can someone please guide me on how I can achieve this?
>
>
>
> I am using AWS EMR with Spark 1.6.1.
>
>
>
> Thanks,
>
> Ankur
>