OK, I came across this issue. Not sure if you have already assessed it:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-6921

The workaround mentioned there may work for you.
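
In case it is useful alongside the ticket, here is a rough sketch of the knobs I have in mind (these are standard Hadoop/Parquet properties, not taken from the ticket itself, so please verify them against the workaround described there; it assumes sc is the SparkContext and df is the DataFrame from your snippet below):

import org.apache.spark.sql.SaveMode

// Sketch only: raise the Parquet row-group size and the filesystem block
// size hint before writing. These shape how data is laid out inside each
// part file; the *number* of part files is still driven by the number of
// write tasks.
val blockSize = 512 * 1024 * 1024  // 512 MB, matching the Alluxio block size
sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)
sc.hadoopConfiguration.setInt("parquet.block.size", blockSize)

df.write.mode(SaveMode.Append)
  .partitionBy("network_id", "time")
  .parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")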

Thanks
Deepak
On 1 Jul 2016 9:34 am, "Chanh Le" <giaosu...@gmail.com> wrote:

> Hi Deepak,
> Thanks for replying. The way I write into Alluxio is:
> df.write.mode(SaveMode.Append).partitionBy("network_id", "time")
>   .parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")
>
>
> I partition by two columns and store the result. I just want the write to
> automatically produce part files sized to match the 512 MB block size I
> have already set in Alluxio.
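>
> For reference, a minimal sketch of repartitioning by the same columns
> before the partitioned write, in case that is the right direction (the
> resulting file size still depends on how much data each partition holds):
>
> import org.apache.spark.sql.SaveMode
>
> // Sketch: shuffle rows so each (network_id, time) combination is handled
> // by a single task, then write partitioned as before; each output
> // directory then gets one larger part file instead of one per task.
> df.repartition(df("network_id"), df("time"))
>   .write
>   .mode(SaveMode.Append)
>   .partitionBy("network_id", "time")
>   .parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")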
>
>
> On Jul 1, 2016, at 11:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
> Before writing, coalesce your RDD to 1.
> That will create only one output file.
> Multiple part files happen because all of your executors write their
> partitions to separate part files.
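>
> A minimal sketch of that, assuming df is the DataFrame being written
> (coalesce funnels everything through one task, so it is only practical
> when the data comfortably fits through a single writer):
>
> import org.apache.spark.sql.SaveMode
>
> // Sketch: collapse to a single partition before writing so the job
> // produces one part file instead of one per task.
> df.coalesce(1)
>   .write
>   .mode(SaveMode.Append)
>   .parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")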
>
> Thanks
> Deepak
> On 1 Jul 2016 8:01 am, "Chanh Le" <giaosu...@gmail.com> wrote:
>
> Hi everyone,
> I am using Alluxio for storage. But I am little bit confuse why I am do
> set block size of alluxio is 512MB and my file part only few KB and too
> many part.
> Is that normal? Because I want to read it fast? Is that many part effect
> the read operation?
> How to set the size of file part?
>
> Thanks.
> Chanh
>
> <Screen_Shot_2016-07-01_at_9_24_55_AM.png>
>
