Hi Gene,
Thanks for your support. I agree that the file count comes from the number of
executors, but that many Parquet files hurts read performance, so I need a way
to improve it. My workaround is:
df.coalesce(1)
.write.mode(SaveMode.Overwrite).partitionBy("network_id")
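For reference, the full chain I run looks like this (rough, untested sketch; the
Alluxio path is the same FACT_ADMIN_HOURLY table from this thread):

import org.apache.spark.sql.SaveMode

// coalesce(1) leaves a single task holding all rows, so partitionBy then
// writes exactly one file per network_id value. A small coalesce(N) with
// N > 1 trades file count for write parallelism.
df.coalesce(1)
  .write.mode(SaveMode.Overwrite)
  .partitionBy("network_id")
  .parquet("alluxio://master1:1/FACT_ADMIN_HOURLY")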
Hi Chanh,
You should be able to set the Alluxio block size with:
sc.hadoopConfiguration.set("alluxio.user.block.size.bytes.default", "256mb")
I think you have many parquet files because you have many Spark executors
writing out their partition of the files.
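A quick way to confirm that before the write (rough sketch, where df is your
DataFrame):

// Each write task emits one file per partitionBy value it holds, so the
// upper bound on output files is numTasks x distinct partition values.
val numTasks = df.rdd.getNumPartitions
println(s"write will run with $numTasks tasks")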
Hope that helps,
Gene
Hi Gene,
Could you give some suggestions on that?
> On Jul 1, 2016, at 5:31 PM, Ted Yu wrote:
>
> The comment from zhangxiongfei was from a year ago.
>
> Maybe something changed since then?
>
Looking under Alluxio source, it seems only "fs.hdfs.impl.disable.cache" is
in use.
FYI
On Thu, Jun 30, 2016 at 9:30 PM, Deepak Sharma
wrote:
Ok.
I came across this issue.
Not sure if you already assessed this:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-6921
The workaround mentioned there may work for you.
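I haven't re-checked the exact property named in that ticket, but the general
shape of such workarounds is to bump the block size on the Hadoop configuration
before writing; something like this (unverified sketch; these property names
come from parquet-mr and Hadoop, not from the ticket):

// parquet.block.size controls the Parquet row-group size;
// fs.local.block.size is the generic Hadoop block-size knob.
// Both set to 512 MB here purely as an illustration.
sc.hadoopConfiguration.setInt("parquet.block.size", 512 * 1024 * 1024)
sc.hadoopConfiguration.setLong("fs.local.block.size", 512L * 1024 * 1024)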
Thanks
Deepak
Hi Deepak,
Thanks for replying. The way I write into Alluxio is:
df.write.mode(SaveMode.Append)
  .partitionBy("network_id", "time")
  .parquet("alluxio://master1:1/FACT_ADMIN_HOURLY")
I partition by two columns before storing. I just want the write to
automatically produce files of a proper size.
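If it helps, one pattern for getting properly sized files without coalesce(1)
is to repartition by the same columns you partition the output by, so each
(network_id, time) pair is written by a single task as a single file. A rough,
untested sketch:

import org.apache.spark.sql.SaveMode

// Repartitioning by the partition columns routes all rows of a given
// (network_id, time) pair to one task, so partitionBy writes one file
// per pair while the job still runs in parallel across pairs.
df.repartition(df("network_id"), df("time"))
  .write.mode(SaveMode.Append)
  .partitionBy("network_id", "time")
  .parquet("alluxio://master1:1/FACT_ADMIN_HOURLY")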