Redriver created TINKERPOP-3133:
-----------------------------------
Summary: Customize the file count by repartitioning the OutputRDD in
Spark to reduce HDFS small files
Key: TINKERPOP-3133
URL: https://issues.apache.org/jira/browse/TINKERPOP-3133
Project: TinkerPop
Issue Type: Improvement
Components: hadoop
Affects Versions: 3.7.3
Reporter: Redriver
The graph is exported to HDFS through OutputRDD, but we often see many small
files in production environments. For example, there can be more than 50,000
files of about 17 MB each, which triggers HDFS small-file alerts. It would
therefore be better to let users customize the number of output files by
repartitioning the OutputRDD.
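A minimal sketch of the sizing arithmetic behind this request. It assumes the total output size is known or can be estimated, and it assumes a target of 128 MB per file (a common HDFS block size). The helper name target_partition_count is hypothetical and is not part of TinkerPop. In Spark, the resulting number would be passed to RDD.repartition(n), or to RDD.coalesce(n), which can reduce the partition count without a full shuffle, before the graph RDD is written.

```python
MB = 1024 * 1024


def target_partition_count(total_bytes: int, target_file_bytes: int = 128 * MB) -> int:
    """Hypothetical helper: how many partitions (one output file each) give
    files of roughly target_file_bytes. Assumes the 128 MB default target."""
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_file_bytes > 0")
    # Integer ceiling division avoids float rounding; always write at least one file.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


if __name__ == "__main__":
    # Using 50,000 files x 17 MB from the report as the total output size.
    total = 50_000 * 17 * MB
    print(target_partition_count(total))            # 128 MB target
    print(target_partition_count(total, 256 * MB))  # 256 MB target
```

With those figures, a 128 MB target yields 6,641 files instead of 50,000, and a 256 MB target yields 3,321.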