Redriver created TINKERPOP-3133:
-----------------------------------

             Summary: Customize the file count by repartitioning the OutputRDD in Spark to reduce HDFS small files
                 Key: TINKERPOP-3133
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3133
             Project: TinkerPop
          Issue Type: Improvement
          Components: hadoop
    Affects Versions: 3.7.3
            Reporter: Redriver

The graph is exported to HDFS through OutputRDD, but in production we often see a large number of small output files. For example, there are more than 50,000 files, each about 17 MB, which triggers HDFS small-file alerts. It would therefore be better to allow the number of output files to be customized by repartitioning the OutputRDD before it is written.
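
To put numbers on the example above, here is a rough sketch of how a target partition count could be derived from the total output size. The 128 MB target file size and the helper name `target_partition_count` are assumptions made for illustration only; they are not part of TinkerPop or of this issue.

```python
def target_partition_count(total_bytes: int, target_file_bytes: int) -> int:
    """Return a partition count that yields files of roughly target_file_bytes.

    Hypothetical helper for illustration; not a TinkerPop or Spark API.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    count = -(-total_bytes // target_file_bytes)
    # Always write at least one partition, even for an empty output.
    return max(1, count)


MB = 1024 * 1024

# Figures from the report: more than 50,000 files of about 17 MB each.
total_output_bytes = 50_000 * 17 * MB

# Assumed target: 128 MB per file (an illustrative choice, not from the issue).
partitions = target_partition_count(total_output_bytes, 128 * MB)
print(partitions)  # 6641
```

Under these assumptions the same data would fit in roughly 6,641 files instead of 50,000. A count like this could then be passed to Spark's `repartition(numPartitions)` or `coalesce(numPartitions)` on the RDD before it is saved; how TinkerPop would expose such a setting is not specified in this issue.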