[jira] [Created] (TINKERPOP-3133) Customize the file count by repartition the OutputRDD in Spark to reduce HDFS small files

Redriver (Jira) Sat, 08 Feb 2025 23:09:29 -0800

Redriver created TINKERPOP-3133:
-----------------------------------

             Summary: Customize the file count by repartition the OutputRDD in 
Spark to reduce HDFS small files
                 Key: TINKERPOP-3133
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3133
             Project: TinkerPop
          Issue Type: Improvement
          Components: hadoop
    Affects Versions: 3.7.3
            Reporter: Redriver



The graph is exported to HDFS through OutputRDD, but in production we often 
see it write many small files. For example, there are more than 50,000 files 
of about 17 MB each, which triggers HDFS small-file alerts. It would be better 
to allow the number of output files to be customized by repartitioning the 
OutputRDD before it is written.
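
As a rough illustration of how a target file count could be chosen, the sketch 
below derives it from the total export size and a desired per-file size. The 
helper name target_partition_count and the 256 MiB target are assumptions made 
for this example only; the ticket does not say how the count would be 
configured. The resulting number is the kind of value one would pass to 
Spark's RDD repartition(numPartitions) or coalesce(numPartitions) before the 
write.

```python
MIB = 1024 * 1024


def target_partition_count(total_bytes: int, target_file_bytes: int) -> int:
    """Hypothetical helper: number of output files so each is roughly
    target_file_bytes in size."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Ceiling division, and never fewer than one partition.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


# Figures from the report: more than 50,000 files of about 17 MB each.
total_export_bytes = 50_000 * 17 * MIB

# Assumed target of 256 MiB per file; pick a value that suits the cluster.
file_count = target_partition_count(total_export_bytes, 256 * MIB)
print(file_count)
```

With these numbers the export would shrink from 50,000 files to 3,321. Note 
that repartition triggers a full shuffle, while coalesce avoids one when only 
reducing the partition count, so the choice between them is a trade-off 
between evenly sized files and export cost.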



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
