[ https://issues.apache.org/jira/browse/TINKERPOP-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927203#comment-17927203 ]
ASF GitHub Bot commented on TINKERPOP-3133:
-------------------------------------------

Cole-Greer commented on PR #3026:
URL: https://github.com/apache/tinkerpop/pull/3026#issuecomment-2659935336

   >> Also is this intended to be targeting the master branch or is it intended for 3.7-dev?
   >
   > In my previous PR, I targeted 3.7-dev, so I follow it here. Shall I change to target for master?

   I'm not sure which previous PR you're referencing here. Under our branching strategy, any change to an older development branch is merged up into the newer dev branches so that it is included in all upcoming releases. A PR which targets 3.7-dev will also get merged up into master, but a PR which targets master will only be merged there. In other words, a PR which targets 3.7-dev will be included in the upcoming 3.7.4 and 4.0.0 releases, whereas a PR targeting master will only be included in the 4.0.0 release.

   This is a non-breaking change, so it is ok to target 3.7-dev if you would like it included in 3.7.4.

   VOTE +1 (pending confirmation of desired target branch)

> Customize the file count by repartition the OutputRDD in Spark to reduce HDFS
> small files
> -----------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-3133
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3133
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.7.3
>            Reporter: Redriver
>            Priority: Major
>
> The graph is exported to HDFS through OutputRDD, but we often see many
> small files in production environments. For example, there are more than
> 50,000 files of about 17 MB each, which triggers HDFS small-file
> alerts. It would therefore be better to allow customizing the number of
> output files by repartitioning the OutputRDD.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
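
To put rough numbers on the issue description quoted above, here is a minimal sketch of how a target partition count might be estimated from the figures the reporter gives (more than 50,000 files of about 17 MB each). The helper name `target_partitions` and the 256 MB target file size are assumptions made for illustration; they do not come from the issue or from PR #3026, and the actual configuration introduced by the PR is not shown here.

```python
import math


def target_partitions(file_count: int, avg_file_mb: float, target_file_mb: float) -> int:
    """Estimate how many output partitions yield files of roughly target_file_mb each.

    Hypothetical helper for illustration only; not part of TinkerPop or Spark.
    """
    if file_count <= 0 or avg_file_mb <= 0 or target_file_mb <= 0:
        raise ValueError("all arguments must be positive")
    total_mb = file_count * avg_file_mb
    # Round up so that no file exceeds the target size on average.
    return max(1, math.ceil(total_mb / target_file_mb))


# Figures from the issue description: 50,000 files of about 17 MB each.
# The 256 MB target is an assumed value, not one taken from the issue.
partitions = target_partitions(50_000, 17, 256)
print(partitions)  # 3321
```

A count like this is the kind of value one would pass to Spark's `repartition(numPartitions)` or `coalesce(numPartitions)` on the RDD before it is written, since the number of partitions determines the number of output files.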