[ https://issues.apache.org/jira/browse/KYLIN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094228#comment-17094228 ]
ASF subversion and git services commented on KYLIN-4320:
---------------------------------------------------------

Commit 78afb52b57736bf0bfd10a0299ac9b44f1119400 in kylin's branch refs/heads/master from Shao Feng Shi
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=78afb52 ]

Revert "KYLIN-4320 number of replicas of Cuboid files cannot be configured for Spark engine"

This reverts commit 926515bfc217167fe570c0cf21a39f54e5b5d1ff.

> number of replicas of Cuboid files cannot be configured for Spark engine
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-4320
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4320
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v3.0.1
>            Reporter: Congling Xia
>            Assignee: Yaqian Zhang
>            Priority: Major
>             Fix For: v3.1.0
>
>         Attachments: cuboid_replications.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Hi, team. I tried to change `dfs.replication` to 3 by adding the following config override:
> {code:java}
> kylin.engine.spark-conf.spark.hadoop.dfs.replication=3
> {code}
> I then got a strange result: the number of replicas of the cuboid files varies, even though the files are at the same level.
> !cuboid_replications.png!
> I suspect this is caused by the conflicting settings hard-coded in SparkUtil:
> {code:java}
> public static void modifySparkHadoopConfiguration(SparkContext sc) throws Exception {
>     sc.hadoopConfiguration().set("dfs.replication", "2"); // cuboid intermediate files, replication=2
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.codec",
>             "org.apache.hadoop.io.compress.DefaultCodec"); // or org.apache.hadoop.io.compress.SnappyCodec
> }
> {code}
> This may be a Spark property precedence problem. After checking the [Spark documentation|https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties], it seems that some programmatically set properties may not take effect, and setting properties in code is not the recommended way to configure a Spark job.
>
> In any case, cuboid files may survive for weeks until they expire or are merged, so the configuration rewrite in `org.apache.kylin.engine.spark.SparkUtil#modifySparkHadoopConfiguration` makes those files less reliable.
> Is there any way to force cuboid files to keep 3 replicas? Or should we remove the code in SparkUtil so that `kylin.engine.spark-conf.spark.hadoop.dfs.replication` works properly?
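> For illustration, below is a minimal sketch of the second option: a guarded version of `modifySparkHadoopConfiguration` that only applies the hard-coded default when no `spark.hadoop.dfs.replication` override is present. This is an assumption about a possible fix, not the actual patch, and the wrapper class name `SparkUtilSketch` is hypothetical:
> {code:java}
> import org.apache.spark.SparkContext;
>
> public class SparkUtilSketch {
>     public static void modifySparkHadoopConfiguration(SparkContext sc) {
>         // spark.hadoop.* properties are copied into sc.hadoopConfiguration() when
>         // the SparkContext is created, so unconditionally calling set() afterwards
>         // would silently override a user-supplied value.
>         if (!sc.getConf().contains("spark.hadoop.dfs.replication")) {
>             // No user override: fall back to the old default of 2 replicas
>             // for cuboid intermediate files.
>             sc.hadoopConfiguration().set("dfs.replication", "2");
>         }
>         sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
>         sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
>         sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.codec",
>                 "org.apache.hadoop.io.compress.DefaultCodec");
>     }
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)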