[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269638#comment-16269638 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

[~eddyxu] What is considered "client-side" for Teragen? For example, when I run "yarn jar hadoop-mapreduce-examples-3.0.0.jar teragen" from an external node, is the "client side" the external node, or all the YARN NodeManagers that execute the map tasks? I'm asking because I don't see any CPU used on the external node while I'm running hundreds of Teragen threads.
Thanks for the info on the codec's performance impact on the overall IO path.

> Erasure Coding: system test of TeraSort
> ---------------------------------------
>
>                 Key: HDFS-8198
>                 URL: https://issues.apache.org/jira/browse/HDFS-8198
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Kai Sasaki
>         Attachments: ec.out, replication.out
>
>
> Functional system test of TeraSort on EC files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269608#comment-16269608 ]

Lei (Eddy) Xu commented on HDFS-8198:
-------------------------------------

From our experience, the codec cost is negligible on the IO path compared to network and disk overhead.

We don't have existing metrics for the time spent on RS encoding / decoding. Because these happen on the client side most of the time, it is not convenient to expose them via JMX. If it is feasible in your environment, you can check out {{RawErasureEncoder / Decoder}}, add some measurements there, recompile, and replace the JARs in your environment for now.

[~xiaochen] is also looking into exposing these client-side metrics; I believe he can give a better answer here.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269487#comment-16269487 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

[~eddyxu] Yes, I did change io.erasurecode.codec.rs.rawcoders. I wanted to see the performance delta between rs_native and rs_java. Yes, I'm using Terasort 3TB for my tests.
Where can I find the time spent in RS encoding/decoding? I'm very interested in that type of info.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269477#comment-16269477 ]

Lei (Eddy) Xu commented on HDFS-8198:
-------------------------------------

If you did not change {{io.erasurecode.codec.rs.rawcoders}}, HDFS will give you {{rs_native}} when ISA-L is available. You can change {{io.erasurecode.codec.rs.rawcoders}} to make HDFS load rs_java instead.

bq. Would be even better if there was a DEBUG message that shows exactly which coder is used for EC encoding.

This is a good suggestion; I will file a JIRA for this.

Performance-wise, did you test against macro benchmarks like terasort? If so, it would be nice to have numbers on how much time is spent in RS encoding / decoding, to see whether native code can make a difference.
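
For the coder comparison discussed above, one option is to pass the property per job instead of editing the cluster configuration. The following is a minimal sketch, assuming the TeraSort example programs accept Hadoop generic options ({{-D}}) and that the setting reaches the tasks through the job configuration. The function name, the EXAMPLES_JAR variable and the argument order are illustrative and not part of Hadoop.

```shell
# Hypothetical helper: run teragen with one specific raw coder.
# EXAMPLES_JAR must point at the hadoop-mapreduce-examples jar of your install.
run_teragen_with_coder() {
  coder=$1   # e.g. rs_native or rs_java
  rows=$2    # number of rows for teragen to generate
  out=$3     # HDFS output directory (assumed to carry an EC policy)
  yarn jar "$EXAMPLES_JAR" teragen \
    -Dio.erasurecode.codec.rs.rawcoders="$coder" \
    "$rows" "$out"
}
```

Wrapping two such calls in the shell's {{time}} (one with rs_native, one with rs_java, each writing to its own output directory) gives only a coarse end-to-end comparison; it does not isolate the time spent in encoding itself.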
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269434#comment-16269434 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

[~eddyxu] Key word being "should". checknative shows ISA-L is loaded, but I don't see any performance difference between running with rs_native and rs_java, so I'm wondering if it's really using the coder I specify. Even with DEBUG log level, the map output for Teragen shows that both rs_native and rs_java are loaded, with no clear way to know which one is used. Even with io.erasurecode.codec.rs.rawcoders set to only one of these coders, I still see the same debug message. I was wondering if there's some output at DEBUG or INFO log level that would give me a hint about which coder is in use.
Would be even better if there was a DEBUG message that shows exactly which coder is used for EC encoding.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269422#comment-16269422 ]

Lei (Eddy) Xu commented on HDFS-8198:
-------------------------------------

You can use {{hadoop checknative}} to see whether {{ISA-L}} is loaded. If it is available, HDFS should use rs_native.
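
The check above can be scripted. This is a small sketch that assumes {{hadoop checknative}} prints one line per library in the form "ISA-L:   true <path>"; the helper name is hypothetical, and the exact output layout may differ between releases.

```shell
# Hypothetical helper: reads `hadoop checknative` output on stdin and succeeds
# only if a line of the assumed form "ISA-L:  true ..." is present.
isal_loaded() {
  awk '$1 == "ISA-L:" && $2 == "true" { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Intended use (not executed here):
#   hadoop checknative | isal_loaded && echo "ISA-L is loaded"
```

Note that this only confirms the native library is loadable on that node; it does not show which coder a given read or write actually used.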
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269413#comment-16269413 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

[~eddyxu] I was referring to checking whether it ended up using rs_native or rs_java, for example (that is, the ISA-L library or not).
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269409#comment-16269409 ]

Lei (Eddy) Xu commented on HDFS-8198:
-------------------------------------

[~danielpol] You can use {{hdfs ec -getPolicy}} to see the EC policy used.
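
As a sketch of how that command could be used before a test run, the helper below checks that a directory reports the expected policy. It assumes {{hdfs ec -getPolicy -path <dir>}} prints the policy name somewhere in its output; the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if the EC policy reported for a path
# contains the expected policy name (fixed-string match).
assert_ec_policy() {
  path=$1       # HDFS directory to inspect
  expected=$2   # e.g. RS-3-2-1024k
  hdfs ec -getPolicy -path "$path" | grep -qF -- "$expected"
}

# Intended use (not executed here):
#   assert_ec_policy /ectest RS-3-2-1024k || echo "unexpected EC policy" >&2
```

This reports the policy only, not the raw coder that implements it.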
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269347#comment-16269347 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

By the way, is there a way to check which coder was used for EC? I know I can set the list of coders to use, but I can't find a clear way to see in the logs which coder was actually used. I want to compare the speed of different coders and make sure I'm measuring correctly, because I don't see much of a delta in performance.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248304#comment-16248304 ]

Xiao Chen commented on HDFS-8198:
---------------------------------

Thanks, Daniel, for reporting the issue and the details.

bq. I can't seem to find the proper way to upload

Probably due to JIRA permissions. I just added you to the HDFS contributor role; can you see the 'Attach Files' option now?

We will try to reproduce this in our cluster too.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247558#comment-16247558 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

[~eddyxu] I have 7 datanodes. I'm new to the JIRA system and I can't seem to find the proper way to upload the terasort output file. Please let me know how I can do that. The relevant error from the terasort output is:

17/11/04 09:36:15 INFO mapreduce.Job: Task Id : attempt_1509761319113_0021_m_02_0, Status : FAILED
Error: java.io.IOException: 3 missing blocks, the stripe is: Offset=77594624, length=1048576, fetchedChunksNum=1, missingChunksNum=3; locatedBlocks is: LocatedBlocks{
  fileLength=50
  underConstruction=false
  blocks=[
    LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841888_5101378; getBlockSize()=1610612736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[172.30.253.6:50010,DS-780df34f-44c3-4c67-b7dc-f901bc12a957,DISK], DatanodeInfoWithStorage[172.30.253.5:50010,DS-c5e33c96-3df3-480b-80aa-fe97a3b8e3b4,DISK], DatanodeInfoWithStorage[172.30.253.3:50010,DS-4cd5c037-9dcb-488c-81c2-0aa8ff1cbd2f,DISK], DatanodeInfoWithStorage[172.30.253.4:50010,DS-6bac2c0f-f8c6-4a67-8801-f2a7a74279a6,DISK], DatanodeInfoWithStorage[172.30.253.7:50010,DS-0ee9e606-db4b-4df6-b180-fedb696c5e4f,DISK]]; indices=[0, 1, 2, 3, 4]},
    LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841856_5101380; getBlockSize()=1610612736; corrupt=false; offset=1610612736; locs=[DatanodeInfoWithStorage[172.30.253.2:50010,DS-f053781f-b2c4-41e9-8960-745b3fe8ef50,DISK], DatanodeInfoWithStorage[172.30.253.5:50010,DS-4efc46be-5769-4a2f-9cf6-736b3d56edaf,DISK], DatanodeInfoWithStorage[172.30.253.3:50010,DS-74b0796e-425d-4fa6-9309-247271f63f53,DISK], DatanodeInfoWithStorage[172.30.253.4:50010,DS-ddfc805a-9ed9-4493-921d-acc169787683,DISK], DatanodeInfoWithStorage[172.30.253.7:50010,DS-c3be97ce-660a-4c98-9f71-5c2f76236dc4,DISK]]; indices=[0, 1, 2, 3, 4]},
    LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841824_5101382; getBlockSize()=1610612736; corrupt=false; offset=3221225472; locs=[DatanodeInfoWithStorage[172.30.253.1:50010,DS-336c025e-f04b-475f-b051-d7a4d1b7669f,DISK], DatanodeInfoWithStorage[172.30.253.5:50010,DS-dab6afcd-bf22-4d1d-b878-d52ee0b5bcd9,DISK], DatanodeInfoWithStorage[172.30.253.7:50010,DS-16ade97a-978c-4a83-aae4-f25e861d63f5,DISK], DatanodeInfoWithStorage[172.30.253.2:50010,DS-176f2769-3236-4548-94df-74de95171cdd,DISK], DatanodeInfoWithStorage[172.30.253.3:50010,DS-2350ab83-f4bd-49f1-aa29-f8d4b5de5f78,DISK]]; indices=[0, 1, 2, 3, 4]},
    LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841792_5101384; getBlockSize()=168161792; corrupt=false; offset=4831838208; locs=[DatanodeInfoWithStorage[172.30.253.5:50010,DS-b63b7da0-20b7-4480-b80a-cb0491c4e17f,DISK], DatanodeInfoWithStorage[172.30.253.2:50010,DS-dcb3d66b-ee0f-4e4d-b5c8-611498227092,DISK], DatanodeInfoWithStorage[172.30.253.1:50010,DS-bc0b4749-6599-4691-98b6-35623ce8c08d,DISK], DatanodeInfoWithStorage[172.30.253.7:50010,DS-1029b9e5-abff-4c63-bb9f-7986d1729e03,DISK], DatanodeInfoWithStorage[172.30.253.4:50010,DS-6fa25607-f980-4a15-8592-d31ef51a48ba,DISK]]; indices=[0, 1, 2, 3, 4]}]
  lastLocatedBlock=LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841792_5101384; getBlockSize()=168161792; corrupt=false; offset=4831838208; locs=[DatanodeInfoWithStorage[172.30.253.5:50010,DS-b63b7da0-20b7-4480-b80a-cb0491c4e17f,DISK], DatanodeInfoWithStorage[172.30.253.2:50010,DS-dcb3d66b-ee0f-4e4d-b5c8-611498227092,DISK], DatanodeInfoWithStorage[172.30.253.1:50010,DS-bc0b4749-6599-4691-98b6-35623ce8c08d,DISK], DatanodeInfoWithStorage[172.30.253.7:50010,DS-1029b9e5-abff-4c63-bb9f-7986d1729e03,DISK], DatanodeInfoWithStorage[172.30.253.4:50010,DS-6fa25607-f980-4a15-8592-d31ef51a48ba,DISK]]; indices=[0, 1, 2, 3, 4]}
  isLastBlockComplete=true}
	at org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:175)
	at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:366)
	at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
	at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:388)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at
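
The exception reports missingChunksNum=3, and each block group in the dump lists five locations (indices 0 to 4), which is consistent with the RS-3-2-1024k policy named elsewhere in this thread. With a Reed-Solomon layout, a stripe can be rebuilt only while the number of missing units does not exceed the number of parity units, so three missing units out of 3 data + 2 parity cannot be recovered. The sketch below illustrates that rule only; the function name is hypothetical, and it assumes policy names of the form codec-data-parity-cellsize.

```shell
# Hypothetical helper: succeeds if a stripe with the given number of missing
# units is still reconstructable under a policy named like "RS-3-2-1024k".
# Assumes the third dash-separated field is the parity unit count.
can_reconstruct() {
  policy=$1
  missing=$2
  parity=$(printf '%s\n' "$policy" | cut -d- -f3)
  [ "$missing" -le "$parity" ]
}

# Intended use:
#   can_reconstruct RS-3-2-1024k 3 || echo "read fails: more units missing than parity"
```

This does not explain why the client considered the units missing when fsck reported the files as healthy; it only shows why the read had to fail once it did.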
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247101#comment-16247101 ]

Lei (Eddy) Xu commented on HDFS-8198:
-------------------------------------

Hi, [~danielpol]

Thanks a lot for reporting this. To help us better understand the problem, could you provide the following information:

* The cluster size (number of datanodes)
* The output of terasort

Thanks!
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238998#comment-16238998 ]

Daniel Pol commented on HDFS-8198:
----------------------------------

Terasort doesn't seem to work on my system with EC in beta1. Here's a small script to reproduce the issue:

sudo -u hdfs bin/hdfs dfs -rm -r -skipTrash /ectest
sudo -u hdfs bin/hdfs dfs -mkdir /ectest
#sudo -u hdfs bin/hdfs ec -setPolicy -path /ectest -policy RS-3-2-1024k
sleep 5
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar teragen 1 /ectest/Input
sleep 30
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar teravalidate /ectest/Input /ectest/Validate
sleep 30
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar terasort /ectest/Input /ectest/Output

It works fine like this (with the set-EC-policy command commented out), but it fails when you uncomment the set-policy line. Interestingly enough, it fails only at the Terasort step when reading the input files; Teravalidate, which runs before it, reads the same files and doesn't fail. Fsck shows everything is fine, and checking the nodes individually, all the files are there. I've tried all the default codecs and policies (native and java), and they all give me the same error: missing blocks.
The error shows up only when the amount of data becomes big enough, so make sure you use the number of records I have in my script or higher.

> Erasure Coding: system test of TeraSort
> ---------------------------------------
>
>                 Key: HDFS-8198
>                 URL: https://issues.apache.org/jira/browse/HDFS-8198
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Kai Sasaki
>            Priority: Major
>
> Functional system test of TeraSort on EC files.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641198#comment-14641198 ]

Zhe Zhang commented on HDFS-8198:
---------------------------------

Moving system test JIRAs as follow-ons.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568681#comment-14568681 ]

Kai Zheng commented on HDFS-8198:
---------------------------------

Good work here, and we can see that significant performance overhead is incurred. HDFS-8425 did some great profiling and analysis. I guess we could track the performance tuning and investigation there.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568346#comment-14568346 ]

Zhe Zhang commented on HDFS-8198:
---------------------------------

Just had an offline discussion with [~hitliuyi]; Yi made a good point that we can make use of HTrace to generate fine-grained traces to facilitate performance analysis and tuning.
[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort
    [ https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564106#comment-14564106 ]

Takuya Fukudome commented on HDFS-8198:
---------------------------------------

Here are the results of running teragen and terasort on our test cluster. The number of rows (the teragen parameter) was set to 100m, which wrote 10 GB of data.

Result

_Elapsed time_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | 1m2.486s | 3m3.966s | 2m56.277s | 6m45.136s |
|| 2 | 1m2.609s | 2m55.928s | 3m4.428s | 6m11.019s |
|| 3 | 1m8.516s | 2m51.004s | 2m58.427s | 6m3.055s |

I also checked "Total time spent by all maps/reduces in occupied slots (ms)".

_Maps_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | 103591 | 335320 | 628538 | 701388 |
|| 2 | 102937 | 322062 | 640839 | 719531 |
|| 3 | 113472 | 313274 | 631408 | 654707 |

_Reduces_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | \- | \- | 14 | 383402 |
|| 2 | \- | \- | 162759 | 348135 |
|| 3 | \- | \- | 156585 | 340584 |

About our test cluster
|| CPU | 2 CPUs (Xeon E5-2660v2 2.2GHz) |
|| RAM | 128GB |

The number of DataNodes: 39
Network bandwidth: 10Gbps
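
To put the elapsed times above on a common scale, the EC time can be divided by the non-EC time for the same run. The sketch below does that for durations in the "XmY.YYYs" form used in the table; the function names are hypothetical and the input values come from run 1.

```shell
# Hypothetical helpers for comparing the elapsed times reported above.

# Convert a duration such as "1m2.486s" to seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times slower the second duration is than the first.
slowdown() {
  awk -v base="$(to_seconds "$1")" -v other="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", other / base }'
}

slowdown 1m2.486s 3m3.966s    # run 1 teragen, non EC vs EC: prints 2.94
slowdown 2m56.277s 6m45.136s  # run 1 terasort, non EC vs EC: prints 2.30
```

For run 1 this gives roughly a 2.9x slowdown for teragen and 2.3x for terasort with EC.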