[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269638#comment-16269638
 ] 

Daniel Pol commented on HDFS-8198:
--

[~eddyxu] What is considered "client-side" for Teragen? For example, when I run 
"yarn jar hadoop-mapreduce-examples-3.0.0.jar teragen" from an external node, 
is the external node the "client side", or is it all the YARN NodeManagers that 
execute the map tasks? I'm asking because I don't see any CPU used on the 
external node while I'm running hundreds of Teragen threads. Thanks for the 
info on the codec's performance impact on the overall IO path. 

> Erasure Coding: system test of TeraSort
> ---
>
> Key: HDFS-8198
> URL: https://issues.apache.org/jira/browse/HDFS-8198
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Kai Sasaki
> Attachments: ec.out, replication.out
>
>
> Functional system test of TeraSort on EC files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269608#comment-16269608
 ] 

Lei (Eddy) Xu commented on HDFS-8198:
-

From our experience, the codec's performance cost is negligible on the IO path 
compared to network and disk overhead. 

We don't have existing metrics for the time spent on RS encoding / decoding. 
These operations happen on the client side most of the time, so it is not 
convenient to expose them via JMX. If it is feasible in your environment, you 
can check out {{RawErasureEncoder}} / {{RawErasureDecoder}}, add some 
measurements there, recompile, and replace the JARs in your environment for now. 

[~xiaochen] is also looking into exposing these client-side metrics; I believe 
he can give a better answer here. 


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269487#comment-16269487
 ] 

Daniel Pol commented on HDFS-8198:
--

[~eddyxu] Yes, I did change {{io.erasurecode.codec.rs.rawcoders}}. I wanted to 
see the performance delta between rs_native and rs_java. Yes, I'm using a 3TB 
Terasort for my tests. Where can I find the time spent in RS encoding/decoding? 
I'm very interested in that type of info. 


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269477#comment-16269477
 ] 

Lei (Eddy) Xu commented on HDFS-8198:
-

If you did not change {{io.erasurecode.codec.rs.rawcoders}}, HDFS will give you 
{{rs_native}} when ISA-L is available. 
You can change {{io.erasurecode.codec.rs.rawcoders}} to make HDFS load 
{{rs_java}} instead. 
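
For an A/B comparison of the two coders, the property can also be supplied per job instead of being edited in the site configuration. The following is a minimal sketch, assuming that teragen accepts generic {{-D}} options and that the job configuration reaches the HDFS client inside the tasks. The jar name, row count, and output paths are placeholders, not values from this issue. The script only prints the command lines so they can be reviewed before anything is run:

```shell
#!/bin/sh
# Build (but do not run) teragen command lines that pin the RS raw coder.
# EXAMPLES_JAR, the row count and the output paths are placeholders.
EXAMPLES_JAR=${EXAMPLES_JAR:-hadoop-mapreduce-examples-3.0.0.jar}

# teragen_cmd <coder> <rows> <output-dir>
teragen_cmd() {
  echo "yarn jar $EXAMPLES_JAR teragen -Dio.erasurecode.codec.rs.rawcoders=$1 $2 $3"
}

# One command line per coder named in this thread.
for coder in rs_native rs_java; do
  teragen_cmd "$coder" 1000000 "/ectest/Input-$coder"
done
```

Listing a single coder removes the fallback, so a run that still succeeds with only {{rs_native}} configured is at least a hint that the native coder could be created.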

bq. Would be even better if there was a DEBUG message that shows exactly which 
coder is used for EC encoding.

This is a good suggestion, I will file a JIRA for this.

Performance-wise, did you test against macro benchmarks like terasort? If so, 
it would be nice to have numbers on how much time was spent in RS encoding / 
decoding, to see whether native code can make a difference. 



[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269434#comment-16269434
 ] 

Daniel Pol commented on HDFS-8198:
--

[~eddyxu] Key word being "should". checknative shows ISA-L is loaded, but I 
don't see any performance difference between running with rs_native and 
rs_java, so I'm wondering if it's really using the coder I specify. Even with 
DEBUG log level, the map output for Teragen shows that both rs_native and 
rs_java are loaded, with no clear way to know which one is used. Even with 
io.erasurecode.codec.rs.rawcoders set to only one of these coders, I still see 
the same debug message. I was wondering if there's some output at DEBUG or INFO 
log level that would give me a hint about which coder is in use. Would be even 
better if there was a DEBUG message that shows exactly which coder is used for 
EC encoding. 


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269422#comment-16269422
 ] 

Lei (Eddy) Xu commented on HDFS-8198:
-

You can use {{hadoop checknative}} to see whether {{ISA-L}} is loaded. If it is 
available, HDFS should use rs_native. 
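
A small sketch of how this check could be scripted. It assumes the {{hadoop checknative}} report contains a line of the form "ISA-L:   true <path>" when the library is loaded; that line format is an assumption here and may differ between builds. The helper reads the report on stdin, and the actual call is skipped when no {{hadoop}} binary is on the PATH:

```shell
#!/bin/sh
# Succeeds if the `hadoop checknative` report on stdin marks ISA-L as loaded.
# Assumption: the report has a line of the form "ISA-L:   true <path>".
isal_loaded() {
  grep -Eq '^ISA-L:[[:space:]]+true'
}

# Only query Hadoop when it is actually installed on this host.
if command -v hadoop >/dev/null 2>&1; then
  if hadoop checknative 2>/dev/null | isal_loaded; then
    echo "ISA-L loaded: rs_native can be used on this host"
  else
    echo "ISA-L not loaded: rs_native is not usable on this host"
  fi
fi
```

This only reports whether the library is available on the host where the command runs; it does not show which coder a given task actually used.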


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269413#comment-16269413
 ] 

Daniel Pol commented on HDFS-8198:
--

[~eddyxu] I was referring to checking whether it ended up using rs_native or 
rs_java, for example (that is, the ISA-L library or not).


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269409#comment-16269409
 ] 

Lei (Eddy) Xu commented on HDFS-8198:
-

[~danielpol] You can use {{hdfs ec -getPolicy}} to see the EC policy used. 


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-28 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269347#comment-16269347
 ] 

Daniel Pol commented on HDFS-8198:
--

By the way, is there a way to check which coder was used for EC? I know I can 
set the list to use, but I can't find a clear way to see in the logs which coder 
was used. I want to compare the speed of different coders and make sure I'm 
measuring correctly, because I don't see much of a performance delta. 


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-10 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248304#comment-16248304
 ] 

Xiao Chen commented on HDFS-8198:
-

Thanks Daniel for reporting the issue and details.
bq. I can't seem to find the proper way to upload
Probably due to JIRA permissions. I just added you to the HDFS contributor 
role; can you see the 'Attach Files' option now?

We will try to reproduce this in our cluster too.


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-10 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247558#comment-16247558
 ] 

Daniel Pol commented on HDFS-8198:
--

[~eddyxu] I have 7 datanodes. I'm new to the JIRA system and I can't seem to 
find the proper way to upload the terasort output file. Please let me know how 
I can do that. The relevant error from the terasort output is:
17/11/04 09:36:15 INFO mapreduce.Job: Task Id : attempt_1509761319113_0021_m_02_0, Status : FAILED
Error: java.io.IOException: 3 missing blocks, the stripe is: Offset=77594624, 
length=1048576, fetchedChunksNum=1, missingChunksNum=3; locatedBlocks is: 
LocatedBlocks{  fileLength=50  underConstruction=false  
blocks=[LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841888_5101378;
 getBlockSize()=1610612736; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[172.30.253.6:50010,DS-780df34f-44c3-4c67-b7dc-f901bc12a957,DISK],
 
DatanodeInfoWithStorage[172.30.253.5:50010,DS-c5e33c96-3df3-480b-80aa-fe97a3b8e3b4,DISK],
 
DatanodeInfoWithStorage[172.30.253.3:50010,DS-4cd5c037-9dcb-488c-81c2-0aa8ff1cbd2f,DISK],
 
DatanodeInfoWithStorage[172.30.253.4:50010,DS-6bac2c0f-f8c6-4a67-8801-f2a7a74279a6,DISK],
 
DatanodeInfoWithStorage[172.30.253.7:50010,DS-0ee9e606-db4b-4df6-b180-fedb696c5e4f,DISK]];
 indices=[0, 1, 2, 3, 4]}, 
LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841856_5101380;
 getBlockSize()=1610612736; corrupt=false; offset=1610612736; 
locs=[DatanodeInfoWithStorage[172.30.253.2:50010,DS-f053781f-b2c4-41e9-8960-745b3fe8ef50,DISK],
 
DatanodeInfoWithStorage[172.30.253.5:50010,DS-4efc46be-5769-4a2f-9cf6-736b3d56edaf,DISK],
 
DatanodeInfoWithStorage[172.30.253.3:50010,DS-74b0796e-425d-4fa6-9309-247271f63f53,DISK],
 
DatanodeInfoWithStorage[172.30.253.4:50010,DS-ddfc805a-9ed9-4493-921d-acc169787683,DISK],
 
DatanodeInfoWithStorage[172.30.253.7:50010,DS-c3be97ce-660a-4c98-9f71-5c2f76236dc4,DISK]];
 indices=[0, 1, 2, 3, 4]}, 
LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841824_5101382;
 getBlockSize()=1610612736; corrupt=false; offset=3221225472; 
locs=[DatanodeInfoWithStorage[172.30.253.1:50010,DS-336c025e-f04b-475f-b051-d7a4d1b7669f,DISK],
 
DatanodeInfoWithStorage[172.30.253.5:50010,DS-dab6afcd-bf22-4d1d-b878-d52ee0b5bcd9,DISK],
 
DatanodeInfoWithStorage[172.30.253.7:50010,DS-16ade97a-978c-4a83-aae4-f25e861d63f5,DISK],
 
DatanodeInfoWithStorage[172.30.253.2:50010,DS-176f2769-3236-4548-94df-74de95171cdd,DISK],
 
DatanodeInfoWithStorage[172.30.253.3:50010,DS-2350ab83-f4bd-49f1-aa29-f8d4b5de5f78,DISK]];
 indices=[0, 1, 2, 3, 4]}, 
LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841792_5101384;
 getBlockSize()=168161792; corrupt=false; offset=4831838208; 
locs=[DatanodeInfoWithStorage[172.30.253.5:50010,DS-b63b7da0-20b7-4480-b80a-cb0491c4e17f,DISK],
 
DatanodeInfoWithStorage[172.30.253.2:50010,DS-dcb3d66b-ee0f-4e4d-b5c8-611498227092,DISK],
 
DatanodeInfoWithStorage[172.30.253.1:50010,DS-bc0b4749-6599-4691-98b6-35623ce8c08d,DISK],
 
DatanodeInfoWithStorage[172.30.253.7:50010,DS-1029b9e5-abff-4c63-bb9f-7986d1729e03,DISK],
 
DatanodeInfoWithStorage[172.30.253.4:50010,DS-6fa25607-f980-4a15-8592-d31ef51a48ba,DISK]];
 indices=[0, 1, 2, 3, 4]}]  
lastLocatedBlock=LocatedStripedBlock{BP-260511027-172.30.253.91-1487788944154:blk_-9223372036852841792_5101384;
 getBlockSize()=168161792; corrupt=false; offset=4831838208; 
locs=[DatanodeInfoWithStorage[172.30.253.5:50010,DS-b63b7da0-20b7-4480-b80a-cb0491c4e17f,DISK],
 
DatanodeInfoWithStorage[172.30.253.2:50010,DS-dcb3d66b-ee0f-4e4d-b5c8-611498227092,DISK],
 
DatanodeInfoWithStorage[172.30.253.1:50010,DS-bc0b4749-6599-4691-98b6-35623ce8c08d,DISK],
 
DatanodeInfoWithStorage[172.30.253.7:50010,DS-1029b9e5-abff-4c63-bb9f-7986d1729e03,DISK],
 
DatanodeInfoWithStorage[172.30.253.4:50010,DS-6fa25607-f980-4a15-8592-d31ef51a48ba,DISK]];
 indices=[0, 1, 2, 3, 4]}  isLastBlockComplete=true}
 at org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:175)
 at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:366)
 at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
 at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:388)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813)
 at java.io.DataInputStream.read(DataInputStream.java:149)
 at org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257)
 at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
 at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at 
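
As a reading aid for the exception above: a Reed-Solomon stripe with m parity units can be rebuilt only while at most m of its units are missing. The sketch below assumes the failing file used RS-3-2-1024k, the policy named in the reproduction script elsewhere in this thread. The five block indices per group are consistent with 3 data + 2 parity units, and length=1048576 matches one 1024k cell, but the comment does not state the active policy. The helper names are made up for illustration:

```shell
#!/bin/sh
# Extract the parity-unit count from a policy name of the form
# "<codec>-<data>-<parity>-<cell>", e.g. RS-3-2-1024k -> 2.
parity_units() {
  echo "$1" | awk -F- '{ print $(NF - 1) }'
}

# stripe_recoverable <policy> <missingChunksNum>
# Succeeds when the number of missing chunks does not exceed the parity units.
stripe_recoverable() {
  [ "$2" -le "$(parity_units "$1")" ]
}

# Values taken from the error message: missingChunksNum=3.
policy=RS-3-2-1024k   # assumption: policy of the failing file
missing=3
if stripe_recoverable "$policy" "$missing"; then
  echo "recoverable"
else
  echo "unrecoverable: $missing missing chunks > $(parity_units "$policy") parity units"
fi
```

Under that assumption the stripe cannot be decoded, which matches the "3 missing blocks" failure; why three chunks were reported missing while fsck found the files intact is the open question.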

[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-09 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247101#comment-16247101
 ] 

Lei (Eddy) Xu commented on HDFS-8198:
-

Hi, [~danielpol]

Thanks a lot for reporting this. To help us better understand the problem, 
could you provide the following information:

* The cluster size (number of datanodes)
* The output of terasort?

Thanks !


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2017-11-04 Thread Daniel Pol (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238998#comment-16238998
 ] 

Daniel Pol commented on HDFS-8198:
--

Terasort doesn't seem to work on my system with EC in beta1. Here's a small 
script to reproduce the issue:

sudo -u hdfs bin/hdfs dfs -rm -r -skipTrash /ectest
sudo -u hdfs bin/hdfs dfs -mkdir /ectest
#sudo -u hdfs bin/hdfs ec -setPolicy -path /ectest -policy RS-3-2-1024k
sleep 5
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar teragen 1 /ectest/Input
sleep 30
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar teravalidate /ectest/Input /ectest/Validate
sleep 30
sudo -u hdfs bin/yarn jar /ec/hadoop-3.0.0-beta1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar terasort /ectest/Input /ectest/Output

It works fine like this (with the set-EC-policy line commented out), but it 
fails when you uncomment that line. Interestingly, it fails only at the 
Terasort step when reading the input files; Teravalidate, which runs before it, 
reads the same files and doesn't fail. Fsck shows everything is fine, and 
checking the nodes individually, all the files are there. I've tried all the 
default codecs and policies (native and java); they all give me the same error: 
missing blocks. The error shows up only when the amount of data becomes big 
enough, so make sure you use the number of records I have in my script or higher.



[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2015-07-24 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641198#comment-14641198
 ] 

Zhe Zhang commented on HDFS-8198:
-

Moving system test JIRAs as follow-ons.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2015-06-02 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568681#comment-14568681
 ] 

Kai Zheng commented on HDFS-8198:
-

Good work here; we can see that significant performance overhead is incurred. 
HDFS-8425 did some great profiling and analysis. I guess we could track the 
performance tuning and investigation there.


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2015-06-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568346#comment-14568346
 ] 

Zhe Zhang commented on HDFS-8198:
-

Just had an offline discussion with [~hitliuyi]; Yi made a good point that we 
can make use of HTrace to generate fine-grained traces to facilitate 
performance analysis and tuning.


[jira] [Commented] (HDFS-8198) Erasure Coding: system test of TeraSort

2015-05-28 Thread Takuya Fukudome (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564106#comment-14564106
 ] 

Takuya Fukudome commented on HDFS-8198:
---

Here are the results of running teragen and terasort on our test cluster. The 
teragen row-count parameter was set to 100m (it wrote 10 GB of data).

Result
_elapsed time_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | 1m2.486s | 3m3.966s | 2m56.277s | 6m45.136s |
|| 2 | 1m2.609s | 2m55.928s | 3m4.428s | 6m11.019s |
|| 3 | 1m8.516s | 2m51.004s | 2m58.427s | 6m3.055s |
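
To put the elapsed times above into a single number, the EC slowdown can be computed as the ratio of total EC time to total non-EC time over the three runs. This is a small sketch using only the values from the table; the helper names are made up for illustration:

```shell
#!/bin/sh
# Convert a `time`-style duration such as "3m3.966s" into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Sum any number of durations and print the total in seconds.
total_seconds() {
  total=0
  for d in "$@"; do
    total=$(awk -v a="$total" -v b="$(to_seconds "$d")" 'BEGIN { printf "%.3f", a + b }')
  done
  echo "$total"
}

# slowdown <baseline_total> <ec_total>: ratio rounded to two decimals.
slowdown() {
  awk -v base="$1" -v ec="$2" 'BEGIN { printf "%.2f\n", ec / base }'
}

# Elapsed times copied from the table (runs 1 to 3).
teragen_base=$(total_seconds 1m2.486s 1m2.609s 1m8.516s)
teragen_ec=$(total_seconds 3m3.966s 2m55.928s 2m51.004s)
terasort_base=$(total_seconds 2m56.277s 3m4.428s 2m58.427s)
terasort_ec=$(total_seconds 6m45.136s 6m11.019s 6m3.055s)

echo "teragen slowdown with EC:  $(slowdown "$teragen_base" "$teragen_ec")x"
echo "terasort slowdown with EC: $(slowdown "$terasort_base" "$terasort_ec")x"
```

With these inputs, EC teragen takes about 2.74 times as long as the non-EC run, and EC terasort about 2.11 times as long.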

I also checked the "Total time spent by all maps/reduces in occupied slots (ms)" counters.
_Maps_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | 103591 | 335320 | 628538 | 701388 |
|| 2 | 102937 | 322062 | 640839 | 719531 |
|| 3 | 113472 | 313274 | 631408 | 654707 |
_Reduces_
|| || non EC teragen || EC teragen || non EC terasort || EC terasort ||
|| 1 | \- | \- | 14 | 383402 |
|| 2 | \- | \- | 162759 | 348135 |
|| 3 | \- | \- | 156585 | 340584 |

About our test cluster
|| CPU | 2 CPUs (Xeon E5-2660v2 2.2GHz) |
|| RAM | 128GB |
Number of DataNodes: 39
Network bandwidth: 10Gbps
