[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093007#comment-13093007 ]

He Yongqiang commented on MAPREDUCE-2841:
-----------------------------------------

We are also evaluating the approach of optimizing the existing Hadoop Java map-side sort algorithms, applying the same set of tricks used in this C++ implementation: bucket sort, prefix key comparison, a better CRC32, and so on. The main question we are interested in is how big the memory problem is for the Java implementation. It would also be very useful to define an open benchmark here.

Task level native optimization
------------------------------

    Key: MAPREDUCE-2841
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: task
    Environment: x86-64 Linux
    Reporter: Binglin Chang
    Assignee: Binglin Chang
    Attachments: MAPREDUCE-2841.v1.patch, dualpivot-0.patch, dualpivotv20-0.patch

I'm recently working on native optimization for MapTask based on JNI. The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs emitted by the mapper, so that sort, spill, and IFile serialization can all be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising results:

1. Sort is about 3x-10x as fast as Java (only binary string compare is supported).
2. IFile serialization is about 3x as fast as Java, at about 500MB/s. If hardware CRC32C is used, things can get much faster (1G/s).
3. Merge code is not completed yet, so the test uses a large enough io.sort.mb to prevent mid-spill.

This leads to a total speedup of 2x~3x for the whole MapTask if IdentityMapper (a mapper that does nothing) is used.

There are limitations, of course: currently only Text and BytesWritable are supported, and I have not thought through many things yet, such as how to support map-side combine. I had some discussion with somebody familiar with Hive, and it seems that these limitations won't be much of a problem for Hive to benefit from those optimizations, at least. Advice or discussion about improving compatibility is most welcome :)

Currently NativeMapOutputCollector has a static method called canEnable(), which checks whether the key/value type, comparator type, and combiner are all compatible; MapTask can then choose to enable NativeMapOutputCollector.

This is only a preliminary test; more work needs to be done. I expect better final results, and I believe similar optimizations can be adopted for the reduce task and shuffle too.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
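The canEnable() check mentioned in the MAPREDUCE-2841 description above could look roughly like the following. This is only a sketch under stated assumptions, not the code from the attached patch: the class name NativeCollectorGateSketch, the string-based parameters, and the convention that null means "no custom comparator" or "no combiner" are all hypothetical. The rules it encodes come from the description: only Text and BytesWritable are supported, only binary comparison is supported, and map-side combine is not handled yet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the compatibility gate described in the issue. */
final class NativeCollectorGateSketch {

    // Per the issue text, only these two Writable types are supported so far.
    static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.io.BytesWritable"));

    /**
     * Returns true only if every job setting is compatible with the native path.
     * Assumption: a null comparatorClass means the default binary comparison is
     * in effect, and a null combinerClass means no combiner is configured.
     */
    static boolean canEnable(String keyClass, String valueClass,
                             String comparatorClass, String combinerClass) {
        boolean typesOk = SUPPORTED_TYPES.contains(keyClass)
                && SUPPORTED_TYPES.contains(valueClass);
        boolean comparatorOk = comparatorClass == null; // custom comparators unsupported
        boolean combinerOk = combinerClass == null;     // map-side combine unsupported
        return typesOk && comparatorOk && combinerOk;
    }

    public static void main(String[] args) {
        System.out.println(canEnable("org.apache.hadoop.io.Text",
                "org.apache.hadoop.io.BytesWritable", null, null));
        System.out.println(canEnable("org.apache.hadoop.io.IntWritable",
                "org.apache.hadoop.io.Text", null, null));
    }
}
```

If any check fails, the caller would simply keep using the existing Java collector, so the native path stays strictly opt-in.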
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093174#comment-13093174 ]

He Yongqiang commented on MAPREDUCE-2841:
-----------------------------------------

bq. The bucketed sort used from 0.10 to 0.16 had more internal fragmentation and a less predictable memory footprint (particularly for jobs with lots of reducers).

If the Java implementation used a design similar to the C++ one here, the only difference would be the language, right? Sorry, can you explain more about how the C++ code can do a better job on predictable memory footprint?

In the current Java implementation, all records (no matter which reducer they are going to) are stored in a central byte array. In the C++ implementation, within one map task, each reducer has a corresponding partition bucket that maintains its own memory buffer. From what I understand, one partition bucket is for one reducer, and all records going to that reducer from the current map task are stored there, and are sorted and spilled from there.

For the sort, the benefit is that it saves comparisons, since the original sort needs to compare records from different reducers. The C++ implementation also has the trick of doing prefix comparison, which reduces the number of CPU ops (an 8-byte compare becomes one long compare op).

bq. Subsequent implementations focused on reducing the number of spills for each task, because the cost of spilling dominated the cost of the sort. Even with a significant speedup in the sort step, avoiding a merge by managing memory more carefully usually effects faster task times.

I totally agree that the spill will be the dominant factor if it is there. So the question is how much more memory the Java implementation will need compared to the C++ one: 20%, 50%, or 100%? With that we can calculate the chance of avoidable spilling if using the C++ implementation. (Note: based on our analysis of jobs run during the past month, most jobs need to shuffle less than 700MB of data per mapper.)
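The two tricks discussed in this comment, per-reducer partition buckets and 8-byte prefix comparison, can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the C++ code from the patch and not Hadoop's existing collector: the class PrefixBucketSortSketch, the Entry type, and the use of Arrays.hashCode for partitioning are all hypothetical. Keys are compared as unsigned bytes in lexicographic order, which is what "binary string compare" is taken to mean here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixBucketSortSketch {

    /** A key together with its cached 8-byte prefix (hypothetical helper type). */
    static final class Entry {
        final byte[] key;
        final long prefix;
        Entry(byte[] key) { this.key = key; this.prefix = prefixOf(key); }
    }

    /** Packs the first 8 key bytes into a long, big-endian, zero-padded on the right. */
    static long prefixOf(byte[] key) {
        long p = 0;
        for (int i = 0; i < 8; i++) {
            p <<= 8;
            if (i < key.length) p |= (key[i] & 0xFFL);
        }
        return p;
    }

    /** One unsigned long compare settles most cases; ties fall back to the remaining bytes. */
    static int compare(Entry a, Entry b) {
        if (a.prefix != b.prefix) return Long.compareUnsigned(a.prefix, b.prefix);
        int n = Math.min(a.key.length, b.key.length);
        for (int i = Math.min(8, n); i < n; i++) {
            int d = (a.key[i] & 0xFF) - (b.key[i] & 0xFF);
            if (d != 0) return d;
        }
        return Integer.compare(a.key.length, b.key.length);
    }

    /** Puts each key into its reducer's bucket, then sorts every bucket on its own. */
    static List<List<Entry>> bucketAndSort(List<byte[]> keys, int numReducers) {
        List<List<Entry>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) buckets.add(new ArrayList<>());
        for (byte[] k : keys) {
            // Assumed partition function; a real job would use its configured partitioner.
            int p = (Arrays.hashCode(k) & Integer.MAX_VALUE) % numReducers;
            buckets.get(p).add(new Entry(k));
        }
        for (List<Entry> b : buckets) b.sort(PrefixBucketSortSketch::compare);
        return buckets;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "applesauce-long", "fig", "applesauce"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<List<Entry>> buckets = bucketAndSort(keys, 2);
        for (int p = 0; p < buckets.size(); p++) {
            for (Entry e : buckets.get(p)) {
                System.out.println(p + "\t" + new String(e.key, StandardCharsets.UTF_8));
            }
        }
    }
}
```

Because records are already separated by reducer, no comparison ever involves the partition number, and the cached prefix avoids touching the key bytes at all unless the first 8 bytes tie. The sketch keeps one Java object per record, so it says nothing about the memory-footprint question raised in the comment.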
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093324#comment-13093324 ]

He Yongqiang commented on MAPREDUCE-2841:
-----------------------------------------

Sorry, I am kind of confused. I should make myself clearer: we are trying to evaluate and compare the C++ implementation in HCE (and also in this JIRA) against a pure Java re-implementation. So what we care about most is whether there is something that the C++ implementation can do and a Java re-implementation cannot. And if there is, we need to find out how big that difference is. From there we can get a better understanding of each approach and decide which one to go with.
[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention
[ https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805457#action_12805457 ]

He Yongqiang commented on MAPREDUCE-1270:
-----------------------------------------

Hi Dong / Shouyan,

Are you going to open source this? If yes, can you post an update with the recent work? That would help others understand it better.

Hadoop C++ Extention
--------------------

    Key: MAPREDUCE-1270
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: task
    Affects Versions: 0.20.1
    Environment: hadoop linux
    Reporter: Wang Shouyan

Hadoop C++ extension is an internal project at Baidu. We started it for these reasons:

1. To provide a C++ API. We mostly used Streaming before, and we also tried PIPES, but we did not find PIPES to be more efficient than Streaming. So we think a new C++ extension is needed for us.
2. Even using PIPES or Streaming, it is hard to control the memory of the Hadoop map/reduce Child JVM.
3. It costs too much to read/write/sort TB/PB of data in Java. When using PIPES or Streaming, a pipe or socket is not efficient enough to carry such huge data.

What we want to do:

1. We do not use the map/reduce Child JVM to do any data processing. It just prepares the environment, starts the C++ mapper, tells the mapper which split it should deal with, and reads reports from the mapper until it has finished. The mapper will read records, invoke the user-defined map, partition, write spills, combine, and merge into file.out. We think these operations can be done by C++ code.
2. The reducer is similar to the mapper: it is started after the sort has finished, reads from the sorted files, invokes the user-defined reduce, and writes to the user-defined record writer.
3. We also intend to rewrite shuffle and sort in C++, for efficiency and memory control.

At first, 1 and 2, then 3.

What's the difference from PIPES:

1. Yes, we will reuse most of the PIPES code.
2. And we should do it more completely: nothing changes in scheduling and management, but everything changes in execution.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-765) eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
[ https://issues.apache.org/jira/browse/MAPREDUCE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated MAPREDUCE-765:
----------------------------------

    Attachment: mapreduce-765-2009-07-18.patch

Incorporates Nicholas's comments.

eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
---------------------------------------------------------------------

    Key: MAPREDUCE-765
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-765
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: He Yongqiang
    Priority: Minor
    Attachments: mapreduce-765-2009-07-15.patch, mapreduce-765-2009-07-18.patch
[jira] Updated: (MAPREDUCE-765) eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
[ https://issues.apache.org/jira/browse/MAPREDUCE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated MAPREDUCE-765:
----------------------------------

    Status: Patch Available (was: Open)

eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
---------------------------------------------------------------------

    Key: MAPREDUCE-765
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-765
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: He Yongqiang
    Priority: Minor
    Attachments: mapreduce-765-2009-07-15.patch
[jira] Created: (MAPREDUCE-765) eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
---------------------------------------------------------------------

    Key: MAPREDUCE-765
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-765
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: He Yongqiang
    Priority: Minor
[jira] Updated: (MAPREDUCE-765) eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
[ https://issues.apache.org/jira/browse/MAPREDUCE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated MAPREDUCE-765:
----------------------------------

    Attachment: mapreduce-765-2009-07-15.patch

eliminate the usage of FileSystem.create( ) depracated by Hadoop-5438
---------------------------------------------------------------------

    Key: MAPREDUCE-765
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-765
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Reporter: He Yongqiang
    Priority: Minor
    Attachments: mapreduce-765-2009-07-15.patch