[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

2012-02-02 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199498#comment-13199498
 ] 

Binglin Chang commented on MAPREDUCE-2841:
--

@Yongqiang
It seams BlockMapOutputBuffer only support BytesWritable currently? So 
Wordcount & Terasort can't run directly .  From the test result, Map Avg time 
and sort time, I would say sort take about 40-50% time of whole map task, 
because sort is CPU intensive, the CPU time should be more, about 50%-60% 
maybe. 
If my compiler assumption is right, total speedup for Wordcount mapper should 
be 10x, sort speedup should be 10x-12x, and the rest(reader, mapper, merge, 
spill combined) should be (10-0.5*12)/(1-0.5)=8. 
I must say using quicksort to sort small buffers fit into cache then merge them 
is a good idea, I should make this optimization too. 
NativeTask currently use single thread currently, but I think all partition 
based collector, including BlockMapOutputCollector) can take advantage of 
parallel sort & spill I mentioned in the design doc, this needs code changes to 
other part(TaskTracker,IndexCache maybe), and change map output file to a 
directory.



> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
> MAPREDUCE-2841.v2.patch, dualpivot-0.patch, dualpivotv20-0.patch
>
>
> I'm recently working on native optimization for MapTask based on JNI. 
> The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
> emitted by mapper, therefore sort, spill, IFile serialization can all be done 
> in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
> results:
> 1. Sort is about 3x-10x as fast as java(only binary string compare is 
> supported)
> 2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
> CRC32C is used, things can get much faster(1G/s).
> 3. Merge code is not completed yet, so the test use enough io.sort.mb to 
> prevent mid-spill
> This leads to a total speed up of 2x~3x for the whole MapTask, if 
> IdentityMapper(mapper does nothing) is used.
> There are limitations of course, currently only Text and BytesWritable is 
> supported, and I have not think through many things right now, such as how to 
> support map side combine. I had some discussion with somebody familiar with 
> hive, it seems that these limitations won't be much problem for Hive to 
> benefit from those optimizations, at least. Advices or discussions about 
> improving compatibility are most welcome:) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks if key/value type, comparator type, combiner are all compatible, 
> then MapTask can choose to enable NativeMapOutputCollector.
> This is only a preliminary test, more work need to be done. I expect better 
> final results, and I believe similar optimization can be adopt to reduce task 
> and shuffle too. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

2012-01-11 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184719#comment-13184719
 ] 

Binglin Chang commented on MAPREDUCE-3397:
--

This is already fully functional patch. I have done some simple tests both 
using localRunner and on a real 17 node cluster. To disable sort, set jobconf 
mapred.map.output.sort=false. Then the reduce phrase will start before all Map 
Task have done, and reducer can not expect input k/v pairs to be grouped by key 
& sorted.



> Support no sort dataflow in map output and reduce merge phrase
> --
>
> Key: MAPREDUCE-3397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Affects Versions: 0.20.205.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to 
> sort the intermediate data. In fact reducer side can use hashmap or even 
> array to do application level aggregations. For example, consider computing 
> CTR using display log & click log in sponsored search. Map side just emit 
> (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for 
> every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-10
> ** reduce1: 10-20
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [100][2]) to store the 
> aggregated clk_cnt & dis_cnt, and we don't need the framework to sort 
> intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a 
> liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data 
> before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many 
> partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

2011-12-14 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169965#comment-13169965
 ] 

Binglin Chang commented on MAPREDUCE-2841:
--

bq. Dual-pivot quicksort causes the exact same number of comparisons as 
ordinary quicksort. However, it should have fewer swaps (0.80 times as many). 
Unfortunately, the main overhead of current sort in Hadoop comes from 
comparison, the swaps just swap two integer index, I thinks that's why 
Dual-pivot quicksort don't show any improvements. As Todd's results in 
MAPREDUCE-3235 and I experienced in this issue, the main overhead for the 
current Hadoop sort implementation is cache miss, nearly all comparison 
operations cause 2 random memory access in a huge memory area(typically X00MB). 
So it's not language differentes, just implementation differentes, to get 
better performance, we can:
Add index in MAPREDUCE-3235, or use partition bucket based sort.

I port DualPivotQuickSort java code to C++ and tested it on my intel i5 
macbookpro, with terasort 10bytes key type and word key type in 
RandomTextWriter.

TeraSort input data 50MB 50 key/value pair
11/12/15 12:18:16 INFO qsort time: 0.23108s
11/12/15 12:18:16 INFO std::sort time: 0.18266s
11/12/15 12:18:17 INFO DualPivotQuicksort time: 0.17167s

About 6% faster, I think sorting an array of ints can get much better results, 
cause compare two inplace ints is much faster than campare two indexed binary 
string.

Some updates about my work, I almost finished whole native mapTask, and part of 
native reduce task.
As for native MapTask with C++ RecordReader, Mapper, Partitioner, 
MapOutputCollector, a native MapTask now can process 250MB(47MB compressed) 
terasort input data in just 1.6s, comparing this with the earlier test 
results(14s for java, 7s for java with NativeMapOutputCollector), it is a huge 
speed up, and it can be further optimized.



> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch, 
> dualpivot-0.patch, dualpivotv20-0.patch
>
>
> I'm recently working on native optimization for MapTask based on JNI. 
> The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
> emitted by mapper, therefore sort, spill, IFile serialization can all be done 
> in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
> results:
> 1. Sort is about 3x-10x as fast as java(only binary string compare is 
> supported)
> 2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
> CRC32C is used, things can get much faster(1G/s).
> 3. Merge code is not completed yet, so the test use enough io.sort.mb to 
> prevent mid-spill
> This leads to a total speed up of 2x~3x for the whole MapTask, if 
> IdentityMapper(mapper does nothing) is used.
> There are limitations of course, currently only Text and BytesWritable is 
> supported, and I have not think through many things right now, such as how to 
> support map side combine. I had some discussion with somebody familiar with 
> hive, it seems that these limitations won't be much problem for Hive to 
> benefit from those optimizations, at least. Advices or discussions about 
> improving compatibility are most welcome:) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks if key/value type, comparator type, combiner are all compatible, 
> then MapTask can choose to enable NativeMapOutputCollector.
> This is only a preliminary test, more work need to be done. I expect better 
> final results, and I believe similar optimization can be adopt to reduce task 
> and shuffle too. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

2011-11-14 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150208#comment-13150208
 ] 

Binglin Chang commented on MAPREDUCE-3397:
--

I think why no sort make sense is that, in many cases application has a more 
efficient way to process data(such as do aggregation on the fly), they don't 
want the framework to do some sort of heavy weighted data preprocessing, cause 
they have better prior knowledge/understanding about the data and the goal.


> Support no sort dataflow in map output and reduce merge phrase
> --
>
> Key: MAPREDUCE-3397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Affects Versions: 0.20.205.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to 
> sort the intermediate data. In fact reducer side can use hashmap or even 
> array to do application level aggregations. For example, consider computing 
> CTR using display log & click log in sponsored search. Map side just emit 
> (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for 
> every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-10
> ** reduce1: 10-20
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [100][2]) to store the 
> aggregated clk_cnt & dis_cnt, and we don't need the framework to sort 
> intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a 
> liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data 
> before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many 
> partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

2011-11-14 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150201#comment-13150201
 ] 

Binglin Chang commented on MAPREDUCE-3397:
--

No, grouping is not the same as no sort:
# Grouping still needs shuffle phrase barrier;
# In grouping kv pairs of the same key are grouped together, but in no sort kv 
pairs of the same key may not grouped together, framework only promise they are 
in the same partition(reduce).



> Support no sort dataflow in map output and reduce merge phrase
> --
>
> Key: MAPREDUCE-3397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Affects Versions: 0.20.205.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to 
> sort the intermediate data. In fact reducer side can use hashmap or even 
> array to do application level aggregations. For example, consider computing 
> CTR using display log & click log in sponsored search. Map side just emit 
> (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for 
> every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-10
> ** reduce1: 10-20
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [100][2]) to store the 
> aggregated clk_cnt & dis_cnt, and we don't need the framework to sort 
> intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a 
> liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data 
> before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many 
> partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3246) Make Task extensible to support modifications of Task or even alternate programming paradigms

2011-11-01 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141039#comment-13141039
 ] 

Binglin Chang commented on MAPREDUCE-3246:
--

The extra warnings are deprecation warnings and I can't make them disappear.

org/apache/hadoop/mapred/MapTask.java:[109,36] [deprecation] 
org.apache.hadoop.mapred.JobConf in org.apache.hadoop.mapred has been deprecated



> Make Task extensible to support modifications of Task or even alternate 
> programming paradigms
> -
>
> Key: MAPREDUCE-3246
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3246
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.23.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3246-extensible-task.patch, 
> MAPREDUCE-3246-extensible-task.v2.patch, 
> MAPREDUCE-3246-extensible-task.v3.patch
>
>
> One of MRv2's goal is to support alternate programming paradigms, but 
> building a application using YARN from the bottom is not trivial. In fact 
> most component of MapReduce can be reused, mostly the scheduler/master side, 
> and we can make changes/extensions only on the task/slave side, such as 
> native tasks, hash-aggregation style combiner/reducer interfaces.
> The first thing to do I think is to make task/slave side extensible, more 
> specific, the Task in JvmTask should serialized with class name, not simply a 
> boolean isMap, and make task class name configurable in JobConf, there maybe 
> other minor changes. By doing so, developers can at least extends their own 
> MapTask/ReduceTask.
> I just post my initial thoughts here for opinions. If this change is OK, I 
> can submit a patch, this is just a trivial work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3246) Make Task extensible to support modifications of Task or even alternate programming paradigms

2011-10-31 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140982#comment-13140982
 ] 

Binglin Chang commented on MAPREDUCE-3246:
--

Some code just committed(LocalJobRunner) break the build. Make some change to 
make it comptable with LocalJobRunner


> Make Task extensible to support modifications of Task or even alternate 
> programming paradigms
> -
>
> Key: MAPREDUCE-3246
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3246
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.23.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3246-extensible-task.patch, 
> MAPREDUCE-3246-extensible-task.v2.patch, 
> MAPREDUCE-3246-extensible-task.v3.patch
>
>
> One of MRv2's goal is to support alternate programming paradigms, but 
> building a application using YARN from the bottom is not trivial. In fact 
> most component of MapReduce can be reused, mostly the scheduler/master side, 
> and we can make changes/extensions only on the task/slave side, such as 
> native tasks, hash-aggregation style combiner/reducer interfaces.
> The first thing to do I think is to make task/slave side extensible, more 
> specific, the Task in JvmTask should serialized with class name, not simply a 
> boolean isMap, and make task class name configurable in JobConf, there maybe 
> other minor changes. By doing so, developers can at least extends their own 
> MapTask/ReduceTask.
> I just post my initial thoughts here for opinions. If this change is OK, I 
> can submit a patch, this is just a trivial work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1639) Grouping using hashing instead of sorting

2011-10-22 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133364#comment-13133364
 ] 

Binglin Chang commented on MAPREDUCE-1639:
--

I created MAPREDUCE-3247 for the discussion about hashtable based 
join/aggregation.


> Grouping using hashing instead of sorting
> -
>
> Key: MAPREDUCE-1639
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1639
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Joydeep Sen Sarma
>
> most applications of map-reduce care about grouping and not sorting. Sorting 
> is a (relatively expensive) way to achieve grouping. In order to achieve just 
> grouping - one can:
> - replace the sort on the Mappers with a HashTable - and maintain lists of 
> key-values against each hash-bucket.
> - key-value tuples inside each hash bucket are sorted - before spilling or 
> sending to Reducer. Anytime this is done - Combiner can be invoked.
> - HashTable is serialized by hash-bucketid. So merges (of either spills or 
> Map Outputs) works similar to today (at least there's no change in overall 
> compute complexity of merge)
> Of course this hashtable has nothing to do with partitioning. it's just a 
> replacement for map-side sort.
> --
> this is (pretty much) straight from the MARS project paper: 
> http://www.cse.ust.hk/catalac/papers/mars_pact08.pdf. They report a 45% 
> speedup in inverted index calculation using hashing instead of sorting 
> (reference implementation is NOT against Hadoop though).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1639) Grouping using hashing instead of sorting

2011-10-22 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133329#comment-13133329
 ] 

Binglin Chang commented on MAPREDUCE-1639:
--

@YongQiang 
Thanks! 
Originally, I plan to support grouping in nativetask, but after optimizing 
sort, I found the sort phrase no longer a bottleneck anymore, especially in 
large jobs with many reduce tasks. In fact the current sort implementation is 
not optimized at all(just use stl::sort), so replacing sort with grouping may 
not have a big effect after sort is optimized. Above all, I prefer hash 
aggregation, the bad thing about it is it must change combiner/reducer api, 
this won't be a problem for the under dev nativetask, but is much complicated 
in java, I will create a issue for discussion.
I prefer sent to me offline, can't wait to see :)


> Grouping using hashing instead of sorting
> -
>
> Key: MAPREDUCE-1639
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1639
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Joydeep Sen Sarma
>
> most applications of map-reduce care about grouping and not sorting. Sorting 
> is a (relatively expensive) way to achieve grouping. In order to achieve just 
> grouping - one can:
> - replace the sort on the Mappers with a HashTable - and maintain lists of 
> key-values against each hash-bucket.
> - key-value tuples inside each hash bucket are sorted - before spilling or 
> sending to Reducer. Anytime this is done - Combiner can be invoked.
> - HashTable is serialized by hash-bucketid. So merges (of either spills or 
> Map Outputs) works similar to today (at least there's no change in overall 
> compute complexity of merge)
> Of course this hashtable has nothing to do with partitioning. it's just a 
> replacement for map-side sort.
> --
> this is (pretty much) straight from the MARS project paper: 
> http://www.cse.ust.hk/catalac/papers/mars_pact08.pdf. They report a 45% 
> speedup in inverted index calculation using hashing instead of sorting 
> (reference implementation is NOT against Hadoop though).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort

2011-10-21 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132531#comment-13132531
 ] 

Binglin Chang commented on MAPREDUCE-3235:
--

Very cool, again.
bq. On a terasort benchmark, these optimizations plus an optimization to 
WritableComparator.compareBytes dropped the aggregate mapside CPU millis by 40%
Would you please give the test results about run 
WritableComparator.compareBytes optimization or key index optimizations alone? 
This would be useful. As I experienced in MAPREDUCE-2841, cache miss matters 
most.

> Improve CPU cache behavior in map side sort
> ---
>
> Key: MAPREDUCE-3235
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3235
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance, task
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mr-3235-poc.txt
>
>
> When running oprofile on a terasort workload, I noticed that a large amount 
> of CPU usage was going to MapTask$MapOutputBuffer.compare. Upon disassembling 
> this and looking at cycle counters, most of the cycles were going to memory 
> loads dereferencing into the array of key-value data -- implying expensive 
> cache misses. This can be avoided as follows:
> - rather than simply swapping indexes into the kv array, swap the entire meta 
> entries in the meta array. Swapping 16 bytes is only negligibly slower than 
> swapping 4 bytes. This requires adding the value-length into the meta array, 
> since we used to rely on the previous-in-the-array meta entry to determine 
> this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
> - introduce an interface which allows key types to provide a 4-byte 
> comparison proxy. For string keys, this can simply be the first 4 bytes of 
> the string. The idea is that, if stringCompare(key1.proxy(), key2.proxy()) != 
> 0, then compare(key1, key2) should have the same result. If the proxies are 
> equal, the normal comparison method is used. We then include the 4-byte proxy 
> as part of the metadata entry, so that for many cases the indirection into 
> the data buffer can be avoided.
> On a terasort benchmark, these optimizations plus an optimization to 
> WritableComparator.compareBytes dropped the aggregate mapside CPU millis by 
> 40%, and the compare() routine mostly dropped off the oprofile results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1639) Grouping using hashing instead of sorting

2011-10-20 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132350#comment-13132350
 ] 

Binglin Chang commented on MAPREDUCE-1639:
--

bq. we are trying to experiment on a new output collector. Here are some of our 
thoughts:

Nice work, I'm very interested in how it is done exactly:) 
Actually I'm considering further optimizations, not just grouping, but to do 
"foldl" style aggregation operations directly in HashTable liked data structure 
at map output collector stage and reducer side.
It seems hyracks already to that, and google mentioned this in the paper 
"Tenzing A SQL Implementation On The MapReduce Framework".



> Grouping using hashing instead of sorting
> -
>
> Key: MAPREDUCE-1639
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1639
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Joydeep Sen Sarma
>
> most applications of map-reduce care about grouping and not sorting. Sorting 
> is a (relatively expensive) way to achieve grouping. In order to achieve just 
> grouping - one can:
> - replace the sort on the Mappers with a HashTable - and maintain lists of 
> key-values against each hash-bucket.
> - key-value tuples inside each hash bucket are sorted - before spilling or 
> sending to Reducer. Anytime this is done - Combiner can be invoked.
> - HashTable is serialized by hash-bucketid. So merges (of either spills or 
> Map Outputs) works similar to today (at least there's no change in overall 
> compute complexity of merge)
> Of course this hashtable has nothing to do with partitioning. it's just a 
> replacement for map-side sort.
> --
> this is (pretty much) straight from the MARS project paper: 
> http://www.cse.ust.hk/catalac/papers/mars_pact08.pdf. They report a 45% 
> speedup in inverted index calculation using hashing instead of sorting 
> (reference implementation is NOT against Hadoop though).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3086) Supporting range scan using TFile, TotalOrderPartitioner and partition index

2011-10-12 Thread Binglin Chang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125673#comment-13125673
 ] 

Binglin Chang commented on MAPREDUCE-3086:
--

It seems that most test cases for o.a.h.mapred are not integrated into 
hadoop-mapreduce-client-core, and not activated yet. If I want to make this 
patch against trunk and enable unit test, where do I put the source files to?


> Supporting range scan using TFile, TotalOrderPartitioner and partition index
> 
>
> Key: MAPREDUCE-3086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3086
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.205.0
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: MAPREDUCE-3086.v1.patch
>
>
> Hive/HBase already has similar or more powerful functionality, but using 
> hive/hbase is overkill or inconvenient for some cases, so add some 
> lightweight utility classes to only support range scan should be reasonable. 
> The utility classes include:
> # InputFormat supporting range scan: Indexed(Text|Binary)InputFormat
>   The input directory for IndexInputFormat should contain one partition index 
> and many tfiles, each tfile store a certain range of keys, not overlapping 
> with other tfiles, the boundaries are stored in partition index.
>   Add 4 jobconfs: mapred.indexed(text|binary)inputformat.key.(start|end), 
> indicate range scan parameters. 
>   For a mapreduce job using IndexedInputFormat, IndexedInputFormat.getSplits 
> filter out tfiles which are not in the scan range using partition index
>   IndexedInputFormat do not support multi directory & splitting in single 
> file, these can be added in future.
> # Tool to convert data of other format into IndexedInputForamt: 
> TotalOrderIndexBuilder
>   If the input data is already total order partitioned and is tfile format, 
> just add partition index to input directory
>   Or run InputSampler to generate partiton index, then run mapreduce job with 
> TotalOrder partitioner to generate tfile backed data, finally move partition 
> index to output directory. 
> # Client tool to scan/search indexed data directory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira