Hi All, Do we have some statistics about current bottlenecks which part is taking more time??
*@Ravindra* Please correct me If I am wrong I think our current unsafe sort is also in-place it is only swapping the offsets not data. Only from comparison it is getting the data from off-heap to on-heap. Our current *GC* bottleneck is in writing step, there we are keeping the data for some time. There also we can store the data in off-heap and we can go for unsafe sort while sorting the column level data. It's better to get the current bottlenecks and proceed further. -Regards Kumar Vishal On Mon, May 22, 2017 at 4:37 PM, Jacky Li <jacky.li...@qq.com> wrote: > Yes, and if after dictionary encoding, SORT_COLUMNS can fit in 6 bytes, > our approach can be even better, because the 8 bytes data can be put in > cache totally, without the remaining portion in memory. > > Regards, > Jacky > > > 在 2017年5月22日,下午5:23,Ravindra Pesala <ravi.pes...@gmail.com> 写道: > > > > Hi, > > > > I think you are referring to tungsten sort, there they tried keep pointer > > and key together to simulate cache aware computation. It is only possible > > if the sort keys are always starts with fixed keys like dictionary keys. > So > > basically first encountered few dictionary columns can be kept along with > > pointer and starts sorting, if that is equal then we can go and retrieve > > remaining key and compare it. > > It is simple to implement in our current design as our current > > implementation of unsafe sort is also inspired from tungsten sort. > > > > Regards, > > Ravindra. > > > > On 22 May 2017 at 09:31, Jacky Li <jacky.li...@qq.com> wrote: > > > >> For sorting, I think more optimization we can do, I am currently > thinking > >> these: > >> 1. Do not sort the whole TablePage, only KeyPage is required as the sort > >> key > >> > >> 2. Should find a more memory efficient sorting algorithm than > >> System.arraycopy which requires doubling space. > >> > >> 3. Should try to hold the KeyPage as well as the RowId in a compact data > >> structure, it is best if it fits in CPU cache. Modern L3 CPU is larger > than > >> 8MB. For this, I am thinking to have a 8 bytes encoded format that > includes > >> RowID and SORT_COLUMNS (partial or full), for example, 2 bytes for > RowId, > >> remaining 6 bytes for 2 to 3 columns after dictionary encoding. > >> a) If we can hold the RowID + whole SORT_COLUMNS in 8 bytes, it will > >> be most efficient to leverage CPU cache to do sorting, use in-place > update > >> approach while sorting. So no extra storage is needed, and the RowID + > >> whole SORT_COLUMNS will be sorted. > >> b) If we can only hold the RowID + partial SORT_COLUMNS in 8 bytes, > >> we can employ strategy like the sorting in Spark Tungsten project. > (first > >> compare the 8 bytes in cache, if it equals then compare remaining > bytes in > >> memory) > >> > >> Regards, > >> Jacky > >> > >>> 在 2017年5月22日,上午10:19,David CaiQiang <david.c...@gmail.com> 写道: > >>> > >>> As I known, System.arrayCopy of object array is a shallow copy, so I > >> think > >>> both KeyPage and TablePage maybe have the same performance on > >> Arrays.sort. > >>> > >>> > >>> ----- > >>> Best Regards > >>> David Cai > >>> -- > >>> View this message in context: http://apache-carbondata-dev-m > >> ailing-list-archive.1130556.n5.nabble.com/DISCUSS-Data-loadi > >> ng-improvement-tp11429p13056.html > >>> Sent from the Apache CarbonData Dev Mailing List archive mailing list > >> archive at Nabble.com. > >> > >> > > > > > > -- > > Thanks & Regards, > > Ravi > > > >