[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997143#comment-13997143 ]

Dmitriy Lyubimov commented on MAHOUT-1490:
------------------------------------------

So the H2O guys seem to use Unsafe to slurp the data off memory arrays. I'm not
sure what the gain factor is here, but in my tests on just 4 CPUs I cannot
saturate memory bandwidth with simple sums. My guess is that the benefit of
Unsafe comes from reading 8-byte values (doubles, longs, etc.) efficiently off
a super array; otherwise there seems to be little incentive to do any
compression until we have a significantly larger number of cores. On my
hardware, bandwidth saturation does not seem to start until I have at least
7-8 cores.
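A minimal sketch of the kind of "simple sum" bandwidth test described above, assuming a plain on-heap double[] split into contiguous chunks across a fixed thread pool. The class and method names (ParallelSumBench, rangeSum, parallelSum, gbPerSec) are hypothetical and not taken from Mahout or H2O, and the timing is crude (no JIT warm-up, pool startup included), so the GB/s figure is only a rough indicator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical benchmark helper; not part of Mahout or H2O.
final class ParallelSumBench {

    // Sequentially sums a[from, to); each worker thread runs one of these.
    static double rangeSum(double[] a, int from, int to) {
        double s = 0.0;
        for (int i = from; i < to; i++) {
            s += a[i];
        }
        return s;
    }

    // Splits the array into `threads` contiguous chunks and sums them in parallel.
    static double parallelSum(double[] a, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            int chunk = (a.length + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                int from = (int) Math.min((long) a.length, (long) t * chunk);
                int to = (int) Math.min((long) a.length, (long) from + chunk);
                Callable<Double> task = () -> rangeSum(a, from, to);
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> f : parts) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Crude read bandwidth for one pass: bytes per nanosecond equals GB/s.
    // Includes thread-pool startup and has no JIT warm-up, so it underestimates.
    static double gbPerSec(double[] a, int threads) {
        long t0 = System.nanoTime();
        parallelSum(a, threads);
        long dt = Math.max(1L, System.nanoTime() - t0);
        return (a.length * (double) Double.BYTES) / dt;
    }
}
```

Calling gbPerSec with 1, 2, 4, 8 threads over an array much larger than the last-level cache, and watching where the figure stops growing, is one way to look for the saturation point mentioned above.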

I need to write more tests to see whether using Unsafe shifts that estimate
toward a lower number of cores.

However, even if it does, Unsafe binds to a particular JVM flavor (and perhaps
even a particular version of it), which means there has to be a pure-Java
fallback implementation.
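A sketch of what such a pure-Java fallback could look like, assuming the data lives in a byte[] "super array" of native-order doubles. It reads 8-byte values through java.nio.ByteBuffer instead of sun.misc.Unsafe, and probes for Unsafe reflectively so nothing links against it at compile time. The class and method names are hypothetical, and the Unsafe fast path itself is deliberately left out.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fallback reader: not an existing Mahout or H2O class.
final class DoubleSlabReader {
    private final ByteBuffer buf;

    DoubleSlabReader(byte[] slab) {
        // Native byte order matches what a raw-memory (Unsafe) reader would see.
        this.buf = ByteBuffer.wrap(slab).order(ByteOrder.nativeOrder());
    }

    int size() {
        return buf.capacity() / Double.BYTES;
    }

    double get(int i) {
        // Absolute read of 8 bytes at byte offset i * 8; the buffer position is untouched.
        return buf.getDouble(i * Double.BYTES);
    }

    double sum() {
        double s = 0.0;
        int n = size();
        for (int i = 0; i < n; i++) {
            s += get(i);
        }
        return s;
    }

    // True if sun.misc.Unsafe can be obtained on this JVM; a caller could then
    // pick an Unsafe-based reader and otherwise fall back to this class.
    static boolean unsafeAvailable() {
        try {
            Class<?> c = Class.forName("sun.misc.Unsafe");
            Field f = c.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return f.get(null) != null;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }
}
```

Because ByteBuffer bounds-checks every read, this path is likely slower than Unsafe; how much slower is exactly what the extra tests mentioned above would need to measure.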

As for compression itself, I can't tell much, due to the lack of comments and
formal methodology references in the code. In the public code I actually fail
to find any notion of compression at all, other than big-decimal kind of stuff.

> Data frame R-like bindings
> --------------------------
>
>                 Key: MAHOUT-1490
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1490
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Saikat Kanjilal
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark



--
This message was sent by Atlassian JIRA
(v6.2#6252)
