[
https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997143#comment-13997143
]
Dmitriy Lyubimov commented on MAHOUT-1490:
------------------------------------------
So the h2o guys seem to use Unsafe to slurp the data off memory arrays. Not sure
how big a factor that is here, but in my tests on just 4 CPUs I cannot saturate
memory bandwidth for simple sums. My guess is that the Unsafe factor accounts
for the efficiency of reading 8-byte values (doubles, longs, etc.) off a
super array; otherwise there seems to be not much incentive to do any
compression until we have a significantly larger number of cores. Given my
hardware, it seems bandwidth saturation is not going to start happening
until I have at least 7-8 cores.
I need to write more tests to see whether using Unsafe shifts that
estimate towards a lower number of cores.
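
A rough sketch of the kind of scaling test meant here, not the actual test
code: it sums a large double[] with 1..N threads and prints the apparent
read throughput, so one can see at which thread count the numbers stop
growing. The class name SumBandwidthProbe, the array size, and the chunking
scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical probe; names and sizes are illustrative only.
class SumBandwidthProbe {

    // Sums data[from, to) with a plain loop.
    static double sumRange(double[] data, int from, int to) {
        double s = 0.0;
        for (int i = from; i < to; i++) {
            s += data[i];
        }
        return s;
    }

    // Splits the array into `threads` contiguous chunks and sums them in parallel.
    // The pool is created per call, so thread startup is included in any timing.
    static double parallelSum(double[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads;
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Double> task = () -> sumRange(data, from, to);
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 2^24 doubles = 128 MiB, assumed to be well beyond the CPU caches.
        double[] data = new double[1 << 24];
        Arrays.fill(data, 1.0);
        int maxThreads = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= maxThreads; threads++) {
            parallelSum(data, threads); // warm-up pass so the JIT has compiled the loop
            long start = System.nanoTime();
            double sum = parallelSum(data, threads);
            double seconds = (System.nanoTime() - start) / 1e9;
            double gbPerSec = data.length * 8.0 / seconds / 1e9;
            System.out.printf("threads=%d sum=%.0f throughput=%.2f GB/s%n",
                    threads, sum, gbPerSec);
        }
    }
}
```

If throughput flattens out before the thread count reaches the core count,
that would suggest memory bandwidth rather than CPU is the limit. A single
timed pass like this is noisy, so treat the numbers as indicative only.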
However, even if it does, Unsafe binds us to a particular JVM flavor (and
perhaps even a particular version of it), which means there has to be a pure
Java fallback implementation.
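
One way such a fallback could be wired up, as a minimal sketch only: try to
obtain sun.misc.Unsafe reflectively and do a probe read, and drop back to a
ByteBuffer-based reader if anything goes wrong. This is not h2o's code; the
names UnsafeFallbackDemo, DoubleReader, PlainReader, UnsafeReader, pack and
create are hypothetical, and the layout (doubles packed into a byte[] in
native byte order) is an assumption.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; all names here are illustrative.
class UnsafeFallbackDemo {

    // Sums 8-byte doubles packed into a byte[] in native byte order.
    interface DoubleReader {
        double sum(byte[] packed);
    }

    // Pure-Java reader; works on any JVM.
    static final class PlainReader implements DoubleReader {
        public double sum(byte[] packed) {
            ByteBuffer buf = ByteBuffer.wrap(packed).order(ByteOrder.nativeOrder());
            double s = 0.0;
            for (int i = 0; i < packed.length / 8; i++) {
                s += buf.getDouble(i * 8);
            }
            return s;
        }
    }

    // Reader based on the JVM-specific sun.misc.Unsafe class.
    static final class UnsafeReader implements DoubleReader {
        private final sun.misc.Unsafe unsafe;
        private final long base;

        UnsafeReader() throws ReflectiveOperationException {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            unsafe = (sun.misc.Unsafe) f.get(null);
            base = unsafe.arrayBaseOffset(byte[].class);
        }

        public double sum(byte[] packed) {
            double s = 0.0;
            for (int i = 0; i < packed.length / 8; i++) {
                // Raw 8-byte read in native order; no bounds checks are performed.
                s += unsafe.getDouble(packed, base + 8L * i);
            }
            return s;
        }
    }

    // Packs doubles into a byte[] in native byte order.
    static byte[] pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 8).order(ByteOrder.nativeOrder());
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    // Prefers the Unsafe reader, falls back to pure Java if it cannot be used.
    static DoubleReader create() {
        try {
            DoubleReader candidate = new UnsafeReader();
            candidate.sum(pack(new double[] {1.0})); // probe read
            return candidate;
        } catch (Throwable t) {
            return new PlainReader();
        }
    }

    public static void main(String[] args) {
        byte[] packed = pack(new double[] {1.5, 2.25, -0.75, 4.0});
        DoubleReader reader = create();
        System.out.println(reader.getClass().getSimpleName() + " sum=" + reader.sum(packed));
    }
}
```

Callers only ever see DoubleReader, so which implementation was picked stays
an internal detail. Recent JDKs mark these Unsafe memory-access methods as
deprecated, which is one more reason to keep the pure Java path around.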
As for compression itself, I can't tell much due to the lack of comments and
formal methodology references in the code. In the public code I fail to find
any notion of compression at all, other than big-decimal kind of stuff.
> Data frame R-like bindings
> --------------------------
>
> Key: MAHOUT-1490
> URL: https://issues.apache.org/jira/browse/MAHOUT-1490
> Project: Mahout
> Issue Type: New Feature
> Reporter: Saikat Kanjilal
> Assignee: Dmitriy Lyubimov
> Fix For: 1.0
>
> Original Estimate: 20h
> Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark
--
This message was sent by Atlassian JIRA
(v6.2#6252)