[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003903#comment-14003903 ]
Anand Avati commented on MAHOUT-1490:
-------------------------------------

[~dlyubimov], compression does not make it read-only, and certainly not read-only in the sense of Spark's RDD. Data in a Frame is mutable. Depending on the type of update, the operation is either cheap (if the new value can replace the old value in place) or expensive (inflate, update1, update2, update3 ... deflate), but in either case it happens transparently behind the scenes; the user just calls set(). However, for the DSL backend I intend to _not_ mutate Frames and to treat them as read-only, to stay compatible with the Spark RDD model (even though that might not be the most efficient choice in certain cases in terms of performance).

Access time is constant for dense compressed data, with negligible decompression overhead (one multiplication and one addition instruction, with operands in registers). The chunk header stores the compression's scale-down factor, so fetching a compressed value is a deterministic offset lookup as well. For sparse data, however, the worst case is a binary search to find the physical offset within a Chunk, though there are optimizations that make subsequent accesses in the same vicinity happen in constant time.

> Data frame R-like bindings
> --------------------------
>
>                 Key: MAHOUT-1490
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1490
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Saikat Kanjilal
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark

--
This message was sent by Atlassian JIRA
(v6.2#6252)
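The two access paths the comment describes can be sketched as follows. This is a minimal illustration of the scheme, not Mahout's actual Frame code; the names (ChunkSketch, DenseChunk, SparseChunk, base, scale) are hypothetical.

```java
import java.util.Arrays;

// Sketch of the chunk access scheme described in the comment above.
// All class and field names here are illustrative assumptions.
public class ChunkSketch {

    /** Dense chunk: values stored as small integers, scaled down per chunk. */
    static final class DenseChunk {
        final double base;    // per-chunk offset (from the chunk header)
        final double scale;   // scale-down factor (from the chunk header)
        final short[] packed; // compressed values

        DenseChunk(double base, double scale, short[] packed) {
            this.base = base;
            this.scale = scale;
            this.packed = packed;
        }

        /** Constant-time decode: one multiplication and one addition. */
        double get(int i) {
            return base + scale * packed[i];
        }
    }

    /** Sparse chunk: (logical index, value) pairs sorted by index. */
    static final class SparseChunk {
        final int[] indices;   // sorted logical indices
        final double[] values; // corresponding values
        int lastPos = 0;       // cached physical offset for locality

        SparseChunk(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }

        /**
         * Worst case is a binary search for the physical offset; entries
         * near the last hit are probed first, so repeated accesses in the
         * same vicinity are constant time.
         */
        double get(int i) {
            int lo = Math.max(0, lastPos - 1);
            int hi = Math.min(indices.length - 1, lastPos + 1);
            for (int p = lo; p <= hi; p++) {          // cheap locality probe
                if (indices[p] == i) {
                    lastPos = p;
                    return values[p];
                }
            }
            int p = Arrays.binarySearch(indices, i);  // worst-case path
            if (p < 0) {
                return 0.0;                           // absent entry is zero
            }
            lastPos = p;
            return values[p];
        }
    }
}
```

In this sketch an in-place update of a dense chunk is cheap only when the new value is representable under the existing base/scale pair; otherwise the chunk would have to be inflated, updated, and deflated, matching the cheap/expensive distinction made above.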