[ 
https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000080#comment-14000080
 ] 

Dmitriy Lyubimov commented on MAHOUT-1490:
------------------------------------------

Cool, thanks for doing this. 

Here is what I've been thinking. 

Let's try to bring in columnar vector representations (a DataFrameVectorLike 
trait), which at this point will just extend Iterable[T], where T can be one 
of Long, Double, Int, String or Array[Byte]. 

Let's start with simple, non-variable-length column representations, e.g. 
LongDataFrameVector. While doing so, let's try to use the Unsafe class as in 
H2O. (Actually, I talked to various people and they tell me that, 
surprisingly, many JVM vendors do adhere to the Unsafe API.) Obviously such a 
vector must not hold actual objects (such as instances of Double) but rather 
use Unsafe to read values from, and write them to, a backing byte array. 

At this point we will ignore any compression (but we may later want to use 
some ideas along the lines of VLQ, i.e. variable-length quantity encoding, and 
perhaps prefix tries for unordered collections).
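
For reference only, since compression is out of scope for now, here is a 
minimal sketch of what VLQ encoding of a single non-negative long could look 
like. The class name Vlq and both method names are made up for illustration 
and are not part of any existing Mahout API.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class Vlq {
    // Encodes a non-negative value as big-endian 7-bit groups.
    // Every byte except the last has its high (continuation) bit set.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("only non-negative values are supported");
        }
        byte[] tmp = new byte[9];            // 63 payload bits need at most 9 groups
        int pos = tmp.length;
        tmp[--pos] = (byte) (value & 0x7F);  // last byte: continuation bit clear
        value >>>= 7;
        while (value != 0) {
            tmp[--pos] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        return Arrays.copyOfRange(tmp, pos, tmp.length);
    }

    // Decodes one value, stopping at the first byte without a continuation bit.
    static long decode(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }
}
```

Small values take one byte instead of eight, which is the appeal for columns 
of mostly small integers. The cost is that elements no longer sit at fixed 
offsets, so random access would need an index or a scan.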

Let's also assume that the vector length is constant for now (i.e. we can read 
and update an arbitrary element, but we can't change the number of elements).
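
To make the idea concrete, here is a rough sketch of a fixed-length long 
column backed by a byte array and accessed through sun.misc.Unsafe. It is 
written in Java because Unsafe is a Java class; a Scala trait would have the 
same shape. Everything beyond the names DataFrameVectorLike and 
LongDataFrameVector is an assumption: the length(), get() and set() methods, 
the native byte order, and obtaining Unsafe via reflection on its private 
theUnsafe field.

```java
import java.lang.reflect.Field;
import java.util.Iterator;
import java.util.NoSuchElementException;
import sun.misc.Unsafe;

// Assumed shape of the columnar abstraction: iterable, with a fixed length.
interface DataFrameVectorLike<T> extends Iterable<T> {
    int length();
}

final class LongDataFrameVector implements DataFrameVectorLike<Long> {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long BASE = Unsafe.ARRAY_BYTE_BASE_OFFSET;

    private final byte[] data;   // backing store, 8 bytes per element
    private final int length;    // fixed at construction time

    LongDataFrameVector(int length) {
        this.length = length;
        this.data = new byte[Math.multiplyExact(length, Long.BYTES)];
    }

    // Unsafe has no public constructor, so grab the singleton reflectively.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
    }

    @Override
    public int length() {
        return length;
    }

    // Primitive read: no Long object is allocated. Native byte order.
    public long get(int i) {
        checkIndex(i);
        return UNSAFE.getLong(data, BASE + (long) i * Long.BYTES);
    }

    // Primitive in-place update; the number of elements never changes.
    public void set(int i, long value) {
        checkIndex(i);
        UNSAFE.putLong(data, BASE + (long) i * Long.BYTES, value);
    }

    // The Iterable view boxes each element; get/set are the allocation-free path.
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < length;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return get(next++);
            }
        };
    }
}
```

With this, new LongDataFrameVector(4) allocates 32 zeroed bytes, and set(3, 
-1L) followed by get(3) round-trips the value without creating a Long. The 
explicit bounds check matters because Unsafe itself does none.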



> Data frame R-like bindings
> --------------------------
>
>                 Key: MAHOUT-1490
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1490
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Saikat Kanjilal
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark


