[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-04-10 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963154#comment-15963154
 ] 

Wes McKinney commented on ARROW-725:


Sure thing, just added the link

> [Format] Constant length list type
> --
>
> Key: ARROW-725
> URL: https://issues.apache.org/jira/browse/ARROW-725
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Brian Hulette
> Assignee: Emilio Lahr-Vivaz
> Priority: Trivial
>
> It makes sense to store some data in a row-based format. For example, a 
> position might be stored as two or three coordinates per row, and all of them 
> will almost always be accessed simultaneously. Currently, Arrow must store 
> these as two or three separate vectors, but cache performance could 
> potentially be improved if every coordinate for a given row were stored 
> contiguously in memory.
> The List type could satisfy this requirement, but it requires an additional 
> offset vector which isn't necessary when every element is the same size. I 
> think it would be helpful to define a new type that is essentially a List 
> with every element having the same length. I think "Tuple" would be a natural 
> fit for this type but I'm open to other suggestions.
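
To make the layout difference in the description concrete, here is a minimal sketch in plain Python. It is not Arrow's implementation or API: the buffer contents and the helper names `list_get`, `fixed_size_list_get`, and `LIST_SIZE` are assumptions made up for illustration. It shows that a variable-length List needs an offsets buffer to find each row, while a fixed-size list can compute the row's position from its index alone.

```python
from array import array

# Three 2-D positions stored row by row in one contiguous values buffer.
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Variable-length List layout: an extra offsets buffer records where each
# row starts and ends inside the values buffer (n rows need n + 1 offsets).
offsets = array("i", [0, 2, 4, 6])


def list_get(values, offsets, i):
    """Read row i by looking up its bounds in the offsets buffer."""
    return values[offsets[i]:offsets[i + 1]].tolist()


# Fixed-size layout: every row has the same length, so no offsets buffer is
# needed; the start of row i is simply i * list_size.
LIST_SIZE = 2  # hypothetical per-row length, fixed for the whole column


def fixed_size_list_get(values, list_size, i):
    """Read row i by computing its bounds from the constant row length."""
    start = i * list_size
    return values[start:start + list_size].tolist()


# Both layouts return the same row; only the bookkeeping differs.
assert list_get(values, offsets, 1) == fixed_size_list_get(values, LIST_SIZE, 1)
```

In this sketch the fixed-size variant stores one integer for the whole column instead of one offset per row, which is the saving the description is asking for.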



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-04-10 Thread Emilio Lahr-Vivaz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963146#comment-15963146
 ] 

Emilio Lahr-Vivaz commented on ARROW-725:

I'd like to get this into the 0.3 release if possible - can I add it as a 
blocker?



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947555#comment-15947555
 ] 

Wes McKinney commented on ARROW-725:


I opened ARROW-733, will take care of that soon



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947137#comment-15947137
 ] 

Wes McKinney commented on ARROW-725:


You can leave it for now, then I can do search-and-replace afterward



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-29 Thread Emilio Lahr-Vivaz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947126#comment-15947126
 ] 

Emilio Lahr-Vivaz commented on ARROW-725:

Should I make that change as part of adding the list? Or leave it for now?



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-29 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947110#comment-15947110
 ] 

Wes McKinney commented on ARROW-725:


Maybe in the interest of naming consistency, we should rename 
"FixedWidthBinary" to "FixedSizeBinary", so we can have that and "FixedSizeList"



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-29 Thread Emilio Lahr-Vivaz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947105#comment-15947105
 ] 

Emilio Lahr-Vivaz commented on ARROW-725:

Yeah, I'll try to put something up in the next day or two



[jira] [Commented] (ARROW-725) [Format] Constant length list type

2017-03-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945795#comment-15945795
 ] 

Jacques Nadeau commented on ARROW-725:

Yeah, makes sense to me. [~bhulette], do you want to take a shot on the format? 
An initial implementation (java or c++)?
