[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963154#comment-15963154 ]

Wes McKinney commented on ARROW-725:

Sure thing, just added the link

> [Format] Constant length list type
>
> Key: ARROW-725
> URL: https://issues.apache.org/jira/browse/ARROW-725
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Brian Hulette
> Assignee: Emilio Lahr-Vivaz
> Priority: Trivial
>
> It makes sense to store some data in a row-based format. For example, a position might be stored as two or three coordinates per row, and all of them will almost always be accessed simultaneously. Currently, Arrow must store these as two or three separate vectors, but cache performance could potentially be improved if every coordinate for a given row were in the same location in memory.
>
> The List type could satisfy this requirement, but it requires an additional offset vector, which isn't necessary when every element is the same size. I think it would be helpful to define a new type that is essentially a List with every element having the same length. I think "Tuple" would be a natural fit for this type, but I'm open to other suggestions.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
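To illustrate the layout difference described in the issue, here is a minimal sketch in plain Python. It is not Arrow's implementation: the buffer names, the helper functions, and LIST_SIZE are hypothetical, and validity (null) handling is ignored. It only shows that when every row has the same length, row boundaries can be computed by stride arithmetic, so the offsets buffer that a variable-length List needs becomes redundant.

```python
from array import array

# Flat values buffer: three 2-D positions stored contiguously
# as (x0, y0, x1, y1, x2, y2).
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Variable-length List layout: an extra offsets buffer records
# where each row starts and ends inside the values buffer.
offsets = array("i", [0, 2, 4, 6])


def list_row(values, offsets, i):
    """Return row i of a variable-length list by looking up its offsets."""
    return list(values[offsets[i]:offsets[i + 1]])


# Fixed-size layout: every row has the same number of elements,
# so the start of row i is simply i * list_size.
LIST_SIZE = 2  # hypothetical constant row length (two coordinates)


def fixed_size_row(values, list_size, i):
    """Return row i of a fixed-size list using stride arithmetic only."""
    start = i * list_size
    return list(values[start:start + list_size])


rows_with_offsets = [list_row(values, offsets, i) for i in range(len(offsets) - 1)]
rows_fixed = [fixed_size_row(values, LIST_SIZE, i) for i in range(len(values) // LIST_SIZE)]
```

Both approaches read the same contiguous values buffer, so all coordinates of a row sit next to each other in memory; the fixed-size variant simply drops the per-row offsets.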
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963146#comment-15963146 ]

Emilio Lahr-Vivaz commented on ARROW-725:

I'd like to get this into the 0.3 release if possible - can I add it as a blocker?
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947555#comment-15947555 ]

Wes McKinney commented on ARROW-725:

I opened ARROW-733, will take care of that soon
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947137#comment-15947137 ]

Wes McKinney commented on ARROW-725:

You can leave it for now, then I can do search-and-replace afterward
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947126#comment-15947126 ]

Emilio Lahr-Vivaz commented on ARROW-725:

Should I make that change as part of adding the list? Or leave it for now?
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947110#comment-15947110 ]

Wes McKinney commented on ARROW-725:

Maybe in the interest of naming consistency, we should rename "FixedWidthBinary" to "FixedSizeBinary", so we can have that and "FixedSizeList"
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947105#comment-15947105 ]

Emilio Lahr-Vivaz commented on ARROW-725:

Yeah, I'll try to put something up in the next day or two
[jira] [Commented] (ARROW-725) [Format] Constant length list type
[ https://issues.apache.org/jira/browse/ARROW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945795#comment-15945795 ]

Jacques Nadeau commented on ARROW-725:

Yeah, makes sense to me. [~bhulette], do you want to take a shot on the format? An initial implementation (java or c++)?