[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172007#comment-16172007 ]

Tom Augspurger commented on ARROW-1557:
---------------------------------------

I can probably submit a fix on Thursday or Friday.

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
>
>                 Key: ARROW-1557
>                 URL: https://issues.apache.org/jira/browse/ARROW-1557
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>            Reporter: Tom Augspurger
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and
> {{names}} matches. I think this should raise with a {{ValueError}}:
> {{
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])],
>                              names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> }}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172019#comment-16172019 ]

Wes McKinney commented on ARROW-1557:
-------------------------------------

Agreed! Thanks for the bug report.

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
>
>                 Key: ARROW-1557
>                 URL: https://issues.apache.org/jira/browse/ARROW-1557
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>            Reporter: Tom Augspurger
>            Priority: Minor
>             Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and
> {{names}} matches. I think this should raise with a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])],
>                              names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172608#comment-16172608 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

GitHub user TomAugspurger opened a pull request:

    https://github.com/apache/arrow/pull/1117

    ARROW-1557 [Python] Validate names length in Table.from_arrays

    We now raise a ValueError when the length of the names doesn't match the length of the arrays.

    ```python
    In [1]: import pyarrow as pa

    In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
     in ()
    ----> 1 pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])

    table.pxi in pyarrow.lib.Table.from_arrays()

    table.pxi in pyarrow.lib._schema_from_arrays()

    ValueError: Length of names (3) does not match length of arrays (2)
    ```

    This affected `RecordBatch.from_arrays` and `Table.from_arrays`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TomAugspurger/arrow validate-names

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/arrow/pull/1117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1117

commit ed74d52249fabde739cf0599be0210c818b5d272
Author: Tom Augspurger
Date:   2017-09-20T01:44:44Z

    ARROW-1557 [Python] Validate names length in Table.from_arrays

    We now raise a ValueError when the length of the names doesn't match the length of the arrays.
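The check described in this pull request can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual Cython code in table.pxi: the function name `validate_names` is hypothetical, plain lists stand in for pyarrow arrays, and only the wording of the error message is taken from the traceback in the pull request.

```python
def validate_names(arrays, names):
    """Return names as a list, or raise ValueError on a length mismatch.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    names = list(names)
    if len(names) != len(arrays):
        # Message wording copied from the traceback in the pull request.
        raise ValueError(
            "Length of names ({}) does not match length of arrays ({})".format(
                len(names), len(arrays)
            )
        )
    return names


# Plain lists stand in for pyarrow arrays here.
columns = [[1, 2], [3, 4]]
print(validate_names(columns, ["a", "b"]))

try:
    validate_names(columns, ["a", "b", "c"])
except ValueError as exc:
    print(exc)
```

Raising before any table is built means a surplus name such as 'c' is reported instead of being silently dropped, which is the behavior the issue asks for.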
> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
>
>                 Key: ARROW-1557
>                 URL: https://issues.apache.org/jira/browse/ARROW-1557
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>            Reporter: Tom Augspurger
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and
> {{names}} matches. I think this should raise with a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])],
>                              names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172745#comment-16172745 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    It appears this patch exposed a test failure; can you fix it?

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
>
>                 Key: ARROW-1557
>                 URL: https://issues.apache.org/jira/browse/ARROW-1557
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>            Reporter: Tom Augspurger
>            Assignee: Tom Augspurger
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and
> {{names}} matches. I think this should raise with a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])],
>                              names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172749#comment-16172749 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    here's a fix to cherry pick https://github.com/wesm/arrow/commit/965a560867f45025dcbfe50c572593faa7d7cb33
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173024#comment-16173024 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user TomAugspurger commented on the issue:

    https://github.com/apache/arrow/pull/1117

    Sure thing. While we have the chance, on https://github.com/apache/arrow/pull/1117/files#diff-434b799d30eaec7287bee5603a9c45beR318, should that be changed to `if not K`, since that's already `len(arrays)`? My intuition is that `len` is going to be pretty quick anyway, but thought I'd check.
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173060#comment-16173060 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    `if not K` is probably better, feel free to make that change too
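The `if not K` suggestion can be illustrated in plain Python. This is a sketch only, assuming (as the earlier comment states) that `K` already holds `len(arrays)`; the function `describe` is hypothetical and is not the code at the linked diff line.

```python
def describe(arrays):
    # K is computed once up front, as in the discussion above.
    K = len(arrays)
    # For a non-negative integer, `not K` is True exactly when K == 0,
    # so this matches `if not len(arrays)` without calling len() again.
    if not K:
        return "no arrays"
    return "{} array(s)".format(K)


# Plain lists stand in for pyarrow arrays here.
print(describe([]))
print(describe([[1, 2], [3, 4]]))
```

Both spellings take the same branch; reusing `K` mainly avoids repeating the `len` call and keeps the condition consistent with the variable used in the rest of the function.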
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173550#comment-16173550 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/arrow/pull/1117
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173558#comment-16173558 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    In case it's useful we have nightly dev builds