[jira] [Comment Edited] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488413#comment-17488413 ]

Sarah Gilmore edited comment on ARROW-15554 at 2/7/22, 8:21 PM:
---
Will do!

was (Author: sgilmore): Wil do!

> [Format][C++] Add "LargeMap" type with 64-bit offsets
> Key: ARROW-15554
> URL: https://issues.apache.org/jira/browse/ARROW-15554
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Format
> Reporter: Sarah Gilmore
> Priority: Major
>
> It would be nice if a "LargeMap" type existed alongside the "Map" type for parity. For other datatypes that require offset arrays/buffers, such as String, List, and BinaryArray, Arrow provides a "large" (64-bit offset) version, i.e. LargeString, LargeList, and LargeBinaryArray.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488413#comment-17488413 ]

Sarah Gilmore commented on ARROW-15554:
---
Will do!
[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487136#comment-17487136 ]

Sarah Gilmore commented on ARROW-15554:
---
Hi [~apitrou],

I was thinking more about the future when I created this Jira issue. I don't have a concrete need now, but I can picture a few scenarios in which the size limitation imposed by MapArray's 32-bit offsets cannot be worked around.

*Scenario 1:* Suppose you have a ListArray of MapArrays. If one of the maps requires more than int32::max key-value pairs, there's no way to represent it currently. You could try using a ChunkedArray, but you would still need to split the large map across multiple rows of the list.

*Scenario 2:* Even if the MapArray is at the top of the object hierarchy, the same problem arises if a single row within the array needs to contain more than int32::max key-value pairs. A ChunkedArray doesn't resolve the issue either, because the key-value pairs would still be split across multiple rows.

I've seen Parquet files with MAP columns, and I can imagine someone having a very large MAP as the top-most data structure or within a nested one. While running into a situation in which they can't use MapArrays to represent their data is probably rare, it's not impossible given int32's size restrictions.

I'd honestly be interested in looking into this myself.

I hope this helps.

Best,
Sarah
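The capacity gap the scenarios above describe can be sketched with plain arithmetic (illustrative only, not the Arrow API): the last entry of a Map/List offsets buffer bounds the total number of child key-value pairs an array can address.

```python
# Illustrative arithmetic, not pyarrow code: offset width bounds capacity.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# With 32-bit offsets, one MapArray can address at most INT32_MAX
# key-value pairs across all of its rows combined.
max_pairs_map = INT32_MAX

# A hypothetical LargeMap with 64-bit offsets would lift that bound the
# same way LargeList/LargeString do for List/String.
max_pairs_large_map = INT64_MAX

print(max_pairs_map)  # 2147483647
print(max_pairs_large_map // max_pairs_map)  # ~4.3 billion times larger
```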
[jira] [Created] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
Sarah Gilmore created ARROW-15554:
Summary: [Format][C++] Add "LargeMap" type with 64-bit offsets
Key: ARROW-15554
URL: https://issues.apache.org/jira/browse/ARROW-15554
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Format
Reporter: Sarah Gilmore

It would be nice if a "LargeMap" type existed alongside the "Map" type for parity. For other datatypes that require offset arrays/buffers, such as String, List, and BinaryArray, Arrow provides a "large" (64-bit offset) version, i.e. LargeString, LargeList, and LargeBinaryArray.
[jira] [Commented] (ARROW-14723) [Python] pyarrow cannot import parquet files containing row groups whose lengths exceed int32 max.
[ https://issues.apache.org/jira/browse/ARROW-14723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446022#comment-17446022 ]

Sarah Gilmore commented on ARROW-14723:
---
Hi [~jorisvandenbossche],

Here's code you can use to generate both files: [^main.cpp]. In the terminal, you'll be prompted for the output filename and the number of rows you want the Parquet file to have.

I noticed that if I link against the latest version of Arrow (I believe 7.0.0), the files created by the program can be read via pyarrow. However, if you link against 4.0.1, Parquet files with row groups whose lengths exceed 2147483647 cannot be read via pyarrow. I suppose this issue has been resolved in a later release of Arrow?

Best,
Sarah

> [Python] pyarrow cannot import parquet files containing row groups whose lengths exceed int32 max.
> Key: ARROW-14723
> URL: https://issues.apache.org/jira/browse/ARROW-14723
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 5.0.0
> Reporter: Sarah Gilmore
> Priority: Minor
> Attachments: intmax32.parq, intmax32plus1.parq, main.cpp
>
> It's possible to create Parquet files containing row groups whose lengths are greater than int32 max (2147483647). However, pyarrow cannot read these files.
>
> {code:python}
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> # intmax32.parq can be read without any issues
> >>> t = pq.read_table("intmax32.parq");
> # intmax32plus1.parq cannot be read
> >>> t = pq.read_table("intmax32plus1.parq");
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyarrow/parquet.py", line 1895, in read_table
>     return dataset.read(columns=columns, use_threads=use_threads,
>   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyarrow/parquet.py", line 1744, in read
>     table = self._dataset.to_table(
>   File "pyarrow/_dataset.pyx", line 465, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 3075, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
> OSError: Negative size (corrupt file?)
> {code}
>
> However, both files can be imported via the C++ Arrow bindings without any issues.
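One plausible mechanism behind the "OSError: Negative size (corrupt file?)" message, sketched below with the stdlib `struct` module: if a reader stores a row-group length in a signed 32-bit integer, a length one past int32 max wraps around to a negative number. This is an assumption about the failure mode, not a reading of the pyarrow source.

```python
import struct

# A row-group length one past int32 max (2147483647).
length = 2**31  # 2147483648

# Reinterpret the same 32 bits as a signed int32, as a reader that
# stores lengths in int32 effectively does.
as_signed = struct.unpack('<i', struct.pack('<I', length))[0]

print(as_signed)  # -2147483648: a "negative size"
```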
[jira] [Updated] (ARROW-14723) [Python] pyarrow cannot import parquet files containing row groups whose lengths exceed int32 max.
[ https://issues.apache.org/jira/browse/ARROW-14723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarah Gilmore updated ARROW-14723:
Attachment: main.cpp
[jira] [Created] (ARROW-14723) [Python] pyarrow cannot import parquet files containing row groups whose lengths exceed int32 max.
Sarah Gilmore created ARROW-14723:
Summary: [Python] pyarrow cannot import parquet files containing row groups whose lengths exceed int32 max.
Key: ARROW-14723
URL: https://issues.apache.org/jira/browse/ARROW-14723
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 5.0.0
Reporter: Sarah Gilmore
Attachments: intmax32.parq, intmax32plus1.parq

It's possible to create Parquet files containing row groups whose lengths are greater than int32 max (2147483647). However, pyarrow cannot read these files.

{code:python}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
# intmax32.parq can be read without any issues
>>> t = pq.read_table("intmax32.parq");
# intmax32plus1.parq cannot be read
>>> t = pq.read_table("intmax32plus1.parq");
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyarrow/parquet.py", line 1895, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyarrow/parquet.py", line 1744, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 465, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3075, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
OSError: Negative size (corrupt file?)
{code}

However, both files can be imported via the C++ Arrow bindings without any issues.
[jira] [Commented] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434368#comment-17434368 ]

Sarah Gilmore commented on ARROW-14104:
---
Hi [~jorisvandenbossche],

It actually looks like I was running an older version of pyarrow, based on the output of {{pa.__version__}}. According to pip, I have pyarrow 5.0.0:

{code:none}
Name: pyarrow
Version: 5.0.0
Summary: Python library for Apache Arrow
Home-page: https://arrow.apache.org/
Author:
Author-email:
License: Apache License, Version 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: numpy
Required-by: parquet-tools
{code}

But {{pa.__version__}} returns {{'0.17.1'}}. It looks like my system configuration got messed up, though I'm not sure how. I was able to confirm that the TimeZone is round-tripped in pyarrow 5.0.0 by creating a virtual environment with Python's venv module and installing pyarrow 5.0.0 there. I'm sorry for any confusion I caused. I'll close this issue.

Best,
Sarah

> Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
> Key: ARROW-14104
> URL: https://issues.apache.org/jira/browse/ARROW-14104
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Parquet, Python
> Affects Versions: 5.0.0
> Reporter: Sarah Gilmore
> Priority: Minor
> Attachments: exampleArrow4.parq, exampleArrow5.parq
>
> In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List-of-Timestamp columns to and from parquet files:
>
> {code:python}
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import datetime
> >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], pa.list_(pa.timestamp('us', 'America/New_York')));
> >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
> >>> pq.write_table(t, "example.parq");
> >>> t2 = pq.read_table("example.parq");
> >>> t2
> pyarrow.Table
> Dates: list<item: timestamp[us, tz=America/New_York]>
>   child 0, item: timestamp[us, tz=America/New_York]
> {code}
>
> However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC:
>
> {code:python}
> >>> t3 = pq.read_table("example.parq");
> >>> t3
> pyarrow.Table
> Dates: list<item: timestamp[us, tz=UTC]>
>   child 0, item: timestamp[us, tz=UTC]
> {code}
>
> I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
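One quick way to diagnose this kind of pip-vs-interpreter mismatch is to ask the running interpreter where it actually resolves the module from, and compare that path with the `Location:` that `pip show` reports. The sketch below uses the stdlib `json` module as a stand-in for `pyarrow`, so it runs anywhere.

```python
import importlib.util

# Ask the interpreter which file it would import for a given module name.
# If this path points at a different site-packages than `pip show` lists,
# a stale install earlier on sys.path is shadowing the one pip manages.
spec = importlib.util.find_spec("json")  # substitute "pyarrow" in practice
print(spec.origin)
```

A fresh venv sidesteps the problem entirely, which matches the resolution described above.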
[jira] [Closed] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarah Gilmore closed ARROW-14104.
Resolution: Not A Problem

This issue was the result of a configuration problem in my environment.
[jira] [Comment Edited] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430530#comment-17430530 ]

Sarah Gilmore edited comment on ARROW-14104 at 10/19/21, 12:57 PM:
---
So sorry about the delay, [~jorisvandenbossche] and [~westonpace].

I've attached two files ([^exampleArrow4.parq] and [^exampleArrow5.parq]) that were both created with the following code:

{code:python}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import datetime
>>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], pa.list_(pa.timestamp('us', 'America/New_York')));
>>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
>>> pq.write_table(t, "example.parq");
{code}

I ran the snippet above in pyarrow 4.0.0 and pyarrow 5.0.0 to create exampleArrow4.parq and exampleArrow5.parq, respectively.

Here's the output of reading both files in pyarrow 4.0.0:

{code:python}
>>> t1 = pq.read_table("exampleArrow4.parq")
>>> t1
pyarrow.Table
TimestampColumn: list<item: timestamp[us, tz=America/New_York]>
  child 0, item: timestamp[us, tz=America/New_York]
>>> t2 = pq.read_table("exampleArrow5.parq")
>>> t2
pyarrow.Table
TimestampColumn: list<item: timestamp[us, tz=America/New_York]>
  child 0, item: timestamp[us, tz=America/New_York]
{code}

The TimeZone is read in properly for both files. Here's the output of reading both files in pyarrow 5.0.0:

{code:python}
>>> t1 = pq.read_table("exampleArrow4.parq")
>>> t1
pyarrow.Table
TimestampColumn: list<item: timestamp[us, tz=UTC]>
  child 0, item: timestamp[us, tz=UTC]
>>> t2 = pq.read_table("exampleArrow5.parq")
>>> t2
pyarrow.Table
TimestampColumn: list<item: timestamp[us, tz=UTC]>
  child 0, item: timestamp[us, tz=UTC]
{code}

It looks like pyarrow 5.0.0 writes out the TimeZone information but doesn't read it back in properly.
[jira] [Commented] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430530#comment-17430530 ]

Sarah Gilmore commented on ARROW-14104:
---
So sorry about the delay, [~jorisvandenbossche] and [~westonpace].

I've attached two files ([^exampleArrow4.parq] and [^exampleArrow5.parq]) that were both created with the following code:

{code:python}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import datetime
>>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], pa.list_(pa.timestamp('us', 'America/New_York')));
>>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
>>> pq.write_table(t, "example.parq");
{code}

I ran the snippet above in pyarrow 4.0.0 and pyarrow 5.0.0 to create exampleArrow4.parq and exampleArrow5.parq, respectively. In pyarrow 4.0.0, both files read back with tz=America/New_York; in pyarrow 5.0.0, both read back with tz=UTC.

It looks like pyarrow 5.0.0 writes out the TimeZone information but doesn't read it back in properly.
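Worth noting, as general timestamp semantics rather than anything pyarrow-specific: collapsing tz=America/New_York to tz=UTC loses display metadata but not the instant in time, which the stdlib `zoneinfo` module can illustrate with the timestamp from the report.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, needs a system tz database

# The timestamp from the report, with its original zone label.
ny = datetime(2023, 9, 23, 11, tzinfo=ZoneInfo("America/New_York"))

# Converting to UTC, which is effectively what the pyarrow 5.0.0 read
# reports, preserves the instant; only the zone metadata is lost.
utc = ny.astimezone(ZoneInfo("UTC"))

assert ny == utc  # same instant in time
print(utc)        # 2023-09-23 15:00:00+00:00 (EDT is UTC-4 in September)
```

So the bug is a metadata round-trip failure rather than data corruption, though the lost zone label still matters for display and downstream conversions.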
[jira] [Updated] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarah Gilmore updated ARROW-14104:
Attachment: exampleArrow5.parq
            exampleArrow4.parq
[jira] [Updated] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore updated ARROW-14104: -- Description: In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List columns to and from parquet files: {code:java} >>> import pyarrow as pa >>> import pyarrow.parquet as pq >>> import datetime >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], >>> pa.list_(pa.timestamp('us', 'America/New_York'))); >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']); >>> pq.write_table(t, "example.parq"); >>> t2 = pq.read_table("example.parq"); >>> t2 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=America/New_York] {code} However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC: {code:java} >>> t3 = pq.read_table("example.parq"); >>> t3 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=UTC] {code} I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns. was: In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List columns to and from parquet files: {code:java} >>> import pyarrow as pa >>> import pyarrow.parquet as pq >>> import datetime >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], >>> pa.list_(pa.timestamp('us', 'America/New_York'))); >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']); >>> pq.write_table(t, "example.parq"); >>> t2 = pq.read_table("example.parq"); >>> t2 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=America/Denver] {code} However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC: {code:java} >>> t3 = pq.read_table("example.parq"); >>> t3 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=UTC] {code} I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns. 
[jira] [Updated] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore updated ARROW-14104: -- Description: In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List columns to and from parquet files: {code:java} >>> import pyarrow as pa >>> import pyarrow.parquet as pq >>> import datetime >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], >>> pa.list_(pa.timestamp('us', 'America/New_York'))); >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']); >>> pq.write_table(t, "example.parq"); >>> t2 = pq.read_table("example.parq"); >>> t2 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=America/Denver] {code} However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC: {code:java} >>> t3 = pq.read_table("example.parq"); >>> t3 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=UTC] {code} I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns. was: In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List columns to and from parquet files: {code:java} >>> import pyarrow as pa >>> import pyarrow.parquet as pq >>> import datetime >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], >>> pa.list_(pa.timestamp('us', 'America/New_York'))); >>> t = pa.Table.from_arrays([column], name=['TimestampColumn']); >>> pq.write_table(t, "example.parq", version='2.0'); >>> t2 = pq.read_table("example.parq"); >>> t2 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=America/Denver] {code} However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC: {code:java} >>> t3 = pq.read_table("example.parq"); >>> t3 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=UTC] {code} I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns. 
[jira] [Created] (ARROW-14104) Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
Sarah Gilmore created ARROW-14104: - Summary: Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0 Key: ARROW-14104 URL: https://issues.apache.org/jira/browse/ARROW-14104 Project: Apache Arrow Issue Type: Bug Components: C++, Parquet, Python Affects Versions: 5.0.0 Reporter: Sarah Gilmore In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List columns to and from parquet files: {code:java} >>> import pyarrow as pa >>> import pyarrow.parquet as pq >>> import datetime >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], >>> pa.list_(pa.timestamp('us', 'America/New_York'))); >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']); >>> pq.write_table(t, "example.parq"); >>> t2 = pq.read_table("example.parq"); >>> t2 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=America/New_York] {code} However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC: {code:java} >>> t3 = pq.read_table("example.parq"); >>> t3 pyarrow.Table Dates: list child 0, item: timestamp[us, tz=UTC] {code} I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
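The lossy behavior reported above can be illustrated in plain Python, with no pyarrow dependency: converting a timezone-aware value to UTC preserves the instant in time but discards the original zone name, which is exactly the metadata the nested-column round-trip reportedly drops. A minimal sketch using the standard-library `zoneinfo` module (the values mirror the repro above; this is an analogy, not the Arrow code path itself):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# A tz-aware timestamp in the zone used in the report above.
ny = datetime(2023, 9, 23, 11, tzinfo=ZoneInfo("America/New_York"))

# Converting to UTC preserves the instant in time...
utc = ny.astimezone(timezone.utc)
assert ny == utc

# ...but discards the original zone name -- the metadata that the
# nested-column round-trip in pyarrow 5.0.0 reportedly fails to keep.
assert str(ny.tzinfo) == "America/New_York"
assert str(utc.tzinfo) == "UTC"
```

The timestamps compare equal, so no data is corrupted; only the zone annotation on the column type is lost.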
[jira] [Assigned] (ARROW-13185) [MATLAB] Consider alternatives to placing the MEX binaries within the source tree
[ https://issues.apache.org/jira/browse/ARROW-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore reassigned ARROW-13185: - Assignee: Sarah Gilmore > [MATLAB] Consider alternatives to placing the MEX binaries within the source > tree > - > > Key: ARROW-13185 > URL: https://issues.apache.org/jira/browse/ARROW-13185 > Project: Apache Arrow > Issue Type: Task > Components: MATLAB >Reporter: Sarah Gilmore >Assignee: Sarah Gilmore >Priority: Minor > > Since modifying the source directory via the build process is generally > considered non-optimal, we may want to explore alternative approaches. For > example, during the build process, we could create a derived source tree (a > copy of the original source tree) within the build area and place our build > artifacts within the derived source tree. Then, we could add the derived > source tree to the MATLAB search path. That's just one option, but there are > others we could explore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
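The "derived source tree" approach described above can be mocked up in a few lines: copy the source tree into the build area, then write build artifacts only into the copy. All file and directory names here (`+arrow/Array.m`, `featherreadmex.mexa64`) are hypothetical placeholders, not the actual repository layout:

```python
import pathlib
import shutil
import tempfile

# Hypothetical original MATLAB source tree (names are illustrative only).
src = pathlib.Path(tempfile.mkdtemp()) / "matlab"
(src / "+arrow").mkdir(parents=True)
(src / "+arrow" / "Array.m").write_text("% MATLAB source placeholder\n")

# Build step: copy the source tree into the build area...
build = pathlib.Path(tempfile.mkdtemp())
derived = build / "derived_src"
shutil.copytree(src, derived)

# ...and place the MEX binary in the derived tree, not the original.
(derived / "featherreadmex.mexa64").write_bytes(b"")

# The original checkout stays untouched; the derived tree has everything.
assert (derived / "+arrow" / "Array.m").exists()
assert (derived / "featherreadmex.mexa64").exists()
assert not (src / "featherreadmex.mexa64").exists()
```

MATLAB would then be pointed at the derived tree (e.g. via `addpath`), leaving the checked-out sources clean.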
[jira] [Assigned] (ARROW-12855) error: no member named 'TableReader' in namespace during compilation
[ https://issues.apache.org/jira/browse/ARROW-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore reassigned ARROW-12855: - Assignee: Sarah Gilmore > error: no member named 'TableReader' in namespace during compilation > > > Key: ARROW-12855 > URL: https://issues.apache.org/jira/browse/ARROW-12855 > Project: Apache Arrow > Issue Type: Bug > Components: MATLAB >Affects Versions: 4.0.0 > Environment: MATLAB 2020a, Mac OS 11.2.1 >Reporter: Andraž Matkovič >Assignee: Sarah Gilmore >Priority: Major > Labels: matlab > > I followed the instructions for compiling Arrow under MATLAB > ([https://github.com/apache/arrow/tree/master/matlab]). First I set the > environment variable ARROW_HOME, e.g. > > {code:java} > setenv ARROW_HOME ~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow > {code} > (I also tried other pyarrow versions, even /usr/local; the result is always the same). > > Next, when I run compile in MATLAB I get the following error: > {code:java} > Verbose mode is on. Looking for compiler 'Xcode > Clang++' .. 
Looking for environment variable 'DEVELOPER_DIR' ...No > Executing command 'xcode-select -print-path' ...Yes > ('/Applications/Xcode.app/Contents/Developer') Looking for folder > '/Applications/Xcode.app/Contents/Developer' ...Yes Executing command > 'which xcrun' ...Yes ('/usr/bin/xcrun') Looking for folder '/usr/bin' > ...Yes Executing command 'defaults read com.apple.dt.Xcode > IDEXcodeVersionForAgreedToGMLicense' ...No Executing command 'defaults > read /Library/Preferences/com.apple.dt.Xcode > IDEXcodeVersionForAgreedToGMLicense' ...Yes ('11.0') Executing command > 'agreed=11.0 if echo $agreed | grep -E '[\.\"]' >/dev/null; then lhs=`expr > "$agreed" : '\([0-9]*\)[\.].*'` rhs=`expr "$agreed" : '[0-9]*[\.]\(.*\)$'` > if echo $rhs | grep -E '[\."]' >/dev/null; then rhs=`expr "$rhs" : > '\([0-9]*\)[\.].*'` fi if [ $lhs -gt 4 ] || ( [ $lhs -eq 4 ] && [ $rhs -ge > 3 ] ); then echo $agreed else exit 1 fi fi' ...Yes ('11.0') Executing > command 'xcrun -sdk macosx --show-sdk-path' ...Yes > ('/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk') > Executing command 'xcrun -sdk macosx --show-sdk-version | awk 'BEGIN > {FS="."} ; {print $1"."$2}'' ...Yes ('11.1') Executing command 'clang > --version | grep -Eo '[0-9]+\.[0-9]+\.[0-9]'|head -1' ...Yes ('12.0.0').Found > installed compiler 'Xcode Clang++'.Set INCLUDE = > 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/12.0.0/include;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/System/Library/Frameworks;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/12.0.0/include;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/System/Library/Frameworks;Options > file > details--- > Compiler location: /Applications/Xcode.app/Contents/Developer Options file: > /Users/andraz/Library/Application > Support/MathWorks/MATLAB/R2020a/mex_C++_maci64.xml CMDLINE200 : > /usr/bin/xcrun -sdk macosx11.1 clang++ \-Wl,-twolevel_namespace -undefined > error -arch x86_64 -mmacosx-version-min=10.9 > -Wl,-syslibroot,/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk > -framework Cocoa -bundle -stdlib=libc++ -Wl,-rpath '/usr/local/lib' -O > -Wl,-exported_symbols_list,"/Applications/MATLAB_R2020a.app/extern/lib/maci64/mexFunction.map" > > -Wl,-exported_symbols_list,"/Applications/MATLAB_R2020a.app/extern/lib/maci64/c_exportsmexfileversion.map" > -Wl,-U,_mexCreateMexFunction -Wl,-U,_mexDestroyMexFunction > -Wl,-U,_mexFunctionAdapter > 
-Wl,-exported_symbols_list,"/Applications/MATLAB_R2020a.app/extern/lib/maci64/cppMexFunction.map" > > /var/folders/s1/f1fgqkcs6bs4c13v_50btmd4gn/T/mex_5123220870320_661/featherreadmex.o > > /var/folders/s1/f1
[jira] [Commented] (ARROW-12855) error: no member named 'TableReader' in namespace during compilation
[ https://issues.apache.org/jira/browse/ARROW-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370802#comment-17370802 ] Sarah Gilmore commented on ARROW-12855: --- Hi [~amatkovic], We just fixed this issue in a pull request that was merged yesterday. Here's a link to the pull request: [https://github.com/apache/arrow/pull/10305] and here's a link to the associated JIRA issue: https://issues.apache.org/jira/browse/ARROW-12730 Could you try pulling in the most recent changes from the master branch and building again? Sorry about this. Best, Sarah > error: no member named 'TableReader' in namespace during compilation > > > Key: ARROW-12855 > URL: https://issues.apache.org/jira/browse/ARROW-12855 > Project: Apache Arrow > Issue Type: Bug > Components: MATLAB >Affects Versions: 4.0.0 > Environment: MATLAB 2020a, Mac OS 11.2.1 >Reporter: Andraž Matkovič >Priority: Major > Labels: matlab > > I followed the instructions for compiling Arrow under MATLAB > ([https://github.com/apache/arrow/tree/master/matlab]). First I set the > environment variable ARROW_HOME, e.g. > > {code:java} > setenv ARROW_HOME ~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow > {code} > (I also tried other pyarrow versions, even /usr/local; the result is always the same). > > Next, when I run compile in MATLAB I get the following error: > {code:java} > Verbose mode is on. Looking for compiler 'Xcode > Clang++' .. 
Looking for environment variable 'DEVELOPER_DIR' ...No > Executing command 'xcode-select -print-path' ...Yes > ('/Applications/Xcode.app/Contents/Developer') Looking for folder > '/Applications/Xcode.app/Contents/Developer' ...Yes Executing command > 'which xcrun' ...Yes ('/usr/bin/xcrun') Looking for folder '/usr/bin' > ...Yes Executing command 'defaults read com.apple.dt.Xcode > IDEXcodeVersionForAgreedToGMLicense' ...No Executing command 'defaults > read /Library/Preferences/com.apple.dt.Xcode > IDEXcodeVersionForAgreedToGMLicense' ...Yes ('11.0') Executing command > 'agreed=11.0 if echo $agreed | grep -E '[\.\"]' >/dev/null; then lhs=`expr > "$agreed" : '\([0-9]*\)[\.].*'` rhs=`expr "$agreed" : '[0-9]*[\.]\(.*\)$'` > if echo $rhs | grep -E '[\."]' >/dev/null; then rhs=`expr "$rhs" : > '\([0-9]*\)[\.].*'` fi if [ $lhs -gt 4 ] || ( [ $lhs -eq 4 ] && [ $rhs -ge > 3 ] ); then echo $agreed else exit 1 fi fi' ...Yes ('11.0') Executing > command 'xcrun -sdk macosx --show-sdk-path' ...Yes > ('/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk') > Executing command 'xcrun -sdk macosx --show-sdk-version | awk 'BEGIN > {FS="."} ; {print $1"."$2}'' ...Yes ('11.1') Executing command 'clang > --version | grep -Eo '[0-9]+\.[0-9]+\.[0-9]'|head -1' ...Yes ('12.0.0').Found > installed compiler 'Xcode Clang++'.Set INCLUDE = > 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/12.0.0/include;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/System/Library/Frameworks;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/12.0.0/include;/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/System/Library/Frameworks;Options > file > details--- > Compiler location: /Applications/Xcode.app/Contents/Developer Options file: > /Users/andraz/Library/Application > Support/MathWorks/MATLAB/R2020a/mex_C++_maci64.xml CMDLINE200 : > /usr/bin/xcrun -sdk macosx11.1 clang++ \-Wl,-twolevel_namespace -undefined > error -arch x86_64 -mmacosx-version-min=10.9 > -Wl,-syslibroot,/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk > -framework Cocoa -bundle -stdlib=libc++ -Wl,-rpath '/usr/local/lib' -O > -Wl,-exported_symbols_list,"/Applications/MATLAB_R2020a.app/extern/lib/maci64/mexFunctio
[jira] [Created] (ARROW-13185) [MATLAB] Consider alternatives to placing the MEX binaries within the source tree
Sarah Gilmore created ARROW-13185: - Summary: [MATLAB] Consider alternatives to placing the MEX binaries within the source tree Key: ARROW-13185 URL: https://issues.apache.org/jira/browse/ARROW-13185 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Since modifying the source directory via the build process is generally considered non-optimal, we may want to explore alternative approaches. For example, during the build process, we could create a derived source tree (a copy of the original source tree) within the build area and place our build artifacts within the derived source tree. Then, we could add the derived source tree to the MATLAB search path. That's just one option, but there are others we could explore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12754) [MATLAB] Create the LifetimeManager C++ class
Sarah Gilmore created ARROW-12754: - Summary: [MATLAB] Create the LifetimeManager C++ class Key: ARROW-12754 URL: https://issues.apache.org/jira/browse/ARROW-12754 Project: Apache Arrow Issue Type: Sub-task Components: MATLAB Reporter: Sarah Gilmore LifetimeManager is a singleton that each arrow.Array subclass will interact with to keep its corresponding C++ data structures alive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12753) [MATLAB] Create a templated ObjectMap for storing arrow C++ data structures with IDs
Sarah Gilmore created ARROW-12753: - Summary: [MATLAB] Create a templated ObjectMap for storing arrow C++ data structures with IDs Key: ARROW-12753 URL: https://issues.apache.org/jira/browse/ARROW-12753 Project: Apache Arrow Issue Type: Sub-task Components: MATLAB Reporter: Sarah Gilmore In order to keep Arrow C++ data structures alive for the duration of the wrapping MATLAB object's lifetime (e.g. arrow.Array), we can store each Arrow C++ data structure in a map indexed by a unique ID. -- This message was sent by Atlassian Jira (v8.3.4#803005)
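The ID-indexed map described here is a generic object-registry pattern. A sketch in Python (standing in for the proposed templated C++ class; the class and method names are illustrative, not from the Arrow codebase) might look like:

```python
import itertools


class ObjectMap:
    """Keeps objects alive by holding them in a dict keyed by unique IDs."""

    def __init__(self):
        self._ids = itertools.count(1)   # monotonically increasing IDs
        self._objects = {}

    def add(self, obj):
        """Store obj and return the ID used to retrieve it later."""
        oid = next(self._ids)
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        """Look up a previously stored object by its ID."""
        return self._objects[oid]

    def remove(self, oid):
        """Drop the reference, allowing the object to be destroyed."""
        del self._objects[oid]


# The wrapping MATLAB object would hold only the integer ID.
m = ObjectMap()
oid = m.add({"type": "arrow.Array"})
assert m.get(oid) == {"type": "arrow.Array"}
m.remove(oid)
```

In the C++ version the map would be templated on the stored Arrow type (e.g. `std::shared_ptr<arrow::Array>`), and `remove` would be called when the MATLAB wrapper is destroyed.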
[jira] [Created] (ARROW-12752) [MATLAB] Implement LifetimeManager for managing arrow memory lifetime
Sarah Gilmore created ARROW-12752: - Summary: [MATLAB] Implement LifetimeManager for managing arrow memory lifetime Key: ARROW-12752 URL: https://issues.apache.org/jira/browse/ARROW-12752 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore When we create an arrow object in MATLAB (e.g. arrow.Array) we need to ensure the underlying arrow C++ data structures stay alive for the duration of the wrapping MATLAB object's lifetime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12730) [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs
Sarah Gilmore created ARROW-12730: - Summary: [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs Key: ARROW-12730 URL: https://issues.apache.org/jira/browse/ARROW-12730 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Assignee: Sarah Gilmore The MEX functions featherreadmex and featherwritemex currently do not compile against the latest Arrow C++ APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12349) [MATLAB] add support for converting MATLAB numeric arrays to arrow::NumericArrays
[ https://issues.apache.org/jira/browse/ARROW-12349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore updated ARROW-12349: -- Summary: [MATLAB] add support for converting MATLAB numeric arrays to arrow::NumericArrays (was: [MATLAB] add support for converting a MATLAB uint64 array to an arrow::NumericArray) > [MATLAB] add support for converting MATLAB numeric arrays to > arrow::NumericArrays > -- > > Key: ARROW-12349 > URL: https://issues.apache.org/jira/browse/ARROW-12349 > Project: Apache Arrow > Issue Type: Task > Components: MATLAB >Reporter: Sarah Gilmore >Assignee: Sarah Gilmore >Priority: Minor > > Create a C++ function that accepts a MATLAB uint64 array and converts it into > an arrow::NumericArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-12349) [MATLAB] add support for converting a MATLAB uint64 array to an arrow::NumericArray
[ https://issues.apache.org/jira/browse/ARROW-12349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore reassigned ARROW-12349: - Assignee: Sarah Gilmore > [MATLAB] add support for converting a MATLAB uint64 array to an > arrow::NumericArray > --- > > Key: ARROW-12349 > URL: https://issues.apache.org/jira/browse/ARROW-12349 > Project: Apache Arrow > Issue Type: Task > Components: MATLAB >Reporter: Sarah Gilmore >Assignee: Sarah Gilmore >Priority: Minor > > Create a C++ function that accepts a MATLAB uint64 array and converts it into > an arrow::NumericArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-12368) [MATLAB] create a matlab2mex function
[ https://issues.apache.org/jira/browse/ARROW-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarah Gilmore reassigned ARROW-12368: - Assignee: Sarah Gilmore > [MATLAB] create a matlab2mex function > - > > Key: ARROW-12368 > URL: https://issues.apache.org/jira/browse/ARROW-12368 > Project: Apache Arrow > Issue Type: Task > Components: MATLAB >Reporter: Sarah Gilmore >Assignee: Sarah Gilmore >Priority: Minor > > Create a function that takes a native numeric MATLAB array and converts it > into a form that can be manipulated in a C++ MEX function. Once the data is > accessible inside a MEX function, it can be converted into the corresponding > arrow::NumericArray type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12368) [MATLAB] create a matlab2mex function
Sarah Gilmore created ARROW-12368: - Summary: [MATLAB] create a matlab2mex function Key: ARROW-12368 URL: https://issues.apache.org/jira/browse/ARROW-12368 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Create a function that takes a native numeric MATLAB array and converts it into a form that can be manipulated in a C++ MEX function. Once the data is accessible inside a MEX function, it can be converted into the corresponding arrow::NumericArray type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
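The conversion step described above (exposing a native numeric array's raw memory so a MEX function can hand it to an Arrow builder) can be sketched by analogy in Python; the buffer-reinterpretation below stands in for what a hypothetical matlab2mex helper would do, and none of the names come from the actual implementation:

```python
import array

# Stand-in for a native numeric array's raw memory (illustrative data).
raw = bytes(array.array("d", [1.0, 2.0, 3.0]))

# Inside the "MEX function": reinterpret the raw buffer as typed
# double-precision values, the form an arrow::NumericArray builder
# would consume.
values = array.array("d")
values.frombytes(raw)

assert list(values) == [1.0, 2.0, 3.0]
```

The real C++ path avoids the copy by handing Arrow a pointer into the MATLAB array's data, but the typed reinterpretation is the same idea.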
[jira] [Created] (ARROW-12349) [MATLAB] add support for converting a MATLAB uint64 array to an arrow::NumericArray
Sarah Gilmore created ARROW-12349: - Summary: [MATLAB] add support for converting a MATLAB uint64 array to an arrow::NumericArray Key: ARROW-12349 URL: https://issues.apache.org/jira/browse/ARROW-12349 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Create a C++ function that accepts a MATLAB uint64 array and converts it into an arrow::NumericArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)