[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172769#comment-16172769 ]

ASF GitHub Bot commented on ARROW-1500:
---------------------------------------

Github user amirma commented on the issue:

    https://github.com/apache/arrow/pull/1116

    @wesm Rebased.

> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
> -------------------------------------------------------------
>
>          Key: ARROW-1500
>          URL: https://issues.apache.org/jira/browse/ARROW-1500
>      Project: Apache Arrow
>   Issue Type: Bug
>   Components: C++
>     Reporter: Wes McKinney
>     Assignee: Amir Malekpour
>       Labels: pull-request-available
>      Fix For: 0.8.0
>
> Observed in gcc 5.4.0 release build

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1553) [JAVA] Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors
[ https://issues.apache.org/jira/browse/ARROW-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172757#comment-16172757 ]

ASF GitHub Bot commented on ARROW-1553:
---------------------------------------

Github user siddharthteotia commented on the issue:

    https://github.com/apache/arrow/pull/1113

    Can this be merged?

> [JAVA] Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors
> -----------------------------------------------------------------------------------------------------------------
>
>          Key: ARROW-1553
>          URL: https://issues.apache.org/jira/browse/ARROW-1553
>      Project: Apache Arrow
>   Issue Type: Bug
>   Components: Java - Vectors
>     Reporter: Siddharth Teotia
>     Assignee: Siddharth Teotia
>       Labels: pull-request-available
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172749#comment-16172749 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    here's a fix to cherry pick https://github.com/wesm/arrow/commit/965a560867f45025dcbfe50c572593faa7d7cb33

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
> ----------------------------------------------------------------
>
>               Key: ARROW-1557
>               URL: https://issues.apache.org/jira/browse/ARROW-1557
>           Project: Apache Arrow
>        Issue Type: Bug
>        Components: Python
>  Affects Versions: 0.7.0
>          Reporter: Tom Augspurger
>          Assignee: Tom Augspurger
>          Priority: Minor
>            Labels: pull-request-available
>           Fix For: 0.8.0
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and {{names}} matches. I think this should raise with a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
>
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
>
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172745#comment-16172745 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1117

    Appears there is a test failure that was exposed by this patch, can you fix?

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172744#comment-16172744 ]

ASF GitHub Bot commented on ARROW-1500:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1116

    Can you rebase? Not sure why there's a merge conflict now

> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[jira] [Updated] (ARROW-269) UnionVector getBuffers method does not include typevector
[ https://issues.apache.org/jira/browse/ARROW-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-269:
-------------------------------
    Fix Version/s:     (was: 1.0.0)
                   0.7.0

> UnionVector getBuffers method does not include typevector
> ----------------------------------------------------------
>
>          Key: ARROW-269
>          URL: https://issues.apache.org/jira/browse/ARROW-269
>      Project: Apache Arrow
>   Issue Type: Bug
>     Reporter: Steven Phillips
>      Fix For: 0.7.0
>
> Only the internalMapVector's buffers are returned currently.
[jira] [Resolved] (ARROW-269) UnionVector getBuffers method does not include typevector
[ https://issues.apache.org/jira/browse/ARROW-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-269.
--------------------------------
    Resolution: Fixed
      Assignee: Steven Phillips

https://github.com/apache/arrow/commit/ec51d566708f5d6ea0a94a6d53152dc8cc98d6aa

> UnionVector getBuffers method does not include typevector
[jira] [Commented] (ARROW-269) UnionVector getBuffers method does not include typevector
[ https://issues.apache.org/jira/browse/ARROW-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172700#comment-16172700 ]

Li Jin commented on ARROW-269:
------------------------------

[~wesmckinn] this is fixed by https://github.com/apache/arrow/commit/ec51d566708f5d6ea0a94a6d53152dc8cc98d6aa

> UnionVector getBuffers method does not include typevector
[jira] [Resolved] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1554.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 1115
[https://github.com/apache/arrow/pull/1115]

> [Python] Document that pip wheels depend on MSVC14 runtime
> ----------------------------------------------------------
>
>               Key: ARROW-1554
>               URL: https://issues.apache.org/jira/browse/ARROW-1554
>           Project: Apache Arrow
>        Issue Type: Bug
>        Components: Python
>  Affects Versions: 0.7.0
>       Environment: Windows 10 (x64)
>                    Python 3.6.2 (x64)
>          Reporter: Dima Ryazanov
>          Assignee: Wes McKinney
>            Labels: pull-request-available
>           Fix For: 0.8.0
>
>       Attachments: parquet_dependencies.png, Process Monitor.png
>
> I just tried pyarrow on Windows 10, and it fails to import for me:
> {code}
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", line 32, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: DLL load failed: The specified module could not be found.
> {code}
> Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder:
> {code}
> C:\Users\dima\Documents>dir "C:\Program Files\Python36\lib\site-packages\pyarrow\"
>  Volume in drive C has no label.
>  Volume Serial Number is 4CE9-CC3C
>
>  Directory of C:\Program Files\Python36\lib\site-packages\pyarrow
>
> 09/19/2017  01:14 AM    <DIR>          .
> 09/19/2017  01:14 AM    <DIR>          ..
> 09/19/2017  01:14 AM         2,382,336 arrow.dll
> 09/19/2017  01:14 AM           604,160 arrow_python.dll
> 09/19/2017  01:14 AM             3,402 compat.py
> ...
> {code}
> However, I cannot open them using ctypes.cdll. I wonder if some dependency is missing?
> {code}
> >>> open('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb')
> <_io.BufferedReader name='C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'>
> >>>
> >>> cdll.LoadLibrary('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in LoadLibrary
>     return self._dlltype(name)
>   File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in __init__
>     self._handle = _dlopen(self._name, mode)
> OSError: [WinError 126] The specified module could not be found
> {code}
[jira] [Commented] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172687#comment-16172687 ]

ASF GitHub Bot commented on ARROW-1554:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1115

    +1. The Travis failure appears due to a transient apt problem

> [Python] Document that pip wheels depend on MSVC14 runtime
[jira] [Commented] (ARROW-1192) [JAVA] Improve splitAndTransfer performance for List and Union vectors
[ https://issues.apache.org/jira/browse/ARROW-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172693#comment-16172693 ]

ASF GitHub Bot commented on ARROW-1192:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/arrow/pull/819

> [JAVA] Improve splitAndTransfer performance for List and Union vectors
> ----------------------------------------------------------------------
>
>          Key: ARROW-1192
>          URL: https://issues.apache.org/jira/browse/ARROW-1192
>      Project: Apache Arrow
>   Issue Type: Bug
>     Reporter: Steven Phillips
>     Assignee: Steven Phillips
>       Labels: pull-request-available
>      Fix For: 0.6.0
>
> Most vector implementations slice the underlying buffer for splitAndTransfer, but ListVector and UnionVector copy data into a new buffer. We should enhance these to use slice as well.
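The slice-vs-copy distinction behind this ticket can be illustrated outside of Java. A minimal Python sketch (hypothetical helper names, not the Arrow Java API):

```python
# Hypothetical illustration of the splitAndTransfer optimization discussed
# above: a transfer can either copy the transferred range into a new
# buffer, or hand out a zero-copy view (slice) of the same memory.

def split_by_copy(buf: bytes, start: int, length: int) -> bytes:
    # Allocates a new buffer and copies `length` bytes -- analogous to
    # what ListVector and UnionVector did before this change.
    return bytes(buf[start:start + length])

def split_by_slice(buf: bytes, start: int, length: int) -> memoryview:
    # Zero-copy: the result is a view over the original allocation,
    # analogous to slicing the underlying buffer.
    return memoryview(buf)[start:start + length]

data = bytes(range(16))
copied = split_by_copy(data, 4, 8)
sliced = split_by_slice(data, 4, 8)
assert copied == bytes(sliced)   # same logical contents
assert sliced.obj is data        # but the slice shares memory with `data`
```

The zero-copy variant is why slicing is preferred for hot paths: no allocation and no O(n) copy, only bookkeeping.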
[jira] [Commented] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172692#comment-16172692 ]

ASF GitHub Bot commented on ARROW-1554:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/arrow/pull/1115

> [Python] Document that pip wheels depend on MSVC14 runtime
[jira] [Assigned] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-1557:
-----------------------------------
    Assignee: Tom Augspurger

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[jira] [Assigned] (ARROW-1536) [C++] Do not transitively depend on libboost_system
[ https://issues.apache.org/jira/browse/ARROW-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-1536:
-----------------------------------
    Assignee: Deepak Majeti

> [C++] Do not transitively depend on libboost_system
> ---------------------------------------------------
>
>               Key: ARROW-1536
>               URL: https://issues.apache.org/jira/browse/ARROW-1536
>           Project: Apache Arrow
>        Issue Type: Bug
>        Components: C++
>  Affects Versions: 0.7.0
>          Reporter: Wes McKinney
>          Assignee: Deepak Majeti
>            Labels: pull-request-available
>           Fix For: 0.8.0
>
> We picked up this dependency recently. I don't think this is a blocker for 0.7.0, but it impacts static linkers (e.g. linkers of parquet-cpp)
> This was introduced in ARROW-1339
> https://github.com/apache/arrow/commit/94b7cfafae0bda8f68ee3e5e9702c954b0116203
> cc [~mdeepak]
[jira] [Commented] (ARROW-1536) [C++] Do not transitively depend on libboost_system
[ https://issues.apache.org/jira/browse/ARROW-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172684#comment-16172684 ]

ASF GitHub Bot commented on ARROW-1536:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/arrow/pull/1105

> [C++] Do not transitively depend on libboost_system
[jira] [Resolved] (ARROW-1536) [C++] Do not transitively depend on libboost_system
[ https://issues.apache.org/jira/browse/ARROW-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1536.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 1105
[https://github.com/apache/arrow/pull/1105]

> [C++] Do not transitively depend on libboost_system
[jira] [Created] (ARROW-1577) [JS] Package release script for NPM modules
Wes McKinney created ARROW-1577:
-----------------------------------

             Summary: [JS] Package release script for NPM modules
                 Key: ARROW-1577
                 URL: https://issues.apache.org/jira/browse/ARROW-1577
             Project: Apache Arrow
          Issue Type: New Feature
          Components: JavaScript
    Affects Versions: 0.8.0
            Reporter: Wes McKinney


Since the NPM JavaScript module may wish to release more frequently than the main Arrow "monorepo", we should create a script to produce signed NPM artifacts to use for voting:

* Update metadata for new version
* Run unit tests
* Create package tarballs with NPM
* GPG sign and create md5 and sha512 checksum files
* Upload to Apache dev SVN

i.e. like https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but only for JavaScript. We will also want to write instructions for Arrow developers to verify the tarballs to streamline the release votes
[jira] [Commented] (ARROW-1548) [GLib] Support build append in builder
[ https://issues.apache.org/jira/browse/ARROW-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172629#comment-16172629 ]

ASF GitHub Bot commented on ARROW-1548:
---------------------------------------

Github user kou commented on the issue:

    https://github.com/apache/arrow/pull/1110

    Emacs helps me a lot. :)

> [GLib] Support build append in builder
> --------------------------------------
>
>          Key: ARROW-1548
>          URL: https://issues.apache.org/jira/browse/ARROW-1548
>      Project: Apache Arrow
>   Issue Type: New Feature
>   Components: GLib
>     Reporter: Kouhei Sutou
>     Assignee: Kouhei Sutou
>       Labels: pull-request-available
>      Fix For: 0.8.0
>
> It improves performance.
[jira] [Commented] (ARROW-1209) [C++] Implement converter between Arrow record batches and Avro records
[ https://issues.apache.org/jira/browse/ARROW-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172621#comment-16172621 ]

ASF GitHub Bot commented on ARROW-1209:
---------------------------------------

Github user wesm commented on the issue:

    https://github.com/apache/arrow/pull/1026

    Hm, yeah I'm looking at avro-c and it's not very Windows-friendly. We can use FILE* on Windows in Arrow but that won't work on files over 2GB. But maybe that's OK.

> [C++] Implement converter between Arrow record batches and Avro records
> -----------------------------------------------------------------------
>
>          Key: ARROW-1209
>          URL: https://issues.apache.org/jira/browse/ARROW-1209
>      Project: Apache Arrow
>   Issue Type: New Feature
>   Components: C++
>     Reporter: Wes McKinney
>       Labels: pull-request-available
>
> This would be useful for streaming systems that need to consume or produce Avro in C/C++
[jira] [Commented] (ARROW-1209) [C++] Implement converter between Arrow record batches and Avro records
[ https://issues.apache.org/jira/browse/ARROW-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172610#comment-16172610 ]

ASF GitHub Bot commented on ARROW-1209:
---------------------------------------

Github user mariusvniekerk commented on the issue:

    https://github.com/apache/arrow/pull/1026

    cyavro provides support for python file-like objects by basically making a void* and using fmemopen on it to get the FILE*

> [C++] Implement converter between Arrow record batches and Avro records
[jira] [Commented] (ARROW-1209) [C++] Implement converter between Arrow record batches and Avro records
[ https://issues.apache.org/jira/browse/ARROW-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172609#comment-16172609 ]

ASF GitHub Bot commented on ARROW-1209:
---------------------------------------

Github user mariusvniekerk commented on the issue:

    https://github.com/apache/arrow/pull/1026

    Yeah the implementation in impala seems to provide its own codecs. The cpp implementation in libavro-cpp doesn't support all the codecs yet, so I can see why impala/kudu reimplemented these. I assume that the impala cpp implementation is too tied to LLVM to be easily moved upstream to avro-cpp itself?

> [C++] Implement converter between Arrow record batches and Avro records
[jira] [Updated] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-1557:
----------------------------------
    Labels: pull-request-available  (was: )

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172608#comment-16172608 ]

ASF GitHub Bot commented on ARROW-1557:
---------------------------------------

GitHub user TomAugspurger opened a pull request:

    https://github.com/apache/arrow/pull/1117

    ARROW-1557 [Python] Validate names length in Table.from_arrays

    We now raise a ValueError when the length of the names doesn't match the length of the arrays.

    ```python
    In [1]: import pyarrow as pa

    In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-2-...> in <module>()
    ----> 1 pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])

    table.pxi in pyarrow.lib.Table.from_arrays()

    table.pxi in pyarrow.lib._schema_from_arrays()

    ValueError: Length of names (3) does not match length of arrays (2)
    ```

    This affected `RecordBatch.from_arrays` and `Table.from_arrays`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TomAugspurger/arrow validate-names

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/arrow/pull/1117.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1117

----
commit ed74d52249fabde739cf0599be0210c818b5d272
Author: Tom Augspurger
Date:   2017-09-20T01:44:44Z

    ARROW-1557 [Python] Validate names length in Table.from_arrays

    We now raise a ValueError when the length of the names doesn't match the length of the arrays.
> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[jira] [Commented] (ARROW-1576) [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
[ https://issues.apache.org/jira/browse/ARROW-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172571#comment-16172571 ]

Wes McKinney commented on ARROW-1576:
-------------------------------------

cf https://github.com/mapd/pymapd/pull/50#discussion_r139854270

> [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: ARROW-1576
>          URL: https://issues.apache.org/jira/browse/ARROW-1576
>      Project: Apache Arrow
>   Issue Type: New Feature
>   Components: Python
>     Reporter: Wes McKinney
>      Fix For: 0.8.0
>
> E.g. {{is_integer}}, {{is_unsigned_integer}}. This could be implemented similar to NumPy, too ({{isinstance(t, pa.FloatingPoint)}} or something)
[jira] [Created] (ARROW-1576) [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
Wes McKinney created ARROW-1576:
-----------------------------------

             Summary: [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
                 Key: ARROW-1576
                 URL: https://issues.apache.org/jira/browse/ARROW-1576
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.8.0


E.g. {{is_integer}}, {{is_unsigned_integer}}. This could be implemented similar to NumPy, too ({{isinstance(t, pa.FloatingPoint)}} or something)
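The "richer type hierarchy" alternative mentioned above can be sketched with stand-in classes. All class and function names here are hypothetical illustrations; pyarrow's real DataType classes and the API it ultimately shipped differ:

```python
# Sketch of a NumPy-style type hierarchy for dispatch via isinstance,
# plus the predicate-style helpers (is_integer etc.) built on top of it.
# These classes are stand-ins, not pyarrow's actual types.

class DataType: ...
class Integer(DataType): ...
class SignedInteger(Integer): ...
class UnsignedInteger(Integer): ...
class FloatingPoint(DataType): ...

class Int32(SignedInteger): ...
class UInt8(UnsignedInteger): ...
class Float64(FloatingPoint): ...

def is_integer(t):
    # Membership in a type class is just an isinstance check once the
    # hierarchy exists -- the "isinstance(t, pa.FloatingPoint)" idea.
    return isinstance(t, Integer)

def is_unsigned_integer(t):
    return isinstance(t, UnsignedInteger)

assert is_integer(Int32()) and is_integer(UInt8())
assert is_unsigned_integer(UInt8()) and not is_unsigned_integer(Int32())
assert not is_integer(Float64())
```

The appeal of the hierarchy over a flat set of predicate functions is that new type classes (e.g. "nested", "temporal") fall out of subclassing rather than requiring a new helper per class.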
[jira] [Created] (ARROW-1575) [Python] Add pyarrow.column factory function
Wes McKinney created ARROW-1575:
-----------------------------------

             Summary: [Python] Add pyarrow.column factory function
                 Key: ARROW-1575
                 URL: https://issues.apache.org/jira/browse/ARROW-1575
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.8.0


This would internally call {{Column.from_array}} as appropriate
[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172525#comment-16172525 ]

ASF GitHub Bot commented on ARROW-1500:
---------------------------------------

Github user amirma commented on the issue:

    https://github.com/apache/arrow/pull/1116

    @wesm Bah, I just noticed my patch has a bug; if truncate fails we will leak the file handle. I just resubmitted a fixed version. Thanks.

> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
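The actual fix lands in Arrow's C++ MemoryMappedFile::Create; as a language-neutral sketch, the same two pitfalls (ignoring the truncate result, and leaking the descriptor when truncate fails) look like this in Python, where `create_sized_file` is a hypothetical stand-in:

```python
# Sketch of the pattern fixed above: check the result of ftruncate, and
# on the error path close the file descriptor before propagating, so the
# handle is not leaked.

import os
import tempfile

def create_sized_file(path, size):
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # os.ftruncate raises OSError on failure, so the result is
        # never silently ignored (unlike a bare C ftruncate() call).
        os.ftruncate(fd, size)
    except OSError:
        os.close(fd)   # close on the error path: no fd leak
        raise
    return fd

path = os.path.join(tempfile.mkdtemp(), 'mapped.bin')
fd = create_sized_file(path, 4096)
assert os.fstat(fd).st_size == 4096
os.close(fd)
```

In C++ the equivalent is checking `ftruncate`'s return value and closing the descriptor before returning the error status.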
[jira] [Created] (ARROW-1574) [C++] Implement kernel function that converts a dense array to dictionary given known dictionary
Wes McKinney created ARROW-1574: --- Summary: [C++] Implement kernel function that converts a dense array to dictionary given known dictionary Key: ARROW-1574 URL: https://issues.apache.org/jira/browse/ARROW-1574 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney This may simply be a special case of cast using a dictionary type -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1573) [C++] Implement stateful kernel function that uses DictionaryBuilder to compute dictionary indices
Wes McKinney created ARROW-1573: --- Summary: [C++] Implement stateful kernel function that uses DictionaryBuilder to compute dictionary indices Key: ARROW-1573 URL: https://issues.apache.org/jira/browse/ARROW-1573 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney An operator utilizing this kernel may need some way to indicate to multithreaded schedulers that it cannot be parallelized on chunked arrays (unless we implement a concurrent hash table) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
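The stateful idea above, a single hash table shared across chunks so that indices stay consistent, can be sketched in Python. The class name and API are illustrative assumptions, not Arrow's DictionaryBuilder interface; the single shared table is also why naive per-chunk parallelism is unsafe without a concurrent hash table:

```python
class DictionaryEncoder:
    """Stateful sketch: map values to dictionary indices across chunks."""

    def __init__(self):
        self._index = {}       # value -> dictionary index, shared across calls
        self._dictionary = []  # values in first-seen order

    def encode_chunk(self, chunk):
        # Returns the index array for one chunk, growing the shared
        # dictionary as new values appear.
        indices = []
        for value in chunk:
            if value not in self._index:
                self._index[value] = len(self._dictionary)
                self._dictionary.append(value)
            indices.append(self._index[value])
        return indices

    @property
    def dictionary(self):
        return list(self._dictionary)
```

Because `encode_chunk` mutates shared state, an operator built on it would have to be scheduled sequentially over the chunks of a chunked array.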
[jira] [Created] (ARROW-1572) [C++] Implement "value counts" kernels for tabulating value frequencies
Wes McKinney created ARROW-1572: --- Summary: [C++] Implement "value counts" kernels for tabulating value frequencies Key: ARROW-1572 URL: https://issues.apache.org/jira/browse/ARROW-1572 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney This is related to "match", "isin", and "unique" since hashing is generally required -- This message was sent by Atlassian JIRA (v6.4.14#64029)
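A hash-based sketch of the tabulation described above, with `None` standing in for null. Reporting the null count separately is an assumption about the API, not something the ticket specifies:

```python
from collections import Counter

def value_counts(values):
    # Tabulate frequencies of non-null values via hashing; nulls (None)
    # are counted separately here (an assumed convention).
    counts = Counter(v for v in values if v is not None)
    null_count = sum(1 for v in values if v is None)
    return dict(counts), null_count
```

The same hash table underlies "unique" (its keys), "match", and "isin", which is why the ticket groups them together.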
[jira] [Created] (ARROW-1571) [C++] Implement argsort kernels (sort indices) for integers using O(n) counting sort
Wes McKinney created ARROW-1571: --- Summary: [C++] Implement argsort kernels (sort indices) for integers using O(n) counting sort Key: ARROW-1571 URL: https://issues.apache.org/jira/browse/ARROW-1571 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney This function requires knowledge of the minimum and maximum of an array. If it is small enough, then an array of size {{maximum - minimum}} can be constructed and used to tabulate value frequencies and then compute the sort indices (this is called "grade up" or "grade down" in APL languages). There is generally a cross-over point where this function performs worse than mergesort or quicksort due to data locality issues -- This message was sent by Atlassian JIRA (v6.4.14#64029)
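The "grade up" idea above can be sketched in Python, assuming integer inputs and plain lists standing in for Arrow arrays. The cost is O(n + k) where k = maximum - minimum, so it only pays off when the value range is small:

```python
def counting_argsort(values):
    """Stable O(n + k) argsort for integers, k = max - min.

    Sketch of the counting-sort ("grade up") approach; not Arrow's API.
    """
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 2)
    for v in values:                      # tabulate value frequencies
        counts[v - lo + 1] += 1
    for i in range(1, len(counts)):       # prefix sums -> start offsets
        counts[i] += counts[i - 1]
    indices = [0] * len(values)
    for i, v in enumerate(values):        # stable placement of indices
        indices[counts[v - lo]] = i
        counts[v - lo] += 1
    return indices
```

Past the cross-over point mentioned above (large k relative to n), the scattered writes into `counts` lose to comparison sorts with better locality.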
[jira] [Updated] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
[ https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Malekpour updated ARROW-1564: -- Description: This is useful for determining whether a small-range integer O( n ) sort can be used in some circumstances. Can also be used for simply computing array statistics (was: This is useful for determining whether a small-range integer O( n ) sort can be used in some circumstances. Can also be use for simply computing array statistics) > [C++] Kernel functions for computing minimum and maximum of an array in one > pass > > > Key: ARROW-1564 > URL: https://issues.apache.org/jira/browse/ARROW-1564 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney > Labels: Analytics > > This is useful for determining whether a small-range integer O( n ) sort can > be used in some circumstances. Can also be used for simply computing array > statistics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
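The one-pass kernel described above can be sketched as follows, with `None` standing in for null; skipping nulls is an assumption about the kernel's semantics, not something the ticket specifies:

```python
def min_max(values):
    # Compute minimum and maximum in a single pass, skipping nulls (None).
    lo = hi = None
    for v in values:
        if v is None:
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return lo, hi
```

The result can then decide whether `hi - lo` is small enough for the O(n) counting sort discussed in ARROW-1571.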
[jira] [Created] (ARROW-1570) [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature
Wes McKinney created ARROW-1570: --- Summary: [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature Key: ARROW-1570 URL: https://issues.apache.org/jira/browse/ARROW-1570 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney This could include an {{std::function}} instance (but these cannot be inlined by the C++ compiler), but should also permit use with inline-able functions or functors -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-772) [C++] Implement take kernel functions
[ https://issues.apache.org/jira/browse/ARROW-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-772: --- Summary: [C++] Implement take kernel functions (was: [C++] Implement Take function for arrow::Array types) > [C++] Implement take kernel functions > - > > Key: ARROW-772 > URL: https://issues.apache.org/jira/browse/ARROW-772 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney > Labels: Analytics > Fix For: 0.8.0 > > > Among other things, this can be used to convert from DictionaryArray back to > dense array. This is equivalent to {{ndarray.take}} in NumPy -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1569) [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types
Wes McKinney created ARROW-1569: --- Summary: [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types Key: ARROW-1569 URL: https://issues.apache.org/jira/browse/ARROW-1569 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney These kernels must offer some stateful variant so that monotonicity can be determined across chunked arrays -- This message was sent by Atlassian JIRA (v6.4.14#64029)
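The stateful variant mentioned above amounts to carrying the last seen value from one chunk into the next. A minimal sketch for the non-decreasing case (the class name and API are illustrative assumptions):

```python
class MonotonicityChecker:
    """Stateful check for non-decreasing order across chunked input."""

    def __init__(self):
        self._last = None      # carried across chunk boundaries
        self.ascending = True

    def update(self, chunk):
        # Feed one chunk; returns whether all values seen so far
        # (including across earlier chunks) are non-decreasing.
        for v in chunk:
            if self._last is not None and v < self._last:
                self.ascending = False
            self._last = v
        return self.ascending
```

Without the carried `_last` value, a per-chunk check would miss inversions that straddle a chunk boundary.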
[jira] [Created] (ARROW-1568) [C++] Implement "drop null" kernels that return array without nulls
Wes McKinney created ARROW-1568: --- Summary: [C++] Implement "drop null" kernels that return array without nulls Key: ARROW-1568 URL: https://issues.apache.org/jira/browse/ARROW-1568 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1567) [C++] Implement "fill null" kernels that replace null values with some scalar replacement value
Wes McKinney created ARROW-1567: --- Summary: [C++] Implement "fill null" kernels that replace null values with some scalar replacement value Key: ARROW-1567 URL: https://issues.apache.org/jira/browse/ARROW-1567 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.4.14#64029)
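The semantics described above are simple enough to sketch directly, with `None` standing in for null and Python lists standing in for Arrow arrays:

```python
def fill_null(values, fill_value):
    # Replace every null (None) with a scalar replacement value.
    return [fill_value if v is None else v for v in values]
```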
[jira] [Created] (ARROW-1566) [C++] Implement "argsort" kernels that use mergesort to compute sorting indices
Wes McKinney created ARROW-1566: --- Summary: [C++] Implement "argsort" kernels that use mergesort to compute sorting indices Key: ARROW-1566 URL: https://issues.apache.org/jira/browse/ARROW-1566 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1565) [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements
Wes McKinney created ARROW-1565: --- Summary: [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements Key: ARROW-1565 URL: https://issues.apache.org/jira/browse/ARROW-1565 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Heap-based topk can compute these indices in O(n log k) time -- This message was sent by Atlassian JIRA (v6.4.14#64029)
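The heap-based O(n log k) approach mentioned above can be sketched with a size-k min-heap; the function name and the largest-first output ordering are illustrative assumptions:

```python
import heapq

def argtopk(values, k):
    # Indices of the k largest elements in O(n log k), largest first.
    # The min-heap holds the current top-k as (value, index) pairs; its
    # root is the smallest of them, so a new value only enters if it
    # beats the root.
    heap = []
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            heapq.heapreplace(heap, (v, i))
    return [i for _, i in sorted(heap, reverse=True)]
```

"argbottomk" would be the mirror image: a max-heap (or negated values) keeping the k smallest.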
[jira] [Updated] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
[ https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1564: Description: This is useful for determining whether a small-range integer O(n) sort can be used in some circumstances. Can also be use for simply computing array statistics > [C++] Kernel functions for computing minimum and maximum of an array in one > pass > > > Key: ARROW-1564 > URL: https://issues.apache.org/jira/browse/ARROW-1564 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney > Labels: Analytics > > This is useful for determining whether a small-range integer O(n) sort can be > used in some circumstances. Can also be use for simply computing array > statistics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
[ https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1564: Description: This is useful for determining whether a small-range integer O( n ) sort can be used in some circumstances. Can also be use for simply computing array statistics (was: This is useful for determining whether a small-range integer O(n) sort can be used in some circumstances. Can also be use for simply computing array statistics) > [C++] Kernel functions for computing minimum and maximum of an array in one > pass > > > Key: ARROW-1564 > URL: https://issues.apache.org/jira/browse/ARROW-1564 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney > Labels: Analytics > > This is useful for determining whether a small-range integer O( n ) sort can > be used in some circumstances. Can also be use for simply computing array > statistics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
Wes McKinney created ARROW-1564: --- Summary: [C++] Kernel functions for computing minimum and maximum of an array in one pass Key: ARROW-1564 URL: https://issues.apache.org/jira/browse/ARROW-1564 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172433#comment-16172433 ] ASF GitHub Bot commented on ARROW-1500: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1116 Thanks, the Travis CI tubes are a bit clogged today so I may not be able to merge until later tonight or tomorrow morning > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Labels: pull-request-available > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1500: -- Labels: pull-request-available (was: ) > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Labels: pull-request-available > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1563) [C++] Implement logical unary and binary kernels for boolean arrays
Wes McKinney created ARROW-1563: --- Summary: [C++] Implement logical unary and binary kernels for boolean arrays Key: ARROW-1563 URL: https://issues.apache.org/jira/browse/ARROW-1563 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney And, or, not (negate), xor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
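A sketch of these element-wise kernels, with `None` standing in for null. The null-propagation rule shown (null in yields null out) is an assumption, not something the ticket specifies:

```python
def and_(left, right):
    # Element-wise logical AND; a null on either side propagates.
    return [None if a is None or b is None else (a and b)
            for a, b in zip(left, right)]

def not_(values):
    # Element-wise negation; null stays null.
    return [None if v is None else (not v) for v in values]

def xor(left, right):
    # Element-wise XOR via boolean inequality; nulls propagate.
    return [None if a is None or b is None else (a != b)
            for a, b in zip(left, right)]
```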
[jira] [Created] (ARROW-1562) [C++] Numeric kernel implementations for add (+)
Wes McKinney created ARROW-1562: --- Summary: [C++] Numeric kernel implementations for add (+) Key: ARROW-1562 URL: https://issues.apache.org/jira/browse/ARROW-1562 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney This function should respect consistent type promotions between types of different sizes, and between signed and unsigned integers -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1561) [C++] Kernel implementations for "isin" (set containment)
Wes McKinney created ARROW-1561: --- Summary: [C++] Kernel implementations for "isin" (set containment) Key: ARROW-1561 URL: https://issues.apache.org/jira/browse/ARROW-1561 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney isin determines whether each element in the left array is contained in the values in the right array. This function must handle the case where the right array has nulls (so that null in the left array will return true) {code} isin(['a', 'b', null], ['a', 'c']) returns [true, false, null] isin(['a', 'b', null], ['a', 'c', null]) returns [true, false, true] {code} May need an option to return false for null instead of null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
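The two examples above can be sketched directly, with `None` standing in for null; the `nulls_as_false` flag models the optional behavior suggested at the end of the ticket:

```python
def isin(left, right, nulls_as_false=False):
    # For each element of `left`, is it contained in `right`?
    # A null in `left` matches only if `right` also contains a null;
    # otherwise it yields null (or False when nulls_as_false is set).
    right_set = set(v for v in right if v is not None)
    right_has_null = any(v is None for v in right)
    out = []
    for v in left:
        if v is None:
            if right_has_null:
                out.append(True)
            else:
                out.append(False if nulls_as_false else None)
        else:
            out.append(v in right_set)
    return out
```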
[jira] [Created] (ARROW-1560) [C++] Kernel implementations for "match" function
Wes McKinney created ARROW-1560: --- Summary: [C++] Kernel implementations for "match" function Key: ARROW-1560 URL: https://issues.apache.org/jira/browse/ARROW-1560 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Match computes a position index array from an array of values into a set of categories {code} match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a']) returns [1, 0, 1, null, 0, 1, 0] {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
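The example above can be sketched with a hash table from category to position, with `None` standing in for null:

```python
def match(values, categories):
    # Position of each value within `categories`; nulls stay null,
    # and values absent from `categories` also map to null (None).
    positions = {c: i for i, c in enumerate(categories)}
    return [None if v is None else positions.get(v) for v in values]
```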
[jira] [Created] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)
Wes McKinney created ARROW-1559: --- Summary: [C++] Kernel implementations for "unique" (compute distinct elements of array) Key: ARROW-1559 URL: https://issues.apache.org/jira/browse/ARROW-1559 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1558) [C++] Implement boolean selection kernels
Wes McKinney created ARROW-1558: --- Summary: [C++] Implement boolean selection kernels Key: ARROW-1558 URL: https://issues.apache.org/jira/browse/ARROW-1558 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Select values where a boolean selection array is true. If any values in the selection array are null, then the corresponding values in the output array should be null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
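A sketch of the selection rule described above, with `None` standing in for null in both the values and the mask:

```python
def filter_array(values, mask):
    # Keep values where the mask is True, drop where False, and emit a
    # null (None) where the mask itself is null.
    out = []
    for v, m in zip(values, mask):
        if m is None:
            out.append(None)
        elif m:
            out.append(v)
    return out
```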
[jira] [Commented] (ARROW-1553) [JAVA] Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors
[ https://issues.apache.org/jira/browse/ARROW-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172344#comment-16172344 ] ASF GitHub Bot commented on ARROW-1553: --- Github user siddharthteotia commented on the issue: https://github.com/apache/arrow/pull/1113 Added unit test > [JAVA] Implement setInitialCapacity for MapWriter and pass on this capacity > during lazy creation of child vectors > - > > Key: ARROW-1553 > URL: https://issues.apache.org/jira/browse/ARROW-1553 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1534) [C++] Decimal128::ToBytes and uint8_t* constructor should return/assume big-endian byte order
[ https://issues.apache.org/jira/browse/ARROW-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172305#comment-16172305 ] ASF GitHub Bot commented on ARROW-1534: --- Github user cpcloud commented on the issue: https://github.com/apache/arrow/pull/1108 Closing until we resolve the way forward with parquet-cpp and decimals. > [C++] Decimal128::ToBytes and uint8_t* constructor should return/assume > big-endian byte order > - > > Key: ARROW-1534 > URL: https://issues.apache.org/jira/browse/ARROW-1534 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.6.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1534) [C++] Decimal128::ToBytes and uint8_t* constructor should return/assume big-endian byte order
[ https://issues.apache.org/jira/browse/ARROW-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172307#comment-16172307 ] ASF GitHub Bot commented on ARROW-1534: --- Github user cpcloud closed the pull request at: https://github.com/apache/arrow/pull/1108 > [C++] Decimal128::ToBytes and uint8_t* constructor should return/assume > big-endian byte order > - > > Key: ARROW-1534 > URL: https://issues.apache.org/jira/browse/ARROW-1534 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.6.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-1500: --- Assignee: Amir Malekpour > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1555: Summary: [Python] write_to_dataset on s3 (was: PyArrow write_to_dataset on s3) > [Python] write_to_dataset on s3 > --- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Assignee: Florian Jetter >Priority: Trivial > Fix For: 0.8.0 > > > When writing an Arrow table to s3, I get a NotImplemented exception. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1555) PyArrow write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172142#comment-16172142 ] Wes McKinney commented on ARROW-1555: - {{exists}} is a hard one. It may be better to try to fix the implementation of {{write_to_dataset}} to not use methods like {{exists}} that are not S3-friendly https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L920 > PyArrow write_to_dataset on s3 > -- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Assignee: Florian Jetter >Priority: Trivial > Fix For: 0.8.0 > > > When writing an Arrow table to s3, I get a NotImplemented exception. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1555) PyArrow write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172076#comment-16172076 ] Florian Jetter commented on ARROW-1555: --- [~wesmckinn] Yes, it seems like some abstract methods of the FileSystem class (exists, open, etc.) were not implemented in the wrapper. I'll take care of it > PyArrow write_to_dataset on s3 > -- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Priority: Trivial > Fix For: 0.8.0 > > > When writing an Arrow table to s3, I get a NotImplemented exception. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (ARROW-1555) PyArrow write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Jetter reassigned ARROW-1555: - Assignee: Florian Jetter > PyArrow write_to_dataset on s3 > -- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Assignee: Florian Jetter >Priority: Trivial > Fix For: 0.8.0 > > > When writing an Arrow table to s3, I get a NotImplemented exception. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1554: -- Labels: pull-request-available (was: ) > [Python] Document that pip wheels depend on MSVC14 runtime > -- > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? 
> {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-1554: --- Assignee: Wes McKinney > [Python] Document that pip wheels depend on MSVC14 runtime > -- > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov >Assignee: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? 
> {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172053#comment-16172053 ] ASF GitHub Bot commented on ARROW-1554: --- GitHub user wesm opened a pull request: https://github.com/apache/arrow/pull/1115 ARROW-1554: [Python] Update Sphinx install page to note that VC14 runtime may need to be installed on Windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/wesm/arrow ARROW-1554 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/arrow/pull/1115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1115 commit a7c3e2795b5dc326d15b06b483283afa29a03ed7 Author: Wes McKinney Date: 2017-09-19T17:29:01Z Update Sphinx install page to note that VC14 runtime may need to be installed separately when using pip on Windows Change-Id: I3d0ba98091d5d59a81f528a07740bcc405848287 > [Python] Document that pip wheels depend on MSVC14 runtime > -- > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. 
> {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? > {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1557: Description: pa.Table.from_arrays doesn't validate that the length of {{arrays}} and {{names}} matches. I think this should raise with a {{ValueError}}: {code} In [1]: import pyarrow as pa In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c']) Out[2]: pyarrow.Table a: int64 b: int64 In [3]: pa.__version__ Out[3]: '0.7.0' {code} (This is my first time using JIRA, hopefully I didn't mess up too badly) was: pa.Table.from_arrays doesn't validate that the length of {{arrays}} and {{names}} matches. I think this should raise with a {{ValueError}}: {{ In [1]: import pyarrow as pa In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c']) Out[2]: pyarrow.Table a: int64 b: int64 In [3]: pa.__version__ Out[3]: '0.7.0' }} (This is my first time using JIRA, hopefully I didn't mess up too badly) > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
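The validation requested above is a single length check at construction time. A minimal sketch, where `make_table` is a hypothetical stand-in for the pyarrow factory, not its real entry point:

```python
def make_table(arrays, names):
    # Hypothetical sketch of the requested check: reject mismatched
    # lengths of `arrays` and `names` with a ValueError.
    if len(arrays) != len(names):
        raise ValueError(
            "Mismatched number of arrays ({}) and names ({})".format(
                len(arrays), len(names)))
    return dict(zip(names, arrays))
```

Without the check, `zip` silently truncates to the shorter argument, which is exactly the surprising behavior the reporter observed (three names, two columns).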
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172019#comment-16172019 ] Wes McKinney commented on ARROW-1557: - Agreed! thanks for the bug report > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1557: Fix Version/s: 0.8.0 > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {{ > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > }} > (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172007#comment-16172007 ] Tom Augspurger commented on ARROW-1557: --- I can probably submit a fix on Thursday or Friday. > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {{ > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > }} > (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Augspurger updated ARROW-1557: -- Summary: [PYTHON] pyarrow.Table.from_arrays doesn't validate names length (was: pyarrow.Table.from_arrays doesn't validate names length) > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {{ > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > }} > (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1557) pyarrow.Table.from_arrays doesn't validate names length
Tom Augspurger created ARROW-1557: - Summary: pyarrow.Table.from_arrays doesn't validate names length Key: ARROW-1557 URL: https://issues.apache.org/jira/browse/ARROW-1557 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.7.0 Reporter: Tom Augspurger Priority: Minor pa.Table.from_arrays doesn't validate that the length of {{arrays}} and {{names}} matches. I think this should raise with a {{ValueError}}: {{ In [1]: import pyarrow as pa In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c']) Out[2]: pyarrow.Table a: int64 b: int64 In [3]: pa.__version__ Out[3]: '0.7.0' }} (This is my first time using JIRA, hopefully I didn't mess up too badly) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
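The missing check described above is a simple length comparison. Below is a minimal sketch of the proposed validation, using a hypothetical `make_table` helper as a stand-in — the real fix belongs inside `pa.Table.from_arrays`, and these names are illustrative only:

```python
def make_table(arrays, names):
    # Hypothetical stand-in for pa.Table.from_arrays; the actual fix lives
    # in pyarrow, but the validation the report asks for is just this check.
    if len(arrays) != len(names):
        raise ValueError(
            "must pass same number of names (%d) as arrays (%d)"
            % (len(names), len(arrays)))
    return dict(zip(names, arrays))

# Matching lengths succeed; a mismatch raises ValueError as proposed.
ok = make_table([[1, 2], [3, 4]], ["a", "b"])
try:
    make_table([[1, 2], [3, 4]], ["a", "b", "c"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

With this check in place, the `names=['a', 'b', 'c']` call from the report would fail loudly instead of silently dropping the extra name.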
[jira] [Created] (ARROW-1556) [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch
Wes McKinney created ARROW-1556: --- Summary: [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch Key: ARROW-1556 URL: https://issues.apache.org/jira/browse/ARROW-1556 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.8.0 see discussion in https://github.com/apache/parquet-cpp/pull/398 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171974#comment-16171974 ] ASF GitHub Bot commented on ARROW-1347: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/959 Can this be merged? > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1192) [JAVA] Improve splitAndTransfer performance for List and Union vectors
[ https://issues.apache.org/jira/browse/ARROW-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1192: -- Labels: pull-request-available (was: ) > [JAVA] Improve splitAndTransfer performance for List and Union vectors > -- > > Key: ARROW-1192 > URL: https://issues.apache.org/jira/browse/ARROW-1192 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > Fix For: 0.6.0 > > > Most vector implementations slice the underlying buffer for splitAndTransfer, > but ListVector and UnionVector copy data into a new buffer. We should enhance > these to use slice as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1192) [JAVA] Improve splitAndTransfer performance for List and Union vectors
[ https://issues.apache.org/jira/browse/ARROW-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171971#comment-16171971 ] ASF GitHub Bot commented on ARROW-1192: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/819 @StevenMPhillips can you close? > [JAVA] Improve splitAndTransfer performance for List and Union vectors > -- > > Key: ARROW-1192 > URL: https://issues.apache.org/jira/browse/ARROW-1192 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > Fix For: 0.6.0 > > > Most vector implementations slice the underlying buffer for splitAndTransfer, > but ListVector and UnionVector copy data into a new buffer. We should enhance > these to use slice as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
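The slice-versus-copy distinction the issue asks for can be illustrated in plain Python with `memoryview` (a zero-copy view over a buffer, analogous to slicing the backing ArrowBuf in `splitAndTransfer`) versus `bytes` (a fresh allocation, analogous to the copy that ListVector and UnionVector were doing). This is a conceptual sketch, not Arrow Java code:

```python
import array, sys

data = array.array('q', range(1024))   # 1024 eight-byte values

# Zero-copy path: a memoryview slice shares the underlying storage,
# much like slicing the backing buffer in splitAndTransfer.
window = memoryview(data)[256:512]

# Copy path: bytes() allocates and copies, like the old List/Union behavior.
copied = bytes(memoryview(data)[256:512])

# Mutating the source is visible through the slice but not through the copy.
data[256] = -1
first_copied_value = int.from_bytes(copied[:8], sys.byteorder, signed=True)
```

The slice costs O(1) regardless of range length, which is the performance win the issue is after; the copy costs O(n) in both time and memory.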
[jira] [Updated] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1347: -- Labels: pull-request-available (was: ) > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171969#comment-16171969 ] Wes McKinney commented on ARROW-1554: - Cool. I changed the JIRA title so that we can add a note to the Sphinx docs about this issue > [Python] Document that pip wheels depend on MSVC14 runtime > -- > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? 
> {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1554) [Python] Document that pip wheels depend on MSVC14 runtime
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1554: Summary: [Python] Document that pip wheels depend on MSVC14 runtime (was: "ImportError: DLL load failed: The specified module could not be found" on Windows 10) > [Python] Document that pip wheels depend on MSVC14 runtime > -- > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? 
> {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (ARROW-1547) [JAVA] Fix 8x memory over-allocation in BitVector
[ https://issues.apache.org/jira/browse/ARROW-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1547. - Resolution: Fixed Issue resolved by pull request 1109 [https://github.com/apache/arrow/pull/1109] > [JAVA] Fix 8x memory over-allocation in BitVector > - > > Key: ARROW-1547 > URL: https://issues.apache.org/jira/browse/ARROW-1547 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > Typically there are 3 ways of specifying the amount of memory needed for > vectors. > CASE (1) allocateNew() -- here the application doesn't really specify the > size of memory or value count. Each vector type has a default value count > (4096) and therefore a default size (in bytes) is used in such cases. > For example, for a 4 byte fixed-width vector, we will allocate 32KB of memory > for a call to allocateNew(). > CASE (2) setInitialCapacity(count) followed by allocateNew() - In this case > also the application doesn't specify the value count or size in > allocateNew(). However, the call to setInitialCapacity() dictates the amount > of memory the subsequent call to allocateNew() will allocate. > For example, we can do setInitialCapacity(1024) and the call to allocateNew() > will allocate 4KB of memory for the 4 byte fixed-width vector. > CASE (3) allocateNew(count) - The application is specific about requirements. > For nullable vectors, the above calls also allocate the memory for validity > vector. > The problem is that Bit Vector uses a default memory size in bytes of 4096. > In other words, we allocate a vector for 4096*8 value count. > In the default case (as explained above), the vector types have a value count > of 4096 so we need only 4096 bits (512 bytes) in the bit vector and not > really 4096 as the size in bytes. > This happens in CASE 1 where the application depends on the default memory > allocation . 
In such cases, the size of the buffer for the bit vector is 8x larger than actually needed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
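The 8x figure follows directly from the bit-versus-byte confusion described above; a quick sanity check of the arithmetic (plain Python, not Arrow's actual allocation code):

```python
DEFAULT_VALUE_COUNT = 4096  # default value count cited in the issue

# A validity (bit) vector needs one bit per value, so bytes required:
needed_bytes = DEFAULT_VALUE_COUNT // 8       # 512 bytes

# The bug treated the default count as a byte size instead of a bit count:
allocated_bytes = DEFAULT_VALUE_COUNT         # 4096 bytes

overallocation = allocated_bytes // needed_bytes
```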
[jira] [Commented] (ARROW-1547) [JAVA] Fix 8x memory over-allocation in BitVector
[ https://issues.apache.org/jira/browse/ARROW-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171965#comment-16171965 ] ASF GitHub Bot commented on ARROW-1547: --- Github user asfgit closed the pull request at: https://github.com/apache/arrow/pull/1109 > [JAVA] Fix 8x memory over-allocation in BitVector > - > > Key: ARROW-1547 > URL: https://issues.apache.org/jira/browse/ARROW-1547 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > Typically there are 3 ways of specifying the amount of memory needed for > vectors. > CASE (1) allocateNew() -- here the application doesn't really specify the > size of memory or value count. Each vector type has a default value count > (4096) and therefore a default size (in bytes) is used in such cases. > For example, for a 4 byte fixed-width vector, we will allocate 32KB of memory > for a call to allocateNew(). > CASE (2) setInitialCapacity(count) followed by allocateNew() - In this case > also the application doesn't specify the value count or size in > allocateNew(). However, the call to setInitialCapacity() dictates the amount > of memory the subsequent call to allocateNew() will allocate. > For example, we can do setInitialCapacity(1024) and the call to allocateNew() > will allocate 4KB of memory for the 4 byte fixed-width vector. > CASE (3) allocateNew(count) - The application is specific about requirements. > For nullable vectors, the above calls also allocate the memory for validity > vector. > The problem is that Bit Vector uses a default memory size in bytes of 4096. > In other words, we allocate a vector for 4096*8 value count. > In the default case (as explained above), the vector types have a value count > of 4096 so we need only 4096 bits (512 bytes) in the bit vector and not > really 4096 as the size in bytes. 
> This happens in CASE 1 where the application depends on the default memory allocation. In such cases, the size of the buffer for the bit vector is 8x larger than actually needed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171966#comment-16171966 ] Dima Ryazanov commented on ARROW-1554: -- Yep, installing the Visual Studio C++ Redistributable fixed the problem. (Though that answer says 2015 - but points to the 2010 one. Also, appears to be x86 only. I installed this one: https://www.microsoft.com/en-us/download/details.aspx?id=48145) (Haven't actually tried conda yet - but I tried it before in a different environment, and I see "Miniconda3/Library/bin/msvcp140.dll" there - so makes sense that it works.) > "ImportError: DLL load failed: The specified module could not be found" on > Windows 10 > - > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. 
I wonder if some dependency is > missing? > {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
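On the user's side, the presence of the MSVC14 runtime can be probed from Python's standard library. This is a rough, Windows-oriented diagnostic sketch, not part of pyarrow; on non-Windows platforms it simply reports the DLL as absent:

```python
from ctypes.util import find_library

def msvc14_runtime_present():
    # Looks for msvcp140.dll on the loader search path; returns the
    # resolved name on Windows when the VC14 redistributable is installed,
    # otherwise None. Diagnostic sketch only.
    return find_library('msvcp140')

runtime = msvc14_runtime_present()
```

If this returns None on a Windows machine, installing the Visual C++ redistributable as described in the comment should resolve the `ImportError`.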
[jira] [Commented] (ARROW-1547) [JAVA] Fix 8x memory over-allocation in BitVector
[ https://issues.apache.org/jira/browse/ARROW-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171963#comment-16171963 ] ASF GitHub Bot commented on ARROW-1547: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1109 +1 > [JAVA] Fix 8x memory over-allocation in BitVector > - > > Key: ARROW-1547 > URL: https://issues.apache.org/jira/browse/ARROW-1547 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > Typically there are 3 ways of specifying the amount of memory needed for > vectors. > CASE (1) allocateNew() -- here the application doesn't really specify the > size of memory or value count. Each vector type has a default value count > (4096) and therefore a default size (in bytes) is used in such cases. > For example, for a 4 byte fixed-width vector, we will allocate 32KB of memory > for a call to allocateNew(). > CASE (2) setInitialCapacity(count) followed by allocateNew() - In this case > also the application doesn't specify the value count or size in > allocateNew(). However, the call to setInitialCapacity() dictates the amount > of memory the subsequent call to allocateNew() will allocate. > For example, we can do setInitialCapacity(1024) and the call to allocateNew() > will allocate 4KB of memory for the 4 byte fixed-width vector. > CASE (3) allocateNew(count) - The application is specific about requirements. > For nullable vectors, the above calls also allocate the memory for validity > vector. > The problem is that Bit Vector uses a default memory size in bytes of 4096. > In other words, we allocate a vector for 4096*8 value count. > In the default case (as explained above), the vector types have a value count > of 4096 so we need only 4096 bits (512 bytes) in the bit vector and not > really 4096 as the size in bytes. 
> This happens in CASE 1 where the application depends on the default memory allocation. In such cases, the size of the buffer for the bit vector is 8x larger than actually needed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1533) [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
[ https://issues.apache.org/jira/browse/ARROW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171957#comment-16171957 ] ASF GitHub Bot commented on ARROW-1533: --- Github user asfgit closed the pull request at: https://github.com/apache/arrow/pull/1112 > [JAVA] realloc should consider the existing buffer capacity for computing > target memory requirement > --- > > Key: ARROW-1533 > URL: https://issues.apache.org/jira/browse/ARROW-1533 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > We recently encountered a problem when we were trying to add JSON files with > complex schema as datasets. > Initially we started with a Float8Vector with default memory allocation of > (4096 * 8) 32KB. > Went through several iterations of setSafe() to trigger a realloc() from 32KB > to 64KB. > Another round of setSafe() calls to trigger a realloc() from 64KB to 128KB > After that we encountered a BigInt and promoted our vector to UnionVector. > This required us to create a UnionVector with BigIntVector and Float8Vector. > The latter required us to transfer the Float8Vector we were earlier working > with to the Float8Vector inside the Union. > As part of transferTo(), the target Float8Vector got all the ArrowBuf state > (capacity, buffer contents) etc transferred from the source vector. > Later, a realloc was triggered on the Float8Vector inside the UnionVector. > The computation inside realloc() to determine the amount of memory to be > reallocated goes wrong since it makes the decision based on > allocateSizeInBytes -- although this vector was created as part of transfer() > from 128KB source vector, allocateSizeInBytes is still at the initial/default > value of 32KB > We end up allocating a 64KB buffer and attempt to copy 128KB over 64KB and > seg fault when invoking setBytes(). 
> There is a wrong assumption in realloc() that allocateSizeInBytes is always > equal to data.capacity(). The particular scenario described above exposes > where this assumption could go wrong. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
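The failure mode reduces to which quantity realloc doubles. A toy sketch of the buggy versus corrected sizing — the class and method names here are hypothetical, not Arrow Java's actual fields or code:

```python
class ToyBuf:
    # Toy model: a buffer that arrived via transfer with 128 KB of real
    # capacity but stale 32 KB bookkeeping from its original construction.
    def __init__(self, capacity_bytes):
        self.data = bytearray(capacity_bytes)   # actual buffer contents
        self.allocation_size = 32 * 1024        # stale allocateSizeInBytes analogue

    def realloc_target_buggy(self):
        # Bug: doubles the stale bookkeeping value, ignoring real capacity.
        return self.allocation_size * 2

    def realloc_target_fixed(self):
        # Fix: derive the target from the buffer's actual capacity.
        return len(self.data) * 2

buf = ToyBuf(128 * 1024)
buggy_target = buf.realloc_target_buggy()   # 64 KB: too small for 128 KB of data
fixed_target = buf.realloc_target_fixed()   # 256 KB
```

The buggy target is smaller than the data already held, which is exactly the condition that produced the segfault in `setBytes()`.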
[jira] [Resolved] (ARROW-1533) [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
[ https://issues.apache.org/jira/browse/ARROW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1533. - Resolution: Fixed Issue resolved by pull request 1112 [https://github.com/apache/arrow/pull/1112] > [JAVA] realloc should consider the existing buffer capacity for computing > target memory requirement > --- > > Key: ARROW-1533 > URL: https://issues.apache.org/jira/browse/ARROW-1533 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > We recently encountered a problem when we were trying to add JSON files with > complex schema as datasets. > Initially we started with a Float8Vector with default memory allocation of > (4096 * 8) 32KB. > Went through several iterations of setSafe() to trigger a realloc() from 32KB > to 64KB. > Another round of setSafe() calls to trigger a realloc() from 64KB to 128KB > After that we encountered a BigInt and promoted our vector to UnionVector. > This required us to create a UnionVector with BigIntVector and Float8Vector. > The latter required us to transfer the Float8Vector we were earlier working > with to the Float8Vector inside the Union. > As part of transferTo(), the target Float8Vector got all the ArrowBuf state > (capacity, buffer contents) etc transferred from the source vector. > Later, a realloc was triggered on the Float8Vector inside the UnionVector. > The computation inside realloc() to determine the amount of memory to be > reallocated goes wrong since it makes the decision based on > allocateSizeInBytes -- although this vector was created as part of transfer() > from 128KB source vector, allocateSizeInBytes is still at the initial/default > value of 32KB > We end up allocating a 64KB buffer and attempt to copy 128KB over 64KB and > seg fault when invoking setBytes(). 
> There is a wrong assumption in realloc() that allocateSizeInBytes is always > equal to data.capacity(). The particular scenario described above exposes > where this assumption could go wrong. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1533) [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
[ https://issues.apache.org/jira/browse/ARROW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171955#comment-16171955 ] ASF GitHub Bot commented on ARROW-1533: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1112 +1 > [JAVA] realloc should consider the existing buffer capacity for computing > target memory requirement > --- > > Key: ARROW-1533 > URL: https://issues.apache.org/jira/browse/ARROW-1533 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > We recently encountered a problem when we were trying to add JSON files with > complex schema as datasets. > Initially we started with a Float8Vector with default memory allocation of > (4096 * 8) 32KB. > Went through several iterations of setSafe() to trigger a realloc() from 32KB > to 64KB. > Another round of setSafe() calls to trigger a realloc() from 64KB to 128KB > After that we encountered a BigInt and promoted our vector to UnionVector. > This required us to create a UnionVector with BigIntVector and Float8Vector. > The latter required us to transfer the Float8Vector we were earlier working > with to the Float8Vector inside the Union. > As part of transferTo(), the target Float8Vector got all the ArrowBuf state > (capacity, buffer contents) etc transferred from the source vector. > Later, a realloc was triggered on the Float8Vector inside the UnionVector. > The computation inside realloc() to determine the amount of memory to be > reallocated goes wrong since it makes the decision based on > allocateSizeInBytes -- although this vector was created as part of transfer() > from 128KB source vector, allocateSizeInBytes is still at the initial/default > value of 32KB > We end up allocating a 64KB buffer and attempt to copy 128KB over 64KB and > seg fault when invoking setBytes(). 
> There is a wrong assumption in realloc() that allocateSizeInBytes is always > equal to data.capacity(). The particular scenario described above exposes > where this assumption could go wrong. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171922#comment-16171922 ] Wes McKinney commented on ARROW-1554: - According to https://answers.microsoft.com/en-us/windows/forum/windows_10-performance/msvcp140dll-is-missing-in-my-win-10/1c65d6b0-68b8-4b59-b720-3e6a33038389?auth=1 you may be able to resolve this by installing Visual C++ Redistributable on your machine, which will install the VC14 runtime. > "ImportError: DLL load failed: The specified module could not be found" on > Windows 10 > - > > Key: ARROW-1554 > URL: https://issues.apache.org/jira/browse/ARROW-1554 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Windows 10 (x64) > Python 3.6.2 (x64) >Reporter: Dima Ryazanov > Fix For: 0.8.0 > > Attachments: parquet_dependencies.png, Process Monitor.png > > > I just tried pyarrow on Windows 10, and it fails to import for me: > {code} > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", > line 32, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: DLL load failed: The specified module could not be found. > {code} > Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder: > {code} > C:\Users\dima\Documents>dir "C:\Program > Files\Python36\lib\site-packages\pyarrow\" > Volume in drive C has no label. > Volume Serial Number is 4CE9-CC3C > Directory of C:\Program Files\Python36\lib\site-packages\pyarrow > 09/19/2017 01:14 AM . > 09/19/2017 01:14 AM .. > 09/19/2017 01:14 AM 2,382,336 arrow.dll > 09/19/2017 01:14 AM 604,160 arrow_python.dll > 09/19/2017 01:14 AM 3,402 compat.py > ... > {code} > However, I cannot open them using ctypes.cdll. I wonder if some dependency is > missing? 
> {code} > >>> open('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb') > <_io.BufferedReader name='C:\\Program > Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'> > >>> > >>> cdll.LoadLibrary('C:\\Program > >>> Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll') > Traceback (most recent call last): > File "", line 1, in > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in > LoadLibrary > return self._dlltype(name) > File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in > __init__ > self._handle = _dlopen(self._name, mode) > OSError: [WinError 126] The specified module could not be found > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1554: Attachment: parquet_dependencies.png
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171921#comment-16171921 ] Wes McKinney commented on ARROW-1554: - OK, yeah, I used dependency walker and see that also, attaching screenshot
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171913#comment-16171913 ] Dima Ryazanov commented on ARROW-1554: -- Looks like it's missing MSVCP140.dll - see the screenshot. And you're right, tensorflow is also failing. I'll try conda next.
[jira] [Updated] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima Ryazanov updated ARROW-1554: - Attachment: Process Monitor.png
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171905#comment-16171905 ] Wes McKinney commented on ARROW-1554: - Are you able to install and use tensorflow from pip on your machine? https://pypi.python.org/pypi/tensorflow That's a very similar build toolchain to ours
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171907#comment-16171907 ] Wes McKinney commented on ARROW-1554: - cc [~Max Risuhin]
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171897#comment-16171897 ] Wes McKinney commented on ARROW-1554: - I see. Are you able to install with conda instead? That's going to be a much more reliable / robust environment all around for Windows users. It also installs dependencies like different MSVC runtimes. If you or anyone knows a tool to figure out what DLL dependency is missing (based on what we've discussed, it suggests that it's missing symbols _outside_ pyarrow, like something in the MSVC runtime), that would be really helpful.
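The diagnostic Wes asks for can be approximated from Python itself: attempt to load each candidate shared library with ctypes and record the loader error. The `probe_libraries` helper below is a hypothetical sketch, not part of pyarrow; on the reporter's machine the candidate names would be pyarrow's DLLs plus the MSVC runtime (e.g. msvcp140.dll).

```python
import ctypes

def probe_libraries(names):
    """Try to load each shared library and report which ones fail.

    Returns a dict mapping each name to None on success, or to the
    loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results

# On the reporter's Windows machine one would probe something like:
#   probe_libraries(["msvcp140.dll", "vcruntime140.dll", "arrow.dll"])
# Any entry with a non-None value is a missing or unloadable dependency.
print(probe_libraries(["no_such_library_abc123"]))
```

Note that WinError 126 only says *some* dependency in the load chain was not found; probing the dependencies one by one (as Dependency Walker does) narrows it down to the actual missing DLL.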
[jira] [Comment Edited] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171874#comment-16171874 ] Dima Ryazanov edited comment on ARROW-1554 at 9/19/17 3:17 PM: --- Yes, using pip. I've tried 0.5.0, 0.6.0, and 0.7.0 - and it's all the same. I just did a "pip uninstall pyarrow"; it failed cause I actually had some files open, so I then manually deleted the ...\site-packages\pyarrow dir, then installed pyarrow again. Same thing. was (Author: dimaryaz): Yes, using pip. I've tried 0.5.0, 0.6.0, and 0.7.0 - and it's all the same. I just did a {code}pip uninstall pyarrow{code}; it failed cause I actually had some files open, so I then manually deleted the ...\site-packages\pyarrow dir, then installed pyarrow again. Same thing.
[jira] [Commented] (ARROW-1533) [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
[ https://issues.apache.org/jira/browse/ARROW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171876#comment-16171876 ] ASF GitHub Bot commented on ARROW-1533: --- Github user icexelloss commented on the issue: https://github.com/apache/arrow/pull/1112 LGTM too. > [JAVA] realloc should consider the existing buffer capacity for computing > target memory requirement > --- > > Key: ARROW-1533 > URL: https://issues.apache.org/jira/browse/ARROW-1533 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > Labels: pull-request-available > > We recently encountered a problem when we were trying to add JSON files with > complex schema as datasets. > Initially we started with a Float8Vector with default memory allocation of > (4096 * 8) 32KB. > Went through several iterations of setSafe() to trigger a realloc() from 32KB > to 64KB. > Another round of setSafe() calls to trigger a realloc() from 64KB to 128KB > After that we encountered a BigInt and promoted our vector to UnionVector. > This required us to create a UnionVector with BigIntVector and Float8Vector. > The latter required us to transfer the Float8Vector we were earlier working > with to the Float8Vector inside the Union. > As part of transferTo(), the target Float8Vector got all the ArrowBuf state > (capacity, buffer contents) etc transferred from the source vector. > Later, a realloc was triggered on the Float8Vector inside the UnionVector. > The computation inside realloc() to determine the amount of memory to be > reallocated goes wrong since it makes the decision based on > allocateSizeInBytes -- although this vector was created as part of transfer() > from 128KB source vector, allocateSizeInBytes is still at the initial/default > value of 32KB > We end up allocating a 64KB buffer and attempt to copy 128KB over 64KB and > seg fault when invoking setBytes(). 
> There is a wrong assumption in realloc() that allocateSizeInBytes is always > equal to data.capacity(). The particular scenario described above exposes > where this assumption could go wrong. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
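The bookkeeping mistake described above can be sketched with a toy model: a vector tracks its allocation size separately from its actual buffer, transfer moves the buffer but leaves the tracked size at its default, and a realloc driven by the stale value under-allocates. The class below is illustrative Python, not the Arrow Java API; the candidate realloc sizes are returned rather than applied so the failure mode is visible without a crash.

```python
class ToyVector:
    """Toy model of the ARROW-1533 realloc bug (names are illustrative)."""
    DEFAULT_CAPACITY = 32

    def __init__(self):
        self.buf = bytearray(self.DEFAULT_CAPACITY)
        self.allocated = self.DEFAULT_CAPACITY  # tracked copy of the capacity

    def realloc(self):
        # Normal growth path: double, keeping the tracked size in sync.
        self.allocated *= 2
        new_buf = bytearray(self.allocated)
        new_buf[:len(self.buf)] = self.buf
        self.buf = new_buf

    def transfer_to(self, target):
        # Buffer ownership moves, but 'allocated' on the target is left at
        # its initial/default value -- the root of the bug.
        target.buf = self.buf
        self.buf = bytearray(0)

    def buggy_realloc_size(self):
        # What the broken realloc() computes: based on stale bookkeeping.
        return self.allocated * 2

    def fixed_realloc_size(self):
        # The fix: size the new buffer from the actual buffer capacity.
        return max(self.allocated, len(self.buf)) * 2

src = ToyVector()
src.realloc()          # 32KB-analog -> 64
src.realloc()          # 64 -> 128
dst = ToyVector()
src.transfer_to(dst)   # dst now holds 128 bytes, but dst.allocated == 32
print(len(dst.buf), dst.buggy_realloc_size(), dst.fixed_realloc_size())
# The buggy size (64) is smaller than the 128 bytes already held, which is
# exactly the copy-overrun that seg faulted in setBytes().
```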
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171874#comment-16171874 ] Dima Ryazanov commented on ARROW-1554: -- Yes, using pip. I've tried 0.5.0, 0.6.0, and 0.7.0 - and it's all the same. I just did a {code}pip uninstall pyarrow{code}; it failed cause I actually had some files open, so I then manually deleted the ...\site-packages\pyarrow dir, then installed pyarrow again. Same thing.
[jira] [Commented] (ARROW-1553) [JAVA] Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors
[ https://issues.apache.org/jira/browse/ARROW-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171816#comment-16171816 ] ASF GitHub Bot commented on ARROW-1553: --- Github user jacques-n commented on the issue: https://github.com/apache/arrow/pull/1113 LGTM. Definitely helps our use case. Agree with @icexelloss that we should add a test as well for this situation.
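The change under review can be pictured as follows: the writer stores the capacity hint up front and forwards it whenever a child vector is created lazily, instead of letting late-created children fall back to the default allocation. The Python classes below are an illustrative sketch of that idea, not the actual Java MapWriter API.

```python
class ToyVector:
    """Stand-in for a child value vector; only tracks its capacity."""
    def __init__(self, capacity):
        self.capacity = capacity

class ToyMapWriter:
    """Illustrative writer that remembers setInitialCapacity for lazy children."""
    DEFAULT_CAPACITY = 4096

    def __init__(self):
        self.initial_capacity = self.DEFAULT_CAPACITY
        self.children = {}

    def set_initial_capacity(self, capacity):
        # Record the hint even though no child vectors exist yet.
        self.initial_capacity = capacity

    def child(self, name):
        # Lazy creation: the stored hint is passed on, so late-created
        # children are not silently allocated at the default size.
        if name not in self.children:
            self.children[name] = ToyVector(self.initial_capacity)
        return self.children[name]

w = ToyMapWriter()
w.set_initial_capacity(1 << 16)
print(w.child("a").capacity)  # 65536, not the 4096 default
```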
[jira] [Commented] (ARROW-1533) [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
[ https://issues.apache.org/jira/browse/ARROW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171812#comment-16171812 ] ASF GitHub Bot commented on ARROW-1533: --- Github user jacques-n commented on the issue: https://github.com/apache/arrow/pull/1112 Good additional questions that we should address in ARROW-1463. +1 on getting this merged.
[jira] [Commented] (ARROW-1538) [C++] Support Ubuntu 14.04 in .deb packaging automation
[ https://issues.apache.org/jira/browse/ARROW-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171762#comment-16171762 ] Wes McKinney commented on ARROW-1538: - hi [~rvernica] you need ARROW-1546 https://github.com/apache/arrow/commit/bfe657909f5e7d96b7b8e5179baa17044b6ea375 > [C++] Support Ubuntu 14.04 in .deb packaging automation > --- > > Key: ARROW-1538 > URL: https://issues.apache.org/jira/browse/ARROW-1538 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Wes McKinney > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1209) [C++] Implement converter between Arrow record batches and Avro records
[ https://issues.apache.org/jira/browse/ARROW-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171759#comment-16171759 ] ASF GitHub Bot commented on ARROW-1209: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1026 Wow, providing a `FILE*`! That is incredibly restrictive. I will have to poke around at the C implementation and also look in other Avro users like Impala https://github.com/apache/incubator-impala/blob/master/be/src/exec/hdfs-avro-scanner.cc > [C++] Implement converter between Arrow record batches and Avro records > --- > > Key: ARROW-1209 > URL: https://issues.apache.org/jira/browse/ARROW-1209 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney > Labels: pull-request-available > > This would be useful for streaming systems that need to consume or produce > Avro in C/C++ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1555) PyArrow write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171755#comment-16171755 ] Wes McKinney commented on ARROW-1555: - cc [~fjetter] This may not be too hard to fix -- I don't think that {{parquet.write_to_dataset}} has been tested with S3, so a patch to make this S3-friendly would be welcome. > PyArrow write_to_dataset on s3 > -- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Priority: Trivial > Fix For: 0.8.0 > > > When writing an Arrow table to s3, I get a NotImplementedError. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
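One plausible shape for such a patch is to make the wrapper fall through to the wrapped filesystem for methods like exists instead of hiding them. The sketch below uses a stand-in DummyS3 in place of s3fs.S3FileSystem and a hypothetical FSWrapper; it is not pyarrow's actual S3FSWrapper implementation.

```python
class DummyS3:
    """Stand-in for s3fs.S3FileSystem, backed by an in-memory set of keys."""
    def __init__(self, keys):
        self.keys = set(keys)

    def exists(self, path):
        return path in self.keys

class FSWrapper:
    """Illustrative wrapper that delegates unknown attributes downward."""
    def __init__(self, fs):
        self.fs = fs

    def __getattr__(self, name):
        # Called only for attributes the wrapper itself lacks, so any
        # method of the wrapped filesystem (exists, open, ...) shows
        # through instead of raising NotImplementedError.
        return getattr(self.fs, name)

fs = FSWrapper(DummyS3({"bucket/data.parquet"}))
print(fs.exists("bucket/data.parquet"))  # True
print(fs.exists("bucket/missing"))       # False
```

Delegation via `__getattr__` keeps the wrapper thin: it only intercepts what it must adapt and forwards everything else unchanged.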
[jira] [Updated] (ARROW-1555) PyArrow write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-1555:
--------------------------------
    Fix Version/s: 0.8.0

> PyArrow write_to_dataset on s3
> ------------------------------
>
>                 Key: ARROW-1555
>                 URL: https://issues.apache.org/jira/browse/ARROW-1555
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Young-Jun Ko
>            Priority: Trivial
>             Fix For: 0.8.0

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171680#comment-16171680 ]

Wes McKinney commented on ARROW-1554:
-------------------------------------

I just tested the 0.7.0 wheel locally on Windows 10 and it works OK for me. Is it possible that you had one of the DLLs open when you updated pyarrow? Maybe try removing the directory and reinstalling.

> "ImportError: DLL load failed: The specified module could not be found" on
> Windows 10
> --------------------------------------------------------------------------
>
>                 Key: ARROW-1554
>                 URL: https://issues.apache.org/jira/browse/ARROW-1554
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>         Environment: Windows 10 (x64)
>                      Python 3.6.2 (x64)
>            Reporter: Dima Ryazanov
>             Fix For: 0.8.0
>
> I just tried pyarrow on Windows 10, and it fails to import for me:
> {code}
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", line 32, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: DLL load failed: The specified module could not be found.
> {code}
> Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder:
> {code}
> C:\Users\dima\Documents>dir "C:\Program Files\Python36\lib\site-packages\pyarrow\"
>  Volume in drive C has no label.
>  Volume Serial Number is 4CE9-CC3C
>
>  Directory of C:\Program Files\Python36\lib\site-packages\pyarrow
>
> 09/19/2017  01:14 AM    <DIR>          .
> 09/19/2017  01:14 AM    <DIR>          ..
> 09/19/2017  01:14 AM         2,382,336 arrow.dll
> 09/19/2017  01:14 AM           604,160 arrow_python.dll
> 09/19/2017  01:14 AM             3,402 compat.py
> ...
> {code}
> However, I cannot open them using ctypes.cdll. I wonder if some dependency is missing?
> {code}
> >>> open('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb')
> <_io.BufferedReader name='C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'>
> >>>
> >>> cdll.LoadLibrary('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in LoadLibrary
>     return self._dlltype(name)
>   File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in __init__
>     self._handle = _dlopen(self._name, mode)
> OSError: [WinError 126] The specified module could not be found
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
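One detail worth noting about the traceback above: `WinError 126` on `LoadLibrary` often means a *dependency* of the named DLL is missing from the search path, not the named DLL itself (the `open()` call succeeding shows `parquet.dll` is present and readable). A common diagnostic step, sketched below under the assumption that the install path from the report is correct, is to prepend the package directory to `PATH` before importing so its bundled DLLs can resolve each other:

```python
import os

# Hypothetical workaround sketch: prepend the pyarrow package directory
# (path taken from the report above, for illustration) to the Windows DLL
# search path before retrying the import. If the import then succeeds, the
# failure was a dependent DLL not being found on PATH.
pyarrow_dir = r"C:\Program Files\Python36\Lib\site-packages\pyarrow"
os.environ["PATH"] = pyarrow_dir + os.pathsep + os.environ.get("PATH", "")

# import pyarrow  # retry the import after adjusting the search path
```

This only adjusts the environment of the current process; it does not prove which DLL was missing, but it cheaply rules out the most common cause of WinError 126.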
[jira] [Comment Edited] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
[ https://issues.apache.org/jira/browse/ARROW-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171665#comment-16171665 ]

Wes McKinney edited comment on ARROW-1554 at 9/19/17 1:31 PM:
--------------------------------------------------------------

You installed the wheel with pip, is that right? Is it pyarrow 0.7.0?

was (Author: wesmckinn):
You installed the wheel with pip, is that right? Is it pyarrow 0.6.0?

> "ImportError: DLL load failed: The specified module could not be found" on
> Windows 10
> --------------------------------------------------------------------------
>
>                 Key: ARROW-1554
>                 URL: https://issues.apache.org/jira/browse/ARROW-1554
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.0
>         Environment: Windows 10 (x64)
>                      Python 3.6.2 (x64)
>            Reporter: Dima Ryazanov
>             Fix For: 0.8.0

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)