[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279467#comment-17279467 ]

Ali Cetin commented on ARROW-11427:
---

[~apitrou], I have tested the fix in WS2012 and WS2016, and I can verify that it works. It defaults to avx2 in WS2012 and avx512 in WS2016. (y)

> [C++] Arrow uses AVX512 instructions even when not supported by the OS
> --
>
> Key: ARROW-11427
> URL: https://issues.apache.org/jira/browse/ARROW-11427
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Environment: Windows Server 2012 Datacenter, Azure VM (D2_v2), Intel Xeon Platinum 8171m
> Reporter: Ali Cetin
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> *Update*: Azure (D2_v2) VMs no longer spin up with the Xeon Platinum 8171m, so I'm unable to test it with other OSes. Azure VMs are assigned different types of CPUs of the same "class" depending on availability. I will try my "luck" later.
>
> VMs with the Xeon Platinum 8171m running on Azure (D2_v2) started crashing after upgrading from pyarrow 2.0 to pyarrow 3.0. However, this only happens when reading Parquet files larger than 4096 bits!?
>
> Windows closes Python with exit code 255 and produces this:
>
> {code:java}
> Faulting application name: python.exe, version: 3.8.3150.1013, time stamp: 0x5ebc7702
> Faulting module name: arrow.dll, version: 0.0.0.0, time stamp: 0x60060ce3
> Exception code: 0xc01d
> Fault offset: 0x0047aadc
> Faulting process id: 0x1b10
> Faulting application start time: 0x01d6f4a43dca3c14
> Faulting application path: D:\SvcFab\_App\SomeApp.FabricType_App32\SomeApp.Fabric.Executor.ProcessActorPkg.Code.1.0.218-prod\Python38\python.exe
> Faulting module path: D:\SvcFab\_App\SomeApp.FabricType_App32\temp\Executions\50cfffe8-9250-4ac7-8ba8-08d8c2bb3edf\.venv\lib\site-packages\pyarrow\arrow.dll
> {code}
>
> Tested on:
> ||OS||Xeon Platinum 8171m or 8272CL||Other CPUs||
> |Windows Server 2012 Datacenter|Fail|OK|
> |Windows Server 2016 Datacenter|OK|OK|
> |Windows Server 2019 Datacenter| | |
> |Windows 10| |OK|
>
> Example code (Python):
> {code:python}
> import numpy as np
> import pandas as pd
>
> data_len = 2**5
> data = pd.DataFrame(
>     {"values": np.arange(0., float(data_len), dtype=float)},
>     index=np.arange(0, data_len, dtype=int)
> )
> data.to_parquet("test.parquet")
> data = pd.read_parquet("test.parquet", engine="pyarrow")  # fails here!
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
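The issue title hinges on the difference between what the CPU supports and what the OS supports. The following is a minimal, hypothetical Python model of that check, not Arrow's C++ implementation. It assumes the standard x86 rule that AVX-512 code is safe only when CPUID reports AVX512F *and* the OS has enabled XSAVE (the OSXSAVE bit) and set XCR0 bits 1, 2, 5, 6 and 7, so that SSE, AVX, opmask and ZMM register state is saved across context switches. The function name, its inputs and the `"baseline"` label are illustrative. Reading CPUID/XGETBV itself is left out, and real detection would also check whichever other AVX-512 subsets a library relies on.

```python
# XCR0 bits (read with the XGETBV instruction) that the OS sets when it saves
# the corresponding register state on context switches.
XCR0_SSE = 1 << 1        # XMM registers
XCR0_AVX = 1 << 2        # upper 128 bits of the YMM registers
XCR0_OPMASK = 1 << 5     # AVX-512 opmask registers k0-k7
XCR0_ZMM_HI256 = 1 << 6  # upper 256 bits of ZMM0-ZMM15
XCR0_HI16_ZMM = 1 << 7   # ZMM16-ZMM31

AVX_STATE = XCR0_SSE | XCR0_AVX
AVX512_STATE = AVX_STATE | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM


def usable_simd_level(cpu_avx2: bool, cpu_avx512f: bool,
                      os_xsave: bool, xcr0: int) -> str:
    """Return the widest SIMD level that both the CPU and the OS support.

    cpu_avx2 / cpu_avx512f: feature bits as reported by CPUID.
    os_xsave: the OSXSAVE bit from CPUID (the OS has enabled XSAVE/XGETBV).
    xcr0: value of XCR0; only meaningful when os_xsave is True.
    """
    if not os_xsave:
        # The OS manages no extended state, so AVX registers are unusable.
        return "baseline"
    os_saves_avx = (xcr0 & AVX_STATE) == AVX_STATE
    os_saves_avx512 = (xcr0 & AVX512_STATE) == AVX512_STATE
    # A CPU feature flag alone is not enough: the OS must also save the state.
    if cpu_avx512f and os_saves_avx512:
        return "avx512"
    if cpu_avx2 and os_saves_avx:
        return "avx2"
    return "baseline"


# CPU supports AVX-512, but the OS only saves SSE/AVX state: fall back to AVX2.
print(usable_simd_level(True, True, True, 0x07))  # avx2
# Same CPU on an OS that also saves opmask/ZMM state.
print(usable_simd_level(True, True, True, 0xE7))  # avx512
```

The first call models the situation in this report: a CPU that advertises AVX-512 running on an OS that, consistent with the Windows Server 2012 results above, does not enable AVX-512 register state.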
[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277432#comment-17277432 ]

Ali Cetin commented on ARROW-11427:
---

Cool. I can give it a try in the coming days.
[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277421#comment-17277421 ]

Antoine Pitrou commented on ARROW-11427:
---

(removed previous post, sorry)

[~ali.cetin] Can you try to install the following wheel and see if it fixes the issue?
[https://github.com/ursacomputing/crossbow/releases/download/build-27-github-wheel-windows-cp38/pyarrow-3.1.0.dev112-cp38-cp38-win_amd64.whl]

It will also let you inspect the current SIMD level, like this:
{code:bash}
$ python -c "import pyarrow as pa; print(pa.runtime_info())"
RuntimeInfo(simd_level='avx2', detected_simd_level='avx2')
{code}

You should get "avx2" on Windows Server 2012, and "avx512" on Windows Server 2016.
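To run the same check from a script instead of the command line, a small helper can read the two `pyarrow.runtime_info()` fields shown in the `RuntimeInfo` output above. This is a sketch: the helper name `arrow_simd_levels` is hypothetical, and it assumes that an installed pyarrow may be missing or may be an older build without `runtime_info()`, in which case it returns `None` rather than failing.

```python
import importlib.util
from typing import Optional, Tuple


def arrow_simd_levels() -> Optional[Tuple[str, str]]:
    """Return (simd_level, detected_simd_level) as reported by pyarrow.

    Returns None when pyarrow is not installed or the installed build does
    not provide runtime_info().
    """
    if importlib.util.find_spec("pyarrow") is None:
        return None
    import pyarrow as pa

    runtime_info = getattr(pa, "runtime_info", None)
    if runtime_info is None:
        return None
    info = runtime_info()
    # Field names as printed in the RuntimeInfo output above.
    return info.simd_level, info.detected_simd_level


if __name__ == "__main__":
    levels = arrow_simd_levels()
    if levels is None:
        print("pyarrow.runtime_info() is not available")
    else:
        simd_level, detected = levels
        print(f"simd_level={simd_level}, detected_simd_level={detected}")
```

On a fixed build, the printed levels can be compared against the expectation in the comment: "avx2" on Windows Server 2012 and "avx512" on Windows Server 2016.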
[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277352#comment-17277352 ]

Antoine Pitrou commented on ARROW-11427:
---

[~ali.cetin] Could you try installing this wheel and see if it fixes the issue?
[https://github.com/ursacomputing/crossbow/releases/download/build-26-github-wheel-windows-cp38/pyarrow-3.1.0.dev109-cp38-cp38-win_amd64.whl]