[ https://issues.apache.org/jira/browse/ARROW-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117004#comment-16117004 ]
Aditi Breed edited comment on ARROW-1247 at 8/7/17 6:35 PM:
------------------------------------------------------------

I took the dataset and divided it into chunks to see which chunk would not save properly. In one of the chunks, a string value had about 5 million characters, which is a problem (a bad dataset, from what I found). This was causing the dataset save to fail. I don't think this is a pyarrow issue.

was (Author: p ved):
I took the dataset and divided it into chunks, to see which chunk would not save properly, In one of the chunks, a strign value had about 5 Mill chars, which is a problem ( bad dataset from what I found). This was causing the dataset save to fail. I dont think this is a pyarrow issue.

> [Python] pyarrow causes python to crash errors on parquet.dll
> -------------------------------------------------------------
>
> Key: ARROW-1247
> URL: https://issues.apache.org/jira/browse/ARROW-1247
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.1
> Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
> Reporter: Aditi Breed
>
> Hello,
> I have a script that fetches data and stores it in a Pandas dataframe.
> I make 3 aggregations of the data (MEAN/STDEV/MAX), each of which is converted to an arrow table and saved to disk as a parquet file.
> This code works just fine for 100-500 records, but errors out for bigger volumes. I also know this code works because another developer is using the same code on a mirrored machine (in terms of hardware) and it works.
> The size of the dataset I am trying to save is on the order of millions.
> The code errors out at the line pq.write_table(arrowTable, filePath).
> Here is the code:
> arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>
> begintime = datetime.now()
> begintime_str = begintime.strftime("%Y%m%d%I%M%S")
>
> filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
> print('Begin Saving File')
> pq.write_table(arrowTable, filePath)
> print('Done Saving File')
>
> print('Appending FilePath to List')
> self.listspDF.append(filePath)
> print('Done Appending FilePath to List')
>
> Python crashes and throws a "python has to close error".
> Following is the detailed error:
> ------------------
> Problem Event Name: APPCRASH
> Application Name: python.exe
> Application Version: 3.5.2150.1013
> Application Timestamp: 577be340
> Fault Module Name: parquet.dll
> Fault Module Version: 0.0.0.0
> Fault Module Timestamp: 59403662
> Exception Code: c0000005
> Exception Offset: 000000000005f990
> OS Version: 6.3.9600.2.0.0.400.8
> Locale ID: 1033
> Read our privacy statement online:
> http://go.microsoft.com/fwlink/?linkid=280262
> If the online privacy statement is not available, please read our privacy statement offline:
> C:\Windows\system32\en-US\erofflps.txt
> --------------------------------------------
> I have tried updating Python and pyarrow, with no luck.
> Following is the version of python:
> import sys
> print (sys.version)
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Following are results of pip freeze:
> alabaster==0.7.9
> anaconda-clean==1.0
> anaconda-client==1.5.1
> anaconda-navigator==1.3.1
> argcomplete==1.0.0
> astroid==1.4.7
> astropy==2.0
> Babel==2.3.4
> backports.shutil-get-terminal-size==1.0.0
> beautifulsoup4==4.5.1
> bitarray==0.8.1
> blaze==0.10.1
> bokeh==0.12.2
> boto==2.42.0
> Bottleneck==1.2.1
> cffi==1.7.0
> chest==0.2.3
> click==6.6
> cloudpickle==0.2.1
> clyent==1.2.2
> colorama==0.3.7
> comtypes==1.1.2
> conda==4.3.22
> conda-build==2.0.2
> configobj==5.0.6
> contextlib2==0.5.3
> cryptography==1.5
> cycler==0.10.0
> Cython==0.24.1
> cytoolz==0.8.0
> dask==0.11.0
> datashape==0.5.2
> decorator==4.0.10
> dill==0.2.5
> docutils==0.12
> dynd===c328ab7
> et-xmlfile==1.0.1
> fastcache==1.0.2
> filelock==2.0.6
> Flask==0.11.1
> Flask-Cors==2.1.2
> gevent==1.1.2
> greenlet==0.4.10
> h5py==2.7.0
> HeapDict==1.0.0
> idna==2.1
> imageio==2.2.0
> imagesize==0.7.1
> ipykernel==4.5.0
> ipython==5.1.0
> ipython-genutils==0.1.0
> ipywidgets==5.2.2
> itsdangerous==0.24
> jdcal==1.2
> jedi==0.9.0
> Jinja2==2.8
> jsonschema==2.5.1
> jupyter==1.0.0
> jupyter-client==4.4.0
> jupyter-console==5.0.0
> jupyter-core==4.2.0
> lazy-object-proxy==1.2.1
> llvmlite==0.19.0
> locket==0.2.0
> lxml==3.6.4
> MarkupSafe==0.23
> matplotlib==2.0.2
> menuinst==1.4.1
> mistune==0.7.3
> mpmath==0.19
> multipledispatch==0.4.8
> nb-anacondacloud==1.2.0
> nb-conda==2.0.0
> nb-conda-kernels==2.0.0
> nbconvert==4.2.0
> nbformat==4.1.0
> nbpresent==3.0.2
> networkx==1.11
> nltk==3.2.1
> nose==1.3.7
> notebook==4.2.3
> numba==0.34.0
> numexpr==2.6.2
> numpy==1.13.1
> odo==0.5.0
> openpyxl==2.3.2
> pandas==0.20.2
> partd==0.3.6
> path.py==0.0.0
> pathlib2==2.1.0
> patsy==0.4.1
> pep8==1.7.0
> pickleshare==0.7.4
> Pillow==3.3.1
> pkginfo==1.3.2
> ply==3.9
> prompt-toolkit==1.0.3
> psutil==4.3.1
> py==1.4.31
> py4j==0.10.4
> pyarrow==0.4.1
> pyasn1==0.1.9
> pycosat==0.6.1
> pycparser==2.14
> pycrypto==2.6.1
> pycurl==7.43.0
> pyflakes==1.3.0
> Pygments==2.1.3
> pyidealdata==0.7.0
> pylint==1.5.4
> pyodbc==4.0.17
> pyOpenSSL==16.2.0
> pyparsing==2.1.4
> pyspark==2.1.0+hadoop2.7
> pytest==2.9.2
> python-dateutil==2.5.3
> pytz==2016.6.1
> PyUber==1.4.4
> PyWavelets==0.5.2
> pywin32==220
> PyYAML==3.12
> pyzmq==15.4.0
> QtAwesome==0.3.3
> qtconsole==4.2.1
> QtPy==1.1.2
> requests==2.14.2
> rope-py3k==0.9.4.post1
> ruamel-yaml===-VERSION
> scikit-image==0.13.0
> scikit-learn==0.18.2
> scipy==0.19.1
> simplegeneric==0.8.1
> singledispatch==3.4.0.3
> six==1.10.0
> snowballstemmer==1.2.1
> sockjs-tornado==1.0.3
> sphinx==1.4.6
> spyder==3.0.0
> SQLAlchemy==1.0.13
> statsmodels==0.8.0
> sympy==1.0
> tables==3.2.2
> toolz==0.8.0
> tornado==4.4.1
> traitlets==4.3.0
> unicodecsv==0.14.1
> wcwidth==0.1.7
> Werkzeug==0.11.11
> widgetsnbextension==1.2.6
> win-unicode-console==0.5
> wrapt==1.10.6
> xlrd==1.0.0
> XlsxWriter==0.9.3
> xlwings==0.10.0
> xlwt==1.1.2
> I was wondering if someone could shed light on why pyarrow would not work on a certain machine?
> Thanks,
> Adu

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
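
The chunk-by-chunk check described in the comment at the top of this message could be approximated along the following lines. This is only a rough sketch, not the code used in the investigation. Instead of saving each chunk and waiting for the crash, it scans each chunk for unusually long string values. The helper names (`iter_chunks`, `find_oversized_strings`), the chunk size, the length threshold, and the sample rows are all assumptions made up for illustration.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield (chunk_index, rows_in_chunk) pairs of at most chunk_size rows."""
    it = iter(rows)
    index = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield index, chunk
        index += 1


def find_oversized_strings(rows, chunk_size=1000, max_chars=1000000):
    """Return (chunk_index, row_index, column, length) for every string
    value longer than max_chars. Rows are plain dicts (hypothetical layout)."""
    hits = []
    for chunk_index, chunk in iter_chunks(rows, chunk_size):
        for offset, row in enumerate(chunk):
            for column, value in row.items():
                if isinstance(value, str) and len(value) > max_chars:
                    row_index = chunk_index * chunk_size + offset
                    hits.append((chunk_index, row_index, column, len(value)))
    return hits


# Made-up data: one row carries a 5-million-character string.
rows = [{"id": i, "note": "ok"} for i in range(2500)]
rows[2100]["note"] = "x" * 5000000

for hit in find_oversized_strings(rows):
    print(hit)
```

With a pandas DataFrame, the rows could presumably be obtained with `df.to_dict("records")`, although for data on the order of millions of rows a column-wise length check would likely be cheaper.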