[ https://issues.apache.org/jira/browse/ARROW-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100348#comment-16100348 ]

Wes McKinney commented on ARROW-1247:
-------------------------------------

I am not sure how to debug this from here. You have two options:

* Create an obfuscated data file that does not contain proprietary IP but 
still exhibits the error
* Provide reproducible code that generates data in memory and exhibits the 
error (see the sketch below this list)
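
For the second option, a minimal sketch along these lines would be enough for 
us to run the failing {{write_table}} call ourselves (synthetic data only; the 
row count, column names, and output path below are placeholders, not anything 
taken from your script):

    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Synthetic stand-in for the real dataset: a few million rows of
    # random values (row count and column names are placeholders).
    n = 5000000
    df = pd.DataFrame({'group': np.random.randint(0, 1000, size=n),
                       'value': np.random.randn(n)})

    # Same pattern as in the report: aggregate, convert to an Arrow
    # table, write to Parquet.
    agg_df = df.groupby('group').mean().reset_index()
    table = pa.Table.from_pandas(agg_df)
    pq.write_table(table, 'meanData_test.parq')

If a script like that reproduces the crash on your machine, we can run the 
exact same thing on a comparable Windows environment.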

Short of something reproducible that lets us locate the source of the 
problem, we'll have to wait for someone else who hits the same issue and is 
able to provide a reproducible example.

pyarrow 0.5.0 has just been published on conda-forge. Can you try this again 
with

{{conda install pyarrow=0.5.0 -c conda-forge}}

and let me know if it is still a problem?
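
After the upgrade, a quick sanity check like this (just to confirm the new 
build is the one actually being imported) should report 0.5.0:

    import pyarrow
    print(pyarrow.__version__)   # should print 0.5.0 after the upgrade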

> pyarrow causes python to crash errors on parquet.dll
> ----------------------------------------------------
>
>                 Key: ARROW-1247
>                 URL: https://issues.apache.org/jira/browse/ARROW-1247
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.4.1
>         Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 
> 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
>            Reporter: Aditi Breed
>
> Hello,
> I have a script which fetches data and stores it in a Pandas DataFrame.
> I make three aggregations of the data (MEAN/STDEV/MAX), each of which is 
> converted to an Arrow table and saved to disk as a Parquet file.
> This code works just fine for 100-500 records, but errors out at larger 
> volumes. I also know the code itself is sound, because another developer 
> runs the same code on a machine with identical hardware and it works.
> The dataset I am trying to save is on the order of millions of records.
> The code errors out at the line pq.write_table(arrowTable, filePath).
> Here is the code (SaveFileLoc and agg are defined earlier in the script):
>
>     import pyarrow as pa
>     import pyarrow.parquet as pq
>     from datetime import datetime
>
>     arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>
>     begintime = datetime.now()
>     begintime_str = begintime.strftime("%Y%m%d%I%M%S")
>
>     filePath = (SaveFileLoc + "\\Raw\\" + agg + "Data"
>                 + begintime_str + ".parq")
>
>     print('Begin Saving File')
>     pq.write_table(arrowTable, filePath)   # <-- crash happens here
>     print('Done Saving File')
>
>     print('Appending FilePath to List')
>     self.listspDF.append(filePath)
>     print('Done Appending FilePath to List')
>       
> Python crashes with a "Python has to close" error dialog.
> Following is the detailed error:
> ------------------
>   Problem Event Name:       APPCRASH
>   Application Name:         python.exe
>   Application Version:      3.5.2150.1013
>   Application Timestamp:    577be340
>   Fault Module Name:        parquet.dll
>   Fault Module Version:     0.0.0.0
>   Fault Module Timestamp:   59403662
>   Exception Code:           c0000005
>   Exception Offset:         000000000005f990
>   OS Version:               6.3.9600.2.0.0.400.8
>   Locale ID:                1033
> --------------------------------------------
> I have tried updating Python and pyarrow, with no luck.
> Following is the version of Python:
>     import sys
>     print(sys.version)
>     3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC 
> v.1900 64 bit (AMD64)]
> Following are results of pip freeze:
>       alabaster==0.7.9
>       anaconda-clean==1.0
>       anaconda-client==1.5.1
>       anaconda-navigator==1.3.1
>       argcomplete==1.0.0
>       astroid==1.4.7
>       astropy==2.0
>       Babel==2.3.4
>       backports.shutil-get-terminal-size==1.0.0
>       beautifulsoup4==4.5.1
>       bitarray==0.8.1
>       blaze==0.10.1
>       bokeh==0.12.2
>       boto==2.42.0
>       Bottleneck==1.2.1
>       cffi==1.7.0
>       chest==0.2.3
>       click==6.6
>       cloudpickle==0.2.1
>       clyent==1.2.2
>       colorama==0.3.7
>       comtypes==1.1.2
>       conda==4.3.22
>       conda-build==2.0.2
>       configobj==5.0.6
>       contextlib2==0.5.3
>       cryptography==1.5
>       cycler==0.10.0
>       Cython==0.24.1
>       cytoolz==0.8.0
>       dask==0.11.0
>       datashape==0.5.2
>       decorator==4.0.10
>       dill==0.2.5
>       docutils==0.12
>       dynd===c328ab7
>       et-xmlfile==1.0.1
>       fastcache==1.0.2
>       filelock==2.0.6
>       Flask==0.11.1
>       Flask-Cors==2.1.2
>       gevent==1.1.2
>       greenlet==0.4.10
>       h5py==2.7.0
>       HeapDict==1.0.0
>       idna==2.1
>       imageio==2.2.0
>       imagesize==0.7.1
>       ipykernel==4.5.0
>       ipython==5.1.0
>       ipython-genutils==0.1.0
>       ipywidgets==5.2.2
>       itsdangerous==0.24
>       jdcal==1.2
>       jedi==0.9.0
>       Jinja2==2.8
>       jsonschema==2.5.1
>       jupyter==1.0.0
>       jupyter-client==4.4.0
>       jupyter-console==5.0.0
>       jupyter-core==4.2.0
>       lazy-object-proxy==1.2.1
>       llvmlite==0.19.0
>       locket==0.2.0
>       lxml==3.6.4
>       MarkupSafe==0.23
>       matplotlib==2.0.2
>       menuinst==1.4.1
>       mistune==0.7.3
>       mpmath==0.19
>       multipledispatch==0.4.8
>       nb-anacondacloud==1.2.0
>       nb-conda==2.0.0
>       nb-conda-kernels==2.0.0
>       nbconvert==4.2.0
>       nbformat==4.1.0
>       nbpresent==3.0.2
>       networkx==1.11
>       nltk==3.2.1
>       nose==1.3.7
>       notebook==4.2.3
>       numba==0.34.0
>       numexpr==2.6.2
>       numpy==1.13.1
>       odo==0.5.0
>       openpyxl==2.3.2
>       pandas==0.20.2
>       partd==0.3.6
>       path.py==0.0.0
>       pathlib2==2.1.0
>       patsy==0.4.1
>       pep8==1.7.0
>       pickleshare==0.7.4
>       Pillow==3.3.1
>       pkginfo==1.3.2
>       ply==3.9
>       prompt-toolkit==1.0.3
>       psutil==4.3.1
>       py==1.4.31
>       py4j==0.10.4
>       pyarrow==0.4.1
>       pyasn1==0.1.9
>       pycosat==0.6.1
>       pycparser==2.14
>       pycrypto==2.6.1
>       pycurl==7.43.0
>       pyflakes==1.3.0
>       Pygments==2.1.3
>       pyidealdata==0.7.0
>       pylint==1.5.4
>       pyodbc==4.0.17
>       pyOpenSSL==16.2.0
>       pyparsing==2.1.4
>       pyspark==2.1.0+hadoop2.7
>       pytest==2.9.2
>       python-dateutil==2.5.3
>       pytz==2016.6.1
>       PyUber==1.4.4
>       PyWavelets==0.5.2
>       pywin32==220
>       PyYAML==3.12
>       pyzmq==15.4.0
>       QtAwesome==0.3.3
>       qtconsole==4.2.1
>       QtPy==1.1.2
>       requests==2.14.2
>       rope-py3k==0.9.4.post1
>       ruamel-yaml===-VERSION
>       scikit-image==0.13.0
>       scikit-learn==0.18.2
>       scipy==0.19.1
>       simplegeneric==0.8.1
>       singledispatch==3.4.0.3
>       six==1.10.0
>       snowballstemmer==1.2.1
>       sockjs-tornado==1.0.3
>       sphinx==1.4.6
>       spyder==3.0.0
>       SQLAlchemy==1.0.13
>       statsmodels==0.8.0
>       sympy==1.0
>       tables==3.2.2
>       toolz==0.8.0
>       tornado==4.4.1
>       traitlets==4.3.0
>       unicodecsv==0.14.1
>       wcwidth==0.1.7
>       Werkzeug==0.11.11
>       widgetsnbextension==1.2.6
>       win-unicode-console==0.5
>       wrapt==1.10.6
>       xlrd==1.0.0
>       XlsxWriter==0.9.3
>       xlwings==0.10.0
>       xlwt==1.1.2
> I was wondering if someone could shed light on why pyarrow would not work 
> on one particular machine?
> Thanks,
> Adu



