
Aditi Breed edited comment on ARROW-1247 at 7/25/17 4:49 PM:

Hello Wes,

Thank you for your response, and apologies for the late reply. I was trying
out a few things before writing back.

Unfortunately I cannot share the dataset with you, since it is considered 
secret data in our company.

The first install was a conda install, and it did not work. The day I reported
this, I uninstalled the conda package and did a pip install of pyarrow.

Here is the order of the steps in the code:
1) Get data
2) Remove NaNs (I double-checked this by saving, re-reading, and checking the
dataset).
3) Aggregate to get Means - Save as parquet with pyarrow (works fine)
4) Aggregate to get StdDev - Save as parquet with pyarrow (errors out)
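
One thing worth checking, offered as an assumption rather than a diagnosis: a
sample standard deviation is undefined for a group with a single row, and
pandas reports NaN for such groups by default (ddof=1). Step 4 can therefore
produce NaNs even after step 2 removed them from the input, while step 3 does
not. A pure-Python sketch with hypothetical data (the rows and the
`aggregate` helper are made up for illustration):

```python
import math
import statistics
from collections import defaultdict


def aggregate(rows):
    """Group (key, value) pairs and return {key: (mean, sample_stddev)}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        mean = statistics.mean(values)
        # The sample standard deviation needs at least two points.
        # pandas reports NaN in that case (default ddof=1); mirror that here.
        std = statistics.stdev(values) if len(values) > 1 else math.nan
        result[key] = (mean, std)
    return result


# Hypothetical input with no NaNs; group "b" has only one row.
rows = [("a", 1.0), ("a", 3.0), ("b", 5.0)]
stats = aggregate(rows)

# Groups whose standard deviation came out as NaN.
nan_groups = [key for key, (_, std) in stats.items() if math.isnan(std)]
print(nan_groups)
```

If the StdDev frame does contain NaNs or single-row groups, comparing its
dtypes and null counts with the Means frame before the write may narrow down
what differs between steps 3 and 4.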

Step 4 errors out at the same line that works in step 3 above:
print('Begin Saving File')
pq.write_table(arrowTable, filePath)
print('Done Saving File')
I also felt like I was unable to view the actual error, so I ran the code in a
different IDE. Below is the error I see:

Unhandled Exception: System.ArgumentException: Missing parameter does not have 
a default value.
Parameter name: parameters
   at System.Reflection.MethodBase.CheckArguments(Object[] parameters, Binder 
binder, BindingFlags invokeAttr, CultureInfo culture, Signature sig)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags 
invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean 
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags 
invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
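
The trace above is a .NET (System.Reflection) exception, so it appears to come
from the IDE host rather than from Python or pyarrow. One way to look for the
Python-side stack when the interpreter itself crashes is the standard-library
`faulthandler` module, which may dump the traceback on a fatal error such as
an access violation. A minimal sketch, where `LOG_PATH` and `run_step` are
hypothetical names introduced for illustration:

```python
import faulthandler
import os
import tempfile
import traceback

# Hypothetical log location; any writable path works.
LOG_PATH = os.path.join(tempfile.gettempdir(), "pyarrow_crash.log")

# Keep the file object open for as long as faulthandler is enabled.
crash_log = open(LOG_PATH, "w")
faulthandler.enable(file=crash_log, all_threads=True)


def run_step(step, *args, **kwargs):
    """Run one pipeline step and print the full Python traceback if it raises."""
    try:
        return step(*args, **kwargs)
    except Exception:
        traceback.print_exc()
        raise


# Usage, assuming arrowTable and filePath from the snippet above:
# run_step(pq.write_table, arrowTable, filePath)
```

Running the script from a plain command prompt (python script.py) instead of an
IDE also keeps the IDE's own exceptions out of the output.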

Am I missing something?

Let me know.


> pyarrow causes python to crash errors on parquet.dll
> ----------------------------------------------------
>                 Key: ARROW-1247
>                 URL: https://issues.apache.org/jira/browse/ARROW-1247
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.4.1
>         Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 
> 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
>            Reporter: Aditi Breed
> Hello,
> I have a script which fetches data and stores it in a Pandas dataframe.
> I make 3 aggregations of the data, MEAN/STDEV/MAX, each of which is converted
> to an arrow table and saved on disk as a parquet file.
> This code works just fine for 100-500 records, but errors out for bigger
> volumes. I also know this code works because another developer is using the
> same code on a mirrored machine (in terms of hardware) and it works.
> The dataset I am trying to save is on the order of millions of records.
> The code errors out at the line pq.write_table(arrowTable, filePath).
> Here is the code:
>     arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>     begintime = datetime.now()
>     begintime_str = begintime.strftime("%Y%m%d%I%M%S")
>     filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
>     print('Begin Saving File')
>     pq.write_table(arrowTable, filePath)
>     print('Done Saving File')
>     print('Appending FilePath to List')
>     self.listspDF.append(filePath)
>     print('Done Appending FilePath to List')
> Python crashes with a "python has to close" error.
> The detailed error follows:
> ------------------
> Problem Event Name:                        APPCRASH
>   Application Name:                           python.exe
>   Application Version:                        3.5.2150.1013
>   Application Timestamp:                  577be340
>   Fault Module Name:                        parquet.dll
>   Fault Module Version:           
>   Fault Module Timestamp:               59403662
>   Exception Code:                               c0000005
>   Exception Offset:                              000000000005f990
>   OS Version:                                       6.3.9600.
>   Locale ID:                                          1033
> Read our privacy statement online:
>   http://go.microsoft.com/fwlink/?linkid=280262
> If the online privacy statement is not available, please read our privacy 
> statement offline:
>   C:\Windows\system32\en-US\erofflps.txt
> --------------------------------------------
> I have tried updating Python and pyarrow, with no luck.
> Following is the version of python:
>     import sys
>     print (sys.version)
>     3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC 
> v.1900 64 bit (AMD64)]
> Following are results of pip freeze:
>       alabaster==0.7.9
>       anaconda-clean==1.0
>       anaconda-client==1.5.1
>       anaconda-navigator==1.3.1
>       argcomplete==1.0.0
>       astroid==1.4.7
>       astropy==2.0
>       Babel==2.3.4
>       backports.shutil-get-terminal-size==1.0.0
>       beautifulsoup4==4.5.1
>       bitarray==0.8.1
>       blaze==0.10.1
>       bokeh==0.12.2
>       boto==2.42.0
>       Bottleneck==1.2.1
>       cffi==1.7.0
>       chest==0.2.3
>       click==6.6
>       cloudpickle==0.2.1
>       clyent==1.2.2
>       colorama==0.3.7
>       comtypes==1.1.2
>       conda==4.3.22
>       conda-build==2.0.2
>       configobj==5.0.6
>       contextlib2==0.5.3
>       cryptography==1.5
>       cycler==0.10.0
>       Cython==0.24.1
>       cytoolz==0.8.0
>       dask==0.11.0
>       datashape==0.5.2
>       decorator==4.0.10
>       dill==0.2.5
>       docutils==0.12
>       dynd===c328ab7
>       et-xmlfile==1.0.1
>       fastcache==1.0.2
>       filelock==2.0.6
>       Flask==0.11.1
>       Flask-Cors==2.1.2
>       gevent==1.1.2
>       greenlet==0.4.10
>       h5py==2.7.0
>       HeapDict==1.0.0
>       idna==2.1
>       imageio==2.2.0
>       imagesize==0.7.1
>       ipykernel==4.5.0
>       ipython==5.1.0
>       ipython-genutils==0.1.0
>       ipywidgets==5.2.2
>       itsdangerous==0.24
>       jdcal==1.2
>       jedi==0.9.0
>       Jinja2==2.8
>       jsonschema==2.5.1
>       jupyter==1.0.0
>       jupyter-client==4.4.0
>       jupyter-console==5.0.0
>       jupyter-core==4.2.0
>       lazy-object-proxy==1.2.1
>       llvmlite==0.19.0
>       locket==0.2.0
>       lxml==3.6.4
>       MarkupSafe==0.23
>       matplotlib==2.0.2
>       menuinst==1.4.1
>       mistune==0.7.3
>       mpmath==0.19
>       multipledispatch==0.4.8
>       nb-anacondacloud==1.2.0
>       nb-conda==2.0.0
>       nb-conda-kernels==2.0.0
>       nbconvert==4.2.0
>       nbformat==4.1.0
>       nbpresent==3.0.2
>       networkx==1.11
>       nltk==3.2.1
>       nose==1.3.7
>       notebook==4.2.3
>       numba==0.34.0
>       numexpr==2.6.2
>       numpy==1.13.1
>       odo==0.5.0
>       openpyxl==2.3.2
>       pandas==0.20.2
>       partd==0.3.6
>       path.py==0.0.0
>       pathlib2==2.1.0
>       patsy==0.4.1
>       pep8==1.7.0
>       pickleshare==0.7.4
>       Pillow==3.3.1
>       pkginfo==1.3.2
>       ply==3.9
>       prompt-toolkit==1.0.3
>       psutil==4.3.1
>       py==1.4.31
>       py4j==0.10.4
>       pyarrow==0.4.1
>       pyasn1==0.1.9
>       pycosat==0.6.1
>       pycparser==2.14
>       pycrypto==2.6.1
>       pycurl==7.43.0
>       pyflakes==1.3.0
>       Pygments==2.1.3
>       pyidealdata==0.7.0
>       pylint==1.5.4
>       pyodbc==4.0.17
>       pyOpenSSL==16.2.0
>       pyparsing==2.1.4
>       pyspark==2.1.0+hadoop2.7
>       pytest==2.9.2
>       python-dateutil==2.5.3
>       pytz==2016.6.1
>       PyUber==1.4.4
>       PyWavelets==0.5.2
>       pywin32==220
>       PyYAML==3.12
>       pyzmq==15.4.0
>       QtAwesome==0.3.3
>       qtconsole==4.2.1
>       QtPy==1.1.2
>       requests==2.14.2
>       rope-py3k==0.9.4.post1
>       ruamel-yaml===-VERSION
>       scikit-image==0.13.0
>       scikit-learn==0.18.2
>       scipy==0.19.1
>       simplegeneric==0.8.1
>       singledispatch==
>       six==1.10.0
>       snowballstemmer==1.2.1
>       sockjs-tornado==1.0.3
>       sphinx==1.4.6
>       spyder==3.0.0
>       SQLAlchemy==1.0.13
>       statsmodels==0.8.0
>       sympy==1.0
>       tables==3.2.2
>       toolz==0.8.0
>       tornado==4.4.1
>       traitlets==4.3.0
>       unicodecsv==0.14.1
>       wcwidth==0.1.7
>       Werkzeug==0.11.11
>       widgetsnbextension==1.2.6
>       win-unicode-console==0.5
>       wrapt==1.10.6
>       xlrd==1.0.0
>       XlsxWriter==0.9.3
>       xlwings==0.10.0
>       xlwt==1.1.2
> I was wondering if someone could shed light on why pyarrow would not work on
> a particular machine?
> Thanks,
> Adu
