[ 
https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695361#comment-16695361
 ] 

Andy Reagan commented on ARROW-2654:
------------------------------------

👌





-------- Original message --------
From: "Wes McKinney (JIRA)" <j...@apache.org>
Date: 11/21/18 12:46 PM (GMT-05:00)
To: "Reagan, Andrew" <area...@massmutual.com>
Subject: [EXTERNAL][jira] [Closed] (ARROW-2654) [Python] Error with errno 22 
when loading 3.6 GB Parquet file


     [ 
https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2654.
-------------------------------
    Resolution: Duplicate
      Assignee: Wes McKinney

Closing as a duplicate of ARROW-3762. That issue contains a repro, so we'll use 
it to fix the problem.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



> [Python] Error with errno 22 when loading 3.6 GB Parquet file
> -------------------------------------------------------------
>
>                 Key: ARROW-2654
>                 URL: https://issues.apache.org/jira/browse/ARROW-2654
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Andy Reagan
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.12.0
>
>
> I saved a file using the pandas to_parquet method, but can't read it back in. 
> Here's the full stack trace:
>
> {code:java}
> Traceback (most recent call last):
> File "src/data/CLXP_pull.py", line 214, in <module>
>  main()
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 722, in __call__
>  return self.main(*args, **kwargs)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 697, in main
>  rv = self.invoke(ctx)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 895, in invoke
>  return ctx.invoke(self.callback, **ctx.params)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py",
>  line 535, in invoke
>  return callback(*args, **kwargs)
>  File "src/data/CLXP_pull.py", line 188, in main
>  results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", 
> fullname+".parquet"), engine="pyarrow")
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py",
>  line 257, in read_parquet
>  return impl.read(path, columns=columns, **kwargs)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py",
>  line 130, in read
>  **kwargs).to_pandas()
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py",
>  line 939, in read_table
>  pf = ParquetFile(source, metadata=metadata)
>  File 
> "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py",
>  line 64, in __init__
>  self.reader.open(source, metadata=metadata)
>  File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
>  File "error.pxi", line 79, in pyarrow.lib.check_status
>  pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
> {code}
> Any ideas what could cause this? The file itself is 3.6 GB.
> I'm running pandas==0.22.0.


