[ https://issues.apache.org/jira/browse/ARROW-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352023#comment-16352023 ]
Jingyuan Wang commented on ARROW-2059:
--------------------------------------

Here is what I've done. I simply repeated the 1M rows to create 30M-row and 100M-row test CSV files, then repeated the process of reading from CSV, writing as Feather, and reading from Feather, timing each part. I also repeated the measurement 10 times for each of the four combinations of (python-2.7, python-3.6) x (feather-format-0.3.1, feather-format-0.4.0).

Processing the 100M-row file failed on my laptop (16GB memory) for every combination except python-2.7 with feather-format-0.3.1.

The measurements for 1M rows are as follows:
||python version||feather version||# rows||write feather||read feather||
|2.7|0.3.1|1M|0.06216781139|0.05903599262|
|2.7|0.4.0|1M|0.1335380793|0.04576666355|
|3.6|0.3.1|1M|0.07768514156|0.09041910172|
|3.6|0.4.0|1M|0.08690385818|0.05801310539|

The measurements for 30M rows are as follows:
||python version||feather version||# rows||write feather||read feather||
|2.7|0.3.1|30M|1.747310066|2.35606482|
|2.7|0.4.0|30M|3.5653723|1.934461188|
|3.6|0.3.1|30M|2.407458949|2.811572456|
|3.6|0.4.0|30M|2.925034189|1.852504301|

From both tables, write performance did degrade from 0.3.1 to 0.4.0, with Python 2 degrading more dramatically (write time roughly doubled in both tables). Reading Feather files was actually faster with the newer Feather version.

One other thing: I noticed that feather-format-0.3.1 does not even depend on Arrow, so the performance difference reflects more than an Arrow version upgrade. I do think we need some thorough benchmarks for Arrow. Or do we already have them?

> [Python] Possible performance regression in Feather read/write path
> -------------------------------------------------------------------
>
>                 Key: ARROW-2059
>                 URL: https://issues.apache.org/jira/browse/ARROW-2059
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Jingyuan Wang
>            Priority: Major
>             Fix For: 0.9.0
>
>
> See discussion in https://github.com/wesm/feather/issues/329.
> Needs to be investigated

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
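
For anyone who wants to repeat the measurement described in the comment above, here is a minimal sketch of the procedure: repeat the rows of a small CSV to build a larger one, then time the CSV read, the Feather write, and the Feather read separately. It is not the script used for the tables. The helper names (repeat_csv_rows, time_call, benchmark) are hypothetical, and averaging the runs is an assumption, since the comment does not say how the ten measurements were combined. The sketch is Python 3 only (time.perf_counter, keyword-only arguments), so the python-2.7 runs would need a port.

```python
import csv
import statistics
import time


def repeat_csv_rows(src_path, dst_path, times):
    """Copy the header of src_path once and its data rows `times` times into dst_path."""
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)  # held in memory; fine for the 1M-row base file
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for _ in range(times):
            writer.writerows(rows)


def time_call(func, *args, repeats=10):
    """Call func(*args) `repeats` times; return (last result, list of wall-clock seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        timings.append(time.perf_counter() - start)
    return result, timings


def benchmark(csv_path, feather_path, read_csv, write_feather, read_feather, repeats=10):
    """Time each step separately and return the mean seconds per step.

    The three callables are supplied by the caller so the same harness can be
    run against different library versions.
    """
    df, csv_times = time_call(read_csv, csv_path, repeats=repeats)
    _, write_times = time_call(write_feather, df, feather_path, repeats=repeats)
    _, read_times = time_call(read_feather, feather_path, repeats=repeats)
    return {
        "read csv": statistics.mean(csv_times),      # averaging is an assumption
        "write feather": statistics.mean(write_times),
        "read feather": statistics.mean(read_times),
    }


# Example wiring (assumes pandas and feather-format are installed):
#   import pandas, feather
#   repeat_csv_rows("rows_1m.csv", "rows_30m.csv", 30)   # hypothetical file names
#   print(benchmark("rows_30m.csv", "rows_30m.feather",
#                   pandas.read_csv, feather.write_dataframe, feather.read_dataframe))
```

Passing the read and write functions in as arguments keeps the harness identical across the (python version) x (feather-format version) combinations, so only the installed packages change between runs.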