Robert Nishihara created ARROW-1410:
---------------------------------------
Summary: Plasma object store occasionally pauses for a long time
Key: ARROW-1410
URL: https://issues.apache.org/jira/browse/ARROW-1410
Project: Apache Arrow
Issue Type: Improvement
Environment: Ubuntu 16.04
Reporter: Robert Nishihara
The problem can be reproduced as follows. First, start a Plasma store with
{code}
plasma_store -s /tmp/s1 -m 500000000000
{code}
Then continuously create objects using a script like the following.
{code}
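import pyarrow.plasma as plasma
import numpy as np

# Connect to the store started above (no manager socket, release delay 0).
client = plasma.connect('/tmp/s1', '', 0)
for i in range(20000):
    print(i)
    # Create an object with a random 20-byte ID and a random size of up to
    # ~100 MB, then seal it so it becomes immutable and visible to other clients.
    object_id = plasma.ObjectID(np.random.bytes(20))
    client.create(object_id, np.random.randint(0, 100000000))
    client.seal(object_id)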
{code}
As the loop counters are printed, you will see long pauses. The cause is that
we mmap pages with the MAP_POPULATE flag, which makes mmap fault in every page
of the mapping before it returns, so mapping a large region blocks until all of
its pages have been allocated. Although this can improve the performance of
subsequent object creations, it isn't worth the long pauses. We may want to
find a way to populate the pages in the background.
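One possible shape for the background approach is sketched below in Python rather than the store's C++, assuming anonymous shared memory can stand in for the store's memory-mapped file. A worker thread maps the next region and writes one byte per page (roughly the effect MAP_POPULATE has inside mmap), so the region is already populated when it is handed out. The names map_and_populate, PrefetchingArena and take_region are hypothetical and not part of Plasma.

{code}
import mmap
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = mmap.PAGESIZE


def map_and_populate(size):
    """Map `size` bytes of anonymous shared memory and fault every page in.

    Writing one byte per page forces the kernel to allocate each page now,
    roughly what MAP_POPULATE does inside mmap() itself.
    """
    region = mmap.mmap(-1, size)  # lazy mapping: no pages allocated yet
    for offset in range(0, size, PAGE_SIZE):
        region[offset] = 0  # the region is zero-filled, so this changes no data
    return region


class PrefetchingArena:
    """Hypothetical helper that keeps one pre-populated region ready.

    The page faults are taken on a worker thread, off the path that hands
    regions out, so take_region() normally returns without stalling.
    """

    def __init__(self, region_size):
        self.region_size = region_size
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Start populating the first region immediately.
        self._next = self._pool.submit(map_and_populate, region_size)

    def take_region(self):
        # Blocks only if the background population has not finished yet.
        region = self._next.result()
        # Begin populating the following region in the background.
        self._next = self._pool.submit(map_and_populate, self.region_size)
        return region

    def close(self):
        self._pool.shutdown(wait=True)
{code}

For example, PrefetchingArena(4 * PAGE_SIZE).take_region() returns a 4-page region whose pages are already allocated. Because each region is populated before anyone else can see it, the populating thread never races with clients writing object data. In Python the GIL limits how much of this work overlaps with other Python code; a native thread in the store would not have that limit.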