Robert Nishihara created ARROW-1410:
---------------------------------------

             Summary: Plasma object store occasionally pauses for a long time
                 Key: ARROW-1410
                 URL: https://issues.apache.org/jira/browse/ARROW-1410
             Project: Apache Arrow
          Issue Type: Improvement
         Environment: Ubuntu 16.04
            Reporter: Robert Nishihara


The problem can be reproduced as follows. First, start a plasma store with a 500 GB memory limit ({{-m}} is given in bytes):

{code}
plasma_store -s /tmp/s1 -m 500000000000
{code}

Then create and seal objects in a loop using a script like the following:

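{code}
import pyarrow.plasma as plasma
import numpy as np

# Connect to the store socket started above (no manager socket, release delay 0).
client = plasma.connect('/tmp/s1', '', 0)

for i in range(20000):
    print(i)  # the printed counter stalls whenever the store pauses
    # Random 20-byte object ID and a random object size of up to ~100 MB.
    object_id = plasma.ObjectID(np.random.bytes(20))
    client.create(object_id, np.random.randint(0, 100000000))
    client.seal(object_id)
{code}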

As the loop counters are printed, you will see long pauses. The cause is that we mmap pages with the MAP_POPULATE flag, which makes mmap prefault the page tables for the whole mapping before it returns. This can speed up subsequent object creations, since those pages no longer fault on first access, but it isn't worth the long pauses. We may want to find a way to populate the pages in the background.
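One possible shape for "populate the pages in the background", sketched in Python under stated assumptions: instead of paying the MAP_POPULATE cost inside the create call, a background thread maps and prefaults fixed-size chunks ahead of time and hands them out from a bounded queue. The class PrefaultedChunkPool, its chunk_size and depth parameters, and the chunk-pool design itself are hypothetical illustrations, not how the Plasma store's allocator works. The sketch assumes Linux and Python 3.10+, where mmap.MAP_POPULATE is exposed, and falls back to an ordinary lazy mapping elsewhere.

{code}
import mmap
import queue
import threading
from typing import Optional

# MAP_POPULATE is Linux-only and exposed by Python 3.10+; fall back to 0
# (an ordinary lazily-faulted mapping) where it is unavailable.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


class PrefaultedChunkPool:
    """Hypothetical helper (not part of Plasma): a background thread pays the
    MAP_POPULATE cost ahead of time so acquire() returns prefaulted memory."""

    def __init__(self, chunk_size: int, depth: int = 2) -> None:
        self.chunk_size = chunk_size
        self._ready = queue.Queue(maxsize=depth)  # populated chunks awaiting use
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._fill, daemon=True)
        self._worker.start()

    def _fill(self) -> None:
        while not self._stop.is_set():
            # The slow, prefaulting mmap runs on this thread, not the caller's.
            chunk = mmap.mmap(-1, self.chunk_size,
                              flags=mmap.MAP_SHARED | MAP_POPULATE)
            # Wait for room in the queue, but give up promptly on close().
            while not self._stop.is_set():
                try:
                    self._ready.put(chunk, timeout=0.1)
                    break
                except queue.Full:
                    continue
            else:
                chunk.close()  # stopped before the chunk was handed over

    def acquire(self, timeout: Optional[float] = None) -> mmap.mmap:
        # Blocks only if callers drain chunks faster than the worker fills them.
        return self._ready.get(timeout=timeout)

    def close(self) -> None:
        self._stop.set()
        self._worker.join()
        # Release chunks that were populated but never acquired.
        while not self._ready.empty():
            self._ready.get_nowait().close()


if __name__ == "__main__":
    pool = PrefaultedChunkPool(chunk_size=4 * 1024 * 1024)
    buf = pool.acquire()
    buf[:5] = b"hello"  # the chunk is already mapped and populated
    print(buf[:5])
    buf.close()
    pool.close()
{code}

The trade-off in this sketch: a caller only waits if it consumes chunks faster than the worker can populate them, at the cost of keeping up to depth + 1 populated chunks (the queue plus the one the worker is holding) resident before they are used.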



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
