William Forson created PARQUET-799:
--------------------------------------

             Summary: concurrent usage of the file reader API
                 Key: PARQUET-799
                 URL: https://issues.apache.org/jira/browse/PARQUET-799
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: William Forson


I've recently been debugging a segfault that occurs when concurrently reading 
(distinct) parquet files from multiple threads.

I initially assumed this was a reasonable thing to do, since the project README 
doesn't say anything about concurrency one way or the other. But then I 
encountered [this TODO 
comment|https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/page.h#L35]:

{quote}
// TODO: Parallel processing is not yet safe because of memory-ownership
// semantics (the PageReader may or may not own the memory referenced by a
// page)
{quote}

This got me wondering: is parquet-cpp fundamentally NOT thread-safe, even 
when each thread reads only a single file at any given time? Or is it 
basically thread-safe with a couple of gotchas?

Also, FYI, I'm currently running against a build that incorporates [this 
change|https://github.com/apache/parquet-cpp/commit/002466539f6aba7bf1f885b66f61f302ed88fa6b].

(Aside: I recently posted an issue about {{THRIFT_HOME}} in order to rule out 
any ABI mismatch that might result from building parquet-cpp against a 
different version of Thrift than the one used by the applications that 
ultimately consume parquet-cpp.)

Thanks!



