William Forson created PARQUET-799:
--------------------------------------
Summary: Concurrent usage of the file reader API
Key: PARQUET-799
URL: https://issues.apache.org/jira/browse/PARQUET-799
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: William Forson
I've recently been debugging a segfault that occurs when concurrently reading
(distinct) parquet files from multiple threads.
I initially assumed this was a reasonable thing to do, since the project README
doesn't say anything about concurrency one way or the other. But then I
encountered [this TODO
comment|https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/page.h#L35]:
{quote}
// TODO: Parallel processing is not yet safe because of memory-ownership
// semantics (the PageReader may or may not own the memory referenced by a
// page)
{quote}
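To illustrate the kind of hazard that TODO seems to describe, here is a minimal, self-contained C++ sketch. {{Page}}, {{PageReader}}, {{NextPageBorrowed}} and {{NextPageOwned}} are hypothetical stand-ins, not parquet-cpp's actual classes or methods. A page that merely borrows a view into a scratch buffer owned by the reader becomes invalid as soon as the reader reuses that buffer. A page that shares ownership of its bytes through a {{std::shared_ptr}} stays valid independently of the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical page: a view over bytes that may or may not be owned by the page.
struct Page {
  const std::uint8_t* data;                          // view into the page bytes
  std::size_t size;
  std::shared_ptr<std::vector<std::uint8_t>> owner;  // non-null only if the page keeps its bytes alive
  bool owns_memory() const { return owner != nullptr; }
};

// Hypothetical reader that can either lend out its reusable scratch buffer
// or hand each page its own shared buffer.
class PageReader {
 public:
  // Borrowing: no extra allocation, but the returned page is only valid until
  // the next call, because scratch_ is overwritten (and may be reallocated).
  Page NextPageBorrowed(const std::string& payload) {
    scratch_.assign(payload.begin(), payload.end());
    return Page{scratch_.data(), scratch_.size(), nullptr};
  }

  // Owning: copies into a fresh buffer whose lifetime is tied to the page.
  Page NextPageOwned(const std::string& payload) {
    auto buf = std::make_shared<std::vector<std::uint8_t>>(payload.begin(), payload.end());
    return Page{buf->data(), buf->size(), buf};
  }

 private:
  std::vector<std::uint8_t> scratch_;
};
```

When a caller (or another thread) cannot tell which of the two kinds of page it holds, keeping a borrowed page past the reader's next call means reading overwritten or freed memory, which is one plausible route to a segfault.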
And it has me wondering: is parquet-cpp fundamentally NOT thread-safe, even
for the use case of reading a single file per thread at any given time? Or is
it basically thread-safe with a couple of gotchas?
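For concreteness, the usage pattern in question (each thread opens and reads its own distinct file, and no reader objects are shared across threads) looks roughly like the sketch below. {{ReadWholeFile}} and {{ReadFilesConcurrently}} are hypothetical names; {{ReadWholeFile}} stands in for opening a parquet-cpp reader on one file and scanning it, and here it just reads raw bytes with the standard library. Whether parquet-cpp itself is safe under this pattern is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for "open a reader on one file and scan it".
// Every object it touches is local to the calling thread.
std::size_t ReadWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  return bytes.size();  // 0 if the file could not be opened
}

// Reads each path on its own thread; results[i] corresponds to paths[i].
std::vector<std::size_t> ReadFilesConcurrently(const std::vector<std::string>& paths) {
  std::vector<std::size_t> results(paths.size(), 0);
  std::vector<std::thread> workers;
  workers.reserve(paths.size());
  for (std::size_t i = 0; i < paths.size(); ++i) {
    // Each thread writes only to its own slot, so no locking is needed here.
    workers.emplace_back([&paths, &results, i] { results[i] = ReadWholeFile(paths[i]); });
  }
  for (auto& t : workers) t.join();
  return results;
}
```

Under this pattern any crash would have to come from state shared inside the library (globals, static caches, or memory ownership as in the TODO), not from the caller sharing objects between threads.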
Also, jfyi, I'm currently running against a build which incorporates [this
change|https://github.com/apache/parquet-cpp/commit/002466539f6aba7bf1f885b66f61f302ed88fa6b].
(aside: my motivation for recently posting an issue re. {{THRIFT_HOME}} was to
rule out any ABI weirdness that might result from building parquet-cpp against
a different version of thrift than the applications that ultimately consume
parquet-cpp)
Thanks!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)