[
https://issues.apache.org/jira/browse/PARQUET-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749417#comment-15749417
]
Wes McKinney commented on PARQUET-799:
--------------------------------------
You should be able to read columns in parallel if your input data source is
threadsafe. The {{ParquetFileReader::OpenFile}} is a convenience API that uses
a sample implementation of {{parquet::RandomAccessSource}},
{{LocalFileSource}}. Among other issues, {{LocalFileSource}} uses {{FILE}},
which isn't thread-safe on every platform.
You are welcome to submit a patch to make {{LocalFileSource}} threadsafe, or
you can implement your own {{RandomAccessSource}}. You may also look at using
the IO classes from
https://github.com/apache/arrow/tree/master/cpp/src/arrow/io (these also need
some threadsafety work), which I'm more interested in maintaining for
production use than the ones in Parquet.
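As a rough illustration of the "make the source threadsafe" option, here is a minimal sketch that serializes positional reads on a single {{FILE}} handle with a mutex. The names {{LockedFileSource}}, {{ReadAt}} and {{ReadChunksInParallel}} are hypothetical; this class does not implement the actual {{parquet::RandomAccessSource}} interface, and it assumes a seek followed by a read is the only operation that needs protecting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper, not part of parquet-cpp: serializes positional reads
// on one FILE* so that several threads can share the handle.
class LockedFileSource {
 public:
  explicit LockedFileSource(const std::string& path)
      : file_(std::fopen(path.c_str(), "rb")) {
    if (file_ == nullptr) throw std::runtime_error("cannot open " + path);
  }
  ~LockedFileSource() { std::fclose(file_); }

  LockedFileSource(const LockedFileSource&) = delete;
  LockedFileSource& operator=(const LockedFileSource&) = delete;

  // Reads up to nbytes starting at offset and returns the number of bytes
  // actually read. The seek and the read happen under one lock, so another
  // thread cannot move the file position in between.
  std::size_t ReadAt(long offset, std::size_t nbytes, std::uint8_t* out) {
    assert(out != nullptr || nbytes == 0);
    std::lock_guard<std::mutex> guard(mutex_);
    if (std::fseek(file_, offset, SEEK_SET) != 0) {
      throw std::runtime_error("seek failed");
    }
    return std::fread(out, 1, nbytes, file_);
  }

 private:
  std::FILE* file_;
  std::mutex mutex_;
};

// Reads num_chunks consecutive chunks of chunk_size bytes, one thread per
// chunk. Each thread writes only to its own slot of the result vector.
std::vector<std::string> ReadChunksInParallel(LockedFileSource& source,
                                              std::size_t chunk_size,
                                              std::size_t num_chunks) {
  std::vector<std::string> chunks(num_chunks);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_chunks; ++i) {
    workers.emplace_back([&source, &chunks, chunk_size, i] {
      std::string buf(chunk_size, '\0');
      std::size_t n = source.ReadAt(static_cast<long>(i * chunk_size),
                                    chunk_size,
                                    reinterpret_cast<std::uint8_t*>(&buf[0]));
      buf.resize(n);
      chunks[i] = std::move(buf);
    });
  }
  for (auto& worker : workers) worker.join();
  return chunks;
}
```

The lock makes the reads correct but also serializes them, so this trades throughput for simplicity. A source built on a positional read call that does not touch a shared file position (for example POSIX {{pread}}) would avoid the lock, at the cost of portability.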
Improvements to the code and its documentation would be most welcome.
> concurrent usage of the file reader API
> ---------------------------------------
>
> Key: PARQUET-799
> URL: https://issues.apache.org/jira/browse/PARQUET-799
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: William Forson
>
> I've recently been debugging a segfault that occurs when concurrently reading
> (distinct) parquet files from multiple threads.
> I initially assumed this was a reasonable thing to do, since the project
> README doesn't say anything about concurrency one way or the other. But then
> I encountered [this TODO
> comment|https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/page.h#L35]:
> {quote}
> // TODO: Parallel processing is not yet safe because of memory-ownership
> // semantics (the PageReader may or may not own the memory referenced by a
> // page)
> {quote}
> And it has got me wondering: is parquet-cpp fundamentally NOT thread-safe,
> even for the use case of reading a single file per thread at any given time?
> Or is it basically thread-safe with a couple of gotchas?
> Also, jfyi, I'm currently running against a build which incorporates [this
> change|https://github.com/apache/parquet-cpp/commit/002466539f6aba7bf1f885b66f61f302ed88fa6b].
> (aside: my motivation for recently posting an issue re. {{THRIFT_HOME}} was
> to rule out any ABI weirdness that might result from building parquet-cpp
> against a different version of thrift than the applications that ultimately
> consume parquet-cpp)
> Thanks!