[ 
https://issues.apache.org/jira/browse/PARQUET-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749289#comment-15749289
 ] 

William Forson commented on PARQUET-799:
----------------------------------------

Could you clarify the sense in which the "IO sublayer" is not threadsafe? 

More specifically, I'm interested in the thread safety of the 
{{ParquetFileReader}} class (which uses {{LocalFileSource}}). I would assume 
that a given _instance_ of this class is non-threadsafe (owing to a combination 
of common sense and the fact that {{ParquetFileReader::OpenFile}} returns a 
unique pointer). However, I would NOT assume that there is anything wrong with 
invoking {{ParquetFileReader::OpenFile}} concurrently, or using distinct 
{{ParquetFileReader}} instances concurrently. Are my assumptions wrong?
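
To make the usage pattern I have in mind concrete, here is a minimal sketch. 
It does not use parquet-cpp at all: {{StubReader}} and {{ReadAllConcurrently}} 
are hypothetical names invented for illustration, and the only thing borrowed 
from the real API is the shape of {{OpenFile}} returning a unique pointer. 
Each thread opens and uses its own instance, and no instance is shared.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ParquetFileReader; NOT the real parquet-cpp class.
class StubReader {
 public:
  // Mirrors only the shape of ParquetFileReader::OpenFile: every call
  // returns a new, uniquely owned instance.
  static std::unique_ptr<StubReader> OpenFile(const std::string& path) {
    return std::unique_ptr<StubReader>(new StubReader(path));
  }

  // Placeholder for real per-file work (illustrative only).
  std::size_t PathLength() const { return path_.size(); }

 private:
  explicit StubReader(const std::string& path) : path_(path) {}
  std::string path_;
};

// Opens one reader per path, each on its own thread. OpenFile is invoked
// concurrently, but no reader instance is ever visible to a second thread.
std::vector<std::size_t> ReadAllConcurrently(
    const std::vector<std::string>& paths) {
  std::vector<std::size_t> results(paths.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < paths.size(); ++i) {
    workers.emplace_back([&paths, &results, i] {
      std::unique_ptr<StubReader> reader = StubReader::OpenFile(paths[i]);
      assert(reader != nullptr);
      results[i] = reader->PathLength();  // each thread writes its own slot
    });
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
  return results;
}
```

If this pattern is unsafe with the real {{ParquetFileReader}}, then some state 
must be shared across instances, and that is what I'd like to understand.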

Finally, I'm curious as to why you refer to {{LocalFileSource}} as a "sample" 
implementation. Do you mean to say that certain parts of the codebase which 
are not explicitly labeled as "test", "example", etc. are specifically not 
intended for usage in production? (and if so, is the delineation between the 
production-ready and non-production-ready parts of the codebase stated clearly 
somewhere in the project source?)

Thanks!

> concurrent usage of the file reader API
> ---------------------------------------
>
>                 Key: PARQUET-799
>                 URL: https://issues.apache.org/jira/browse/PARQUET-799
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: William Forson
>
> I've recently been debugging a segfault that occurs when concurrently reading 
> (distinct) parquet files from multiple threads.
> I initially assumed this was a reasonable thing to do, since the project 
> README doesn't say anything about concurrency one way or the other. But then 
> I encountered [this TODO 
> comment|https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/page.h#L35]:
> {quote}
> // TODO: Parallel processing is not yet safe because of memory-ownership
> // semantics (the PageReader may or may not own the memory referenced by a
> // page)
> {quote}
> And it has got me wondering: is parquet-cpp fundamentally NOT thread-safe, 
> even for the use case of reading a single file per thread at any given time? 
> Or is it basically thread-safe with a couple of gotchas?
> Also, jfyi, I'm currently running against a build which incorporates [this 
> change|https://github.com/apache/parquet-cpp/commit/002466539f6aba7bf1f885b66f61f302ed88fa6b].
> (aside: my motivation for recently posting an issue re. {{THRIFT_HOME}} was 
> to rule out any ABI weirdness that might result from building parquet-cpp 
> against a different version of thrift than the applications that ultimately 
> consume parquet-cpp)
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
