Hi,
I have a question regarding the initialization/finalization of the S3
filesystem within the Arrow filesystem library. Apologies if this has been
raised before; I searched but didn't turn up anything. I did read the thread
that discussed init/finalize, though nothing I found made it clear when the
finalize method was added. I thought I read that it appeared around version
12.0.0, but I'm not certain. That's a side note really; I am curious to know
when it came about because we had been using an old version of the libraries
(8.0.0) that didn't have it. But I digress.
My issue, and the question I have, concerns timing. The aforementioned thread
made it clear that the init/finalize should take place at the beginning and
the end of main():
    // Snipped for brevity reasons
    int main()
    {
        // More snipping
        arrow::Status initializeStatus = arrow::fs::InitializeS3(globalOptions);
        ...
        arrow::Status finalizeStatus = arrow::fs::FinalizeS3();
    } /* end of your main() entry point */
The thread also made it clear that this bookended init/finalize should not be
placed inside a class, the most likely spots being the constructor and
destructor respectively.
So OK. I am not familiar with why this structure became "a thing" within the
Arrow filesystem library, but that is how it works now. I would like to know
the reasoning, but that is tangential to my issue. Now for my question: this is
all well and good when you are developing your own stand-alone program.
However, what happens when you live in an embedded world in which your code
lies many layers below main() and you have no access to main(), even if you
wanted to follow the prescribed pattern? We are expected to wind up and then
wind down on demand, allocating and then freeing all resources each time. I
pulled the init/finalize out to the outermost layer that I have any
involvement with, yet I see the following error messages:
2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet - Could not create a
AWS filesystem object
2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet - parquetFileReader):
Exception exit, reason = Unable to create a file system object on AWS server:
Invalid: S3 subsystem is finalized
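To make the sequence concrete, here is a toy model of what I believe is
happening. It is a sketch only: ToyS3Subsystem, State and RunOneCycle are
hypothetical names and not Arrow code, the "not initialized" text is a
placeholder, and the one-way finalize is an assumption drawn from the error
above.

```cpp
#include <string>

// Toy stand-in for the S3 subsystem lifecycle. Hypothetical: this is NOT
// Arrow's implementation, only a model of the behaviour I observe.
enum class State { kUninitialized, kInitialized, kFinalized };

class ToyS3Subsystem {
 public:
  // Assumption: once finalized, the subsystem can never be initialized again.
  bool Initialize() {
    if (state_ == State::kFinalized) return false;
    state_ = State::kInitialized;
    return true;
  }

  // Finalizing is modelled as a one-way transition.
  void Finalize() { state_ = State::kFinalized; }

  // Returns an empty string on success, otherwise an error message.
  std::string MakeFileSystem() const {
    if (state_ == State::kFinalized) {
      return "Invalid: S3 subsystem is finalized";
    }
    if (state_ == State::kUninitialized) {
      return "not initialized (placeholder message)";
    }
    return "";
  }

 private:
  State state_ = State::kUninitialized;
};

// One on-demand cycle as my layer performs it: wind up, use, wind down.
// The result of Initialize() is deliberately ignored here so that the
// failure surfaces at filesystem creation, as it does in my logs.
std::string RunOneCycle(ToyS3Subsystem& s3) {
  s3.Initialize();
  std::string error = s3.MakeFileSystem();
  s3.Finalize();
  return error;
}
```

Calling RunOneCycle twice on the same ToyS3Subsystem returns an empty string
the first time and the "S3 subsystem is finalized" message the second time,
which matches the log above.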
The error occurs because the first spool-up/spool-down worked, but when we are
called again sometime thereafter, the finalize method has already done its
thing, so we can't initialize again. I know why this is occurring; that part is
straightforward and I don't need an explanation for it. The question is what I
can do about it in my environment, where no access to main() is available and
we must exist/not-exist on demand. Surely I am not the only one in this
development scenario who has faced this issue. So what is the solution here?
Has anyone else faced this? Help?
Thanks,
Jerry