RE: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
Hi,

Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is totally unrelated). I think I’m going to set up my dev environment to start contributing a little more than just complaining ☺.

Best regards,
Arnaud

From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On behalf of Stephan Ewen
Sent: Wednesday, 26 August 2015 20:12
To: user@flink.apache.org
Subject: Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
Regards,
Arnaud

The integrity of this message cannot be guaranteed on the Internet. The company that sent this message cannot therefore be held liable for its content nor attachments. Any unauthorized use or dissemination is prohibited. If you are not the intended recipient of this message, then please delete it and notify the sender.
Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
On 27 Aug 2015, at 09:33, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:

> Hi, Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is totally unrelated).

Thanks :)

> I think I’m going to set up my dev environment to start contributing a little more than just complaining :).

If you need any help with the setup, let us know. There is also this guide: https://ci.apache.org/projects/flink/flink-docs-master/internals/ide_setup.html

– Ufuk
Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were changing, and we wanted to keep the system independent of Hadoop), but these are no longer relevant reasons, in my opinion.

Let's start with your proposal and see if we can actually get rid of the wrapping in a way that is friendly to existing users. Would you open an issue for this?

Greetings,
Stephan

On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
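The getWrappedStream() accessor proposed in the thread could be sketched with plain java.io types. This is an illustrative sketch only; the class name UnwrappableOutputStream and the method are hypothetical stand-ins, not the actual Flink API.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the escape-hatch idea: the wrapper keeps delegating
// write()/flush()/close(), but also hands out the underlying stream so
// callers can reach Hadoop-specific methods such as hsync()/hflush().
// All names here are hypothetical, not actual Flink classes.
public class UnwrappableOutputStream extends OutputStream {
    private final OutputStream wrapped;

    public UnwrappableOutputStream(OutputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override public void write(int b) throws IOException { wrapped.write(b); }
    @Override public void flush() throws IOException { wrapped.flush(); }
    @Override public void close() throws IOException { wrapped.close(); }

    /** Returns the underlying stream, unchanged, for Hadoop-specific calls. */
    public OutputStream getWrappedStream() {
        return wrapped;
    }
}
```

A caller that knows an FSDataOutputStream sits underneath could then do something like `((FSDataOutputStream) stream.getWrappedStream()).hsync();`, assuming the cast is valid for that file system.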
HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
Hi,

I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write into an HDFS file, calling org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(), it returns a HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods such as flush(), getPos(), etc., but HadoopDataOutputStream only wraps write() and close(). For instance, flush() calls the default, empty implementation of OutputStream instead of the Hadoop one, and that’s confusing. Moreover, because of the restrictive OutputStream interface, hsync() and hflush() are not exposed to Flink; maybe having a getWrappedStream() would be convenient. (For now, that prevents me from using the Flink FileSystem object; I directly use Hadoop’s one.)

Regards,
Arnaud
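The flush() pitfall described above can be reproduced with plain java.io streams, with no Hadoop dependency. The NaiveWrapper class below is a hypothetical stand-in for HadoopDataOutputStream, not Flink code: because it overrides only write() and close(), a call to flush() hits OutputStream.flush(), which is an empty no-op, and buffered data never reaches the sink.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class FlushPitfall {

    // Hypothetical wrapper mimicking the reported behavior: only write()
    // and close() delegate, so flush() falls back to the empty default
    // implementation in java.io.OutputStream.
    static class NaiveWrapper extends OutputStream {
        private final OutputStream wrapped;
        NaiveWrapper(OutputStream wrapped) { this.wrapped = wrapped; }
        @Override public void write(int b) throws IOException { wrapped.write(b); }
        @Override public void close() throws IOException { wrapped.close(); }
        // No flush() override: wrapper.flush() never reaches the wrapped stream.
    }

    // Returns how many bytes are visible in the sink after flushing the wrapper.
    static int visibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // BufferedOutputStream holds data until flushed, like an HDFS client stream.
            OutputStream wrapper = new NaiveWrapper(new BufferedOutputStream(sink, 8192));
            wrapper.write("hello".getBytes());
            wrapper.flush(); // silently does nothing: "hello" stays in the buffer
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes visible after flush: " + visibleAfterFlush()); // prints 0
    }
}
```

Overriding flush() to delegate to the wrapped stream (as in the accessor sketch earlier in the thread) makes the buffered bytes visible.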