RE: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

2015-08-27 Thread LINZ, Arnaud
Hi,

Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is
totally unrelated).

I think I’m going to set up my dev environment to start contributing a little 
more than just complaining ☺.

Best regards,
Arnaud

From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On behalf of Stephan 
Ewen
Sent: Wednesday, 26 August 2015 20:12
To: user@flink.apache.org
Subject: Re: HadoopDataOutputStream maybe does not expose enough methods of 
org.apache.hadoop.fs.FSDataOutputStream

I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were 
changing, we wanted to keep the system independent of Hadoop), but these are no 
longer relevant reasons, in my opinion.

Let's start with your proposal and see if we can actually get rid of the 
wrapping in a way that is friendly to existing users.

Would you open an issue for this?

Greetings,
Stephan


On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud 
<al...@bouyguestelecom.fr> wrote:
Hi,

I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write 
to an HDFS file, calling 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a 
HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream 
(under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods such as flush(), getPos(), 
etc., but HadoopDataOutputStream only wraps write() and close().

For instance, flush() calls the default, empty implementation of OutputStream 
instead of the Hadoop one, and that’s confusing. Moreover, because of the 
restrictive OutputStream interface, hsync() and hflush() are not exposed to 
Flink; maybe having a getWrappedStream() would be convenient.

(For now, that prevents me from using the Flink FileSystem object; I use 
Hadoop’s one directly.)

Regards,
Arnaud







The integrity of this message cannot be guaranteed on the Internet. The company 
that sent this message cannot therefore be held liable for its content nor 
attachments. Any unauthorized use or dissemination is prohibited. If you are 
not the intended recipient of this message, then please delete it and notify 
the sender.



Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

2015-08-27 Thread Ufuk Celebi

 On 27 Aug 2015, at 09:33, LINZ, Arnaud al...@bouyguestelecom.fr wrote:
 
 Hi,
  
 Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is 
 totally unrelated).

Thanks :)

 I think I’m going to set up my dev environment to start contributing a little 
 more than just complaining ☺.

If you need any help with the setup, let us know. There is also this guide: 
https://ci.apache.org/projects/flink/flink-docs-master/internals/ide_setup.html

– Ufuk



Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

2015-08-26 Thread Stephan Ewen
I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were
changing, we wanted to keep the system independent of Hadoop), but these
are no longer relevant reasons, in my opinion.

Let's start with your proposal and see if we can actually get rid of the
wrapping in a way that is friendly to existing users.

Would you open an issue for this?

Greetings,
Stephan


On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud al...@bouyguestelecom.fr
wrote:

 Hi,

 I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to
 write to an HDFS file, calling
 org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a
 HadoopDataOutputStream that wraps an
 org.apache.hadoop.fs.FSDataOutputStream (under its
 org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

 However, FSDataOutputStream exposes many methods such as flush(), getPos(),
 etc., but HadoopDataOutputStream only wraps write() and close().

 For instance, flush() calls the default, empty implementation of
 OutputStream instead of the Hadoop one, and that’s confusing. Moreover,
 because of the restrictive OutputStream interface, hsync() and hflush() are
 not exposed to Flink; maybe having a getWrappedStream() would be
 convenient.

 (For now, that prevents me from using the Flink FileSystem object; I use
 Hadoop’s one directly.)

 Regards,
 Arnaud












HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

2015-08-26 Thread LINZ, Arnaud
Hi,

I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write 
to an HDFS file, calling 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a 
HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream 
(under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods such as flush(), getPos(), 
etc., but HadoopDataOutputStream only wraps write() and close().
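
To illustrate the behaviour described above, here is a minimal sketch that uses only java.io. NarrowWrapper and NarrowWrapperDemo are hypothetical names, not Flink classes, and a BufferedOutputStream stands in for the wrapped Hadoop stream. It relies on the fact that the default OutputStream.flush() does nothing.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper that forwards only write() and close(). */
class NarrowWrapper extends OutputStream {
    private final OutputStream inner;

    NarrowWrapper(OutputStream inner) {
        this.inner = inner;
    }

    @Override
    public void write(int b) throws IOException {
        inner.write(b);
    }

    @Override
    public void close() throws IOException {
        inner.close();
    }

    // flush() is deliberately not overridden, so the inherited
    // OutputStream.flush() (a no-op) is what callers get.
}

class NarrowWrapperDemo {
    /** Writes three bytes, calls flush(), and returns how many bytes reached the sink. */
    static int bytesVisibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // The buffered stream plays the role of the wrapped Hadoop stream.
            OutputStream out = new NarrowWrapper(new BufferedOutputStream(sink, 1024));
            out.write(new byte[] {1, 2, 3});
            out.flush(); // never reaches the buffered stream
            int visible = sink.size();
            out.close(); // close() is forwarded, so the data is written only now
            return visible;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes visible after flush(): " + bytesVisibleAfterFlush());
    }
}
```

In this sketch the bytes stay in the inner buffer until close(), which is the kind of surprise a caller of flush() would not expect.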

For instance, flush() calls the default, empty implementation of OutputStream 
instead of the Hadoop one, and that’s confusing. Moreover, because of the 
restrictive OutputStream interface, hsync() and hflush() are not exposed to 
Flink; maybe having a getWrappedStream() would be convenient.
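
One way the getWrappedStream() suggestion could look is sketched below. This is not Flink's actual code: FakeHadoopStream is a hypothetical stand-in for FSDataOutputStream (whose real hflush() and hsync() come from Hadoop), and ForwardingWrapper is a hypothetical replacement for the wrapper that forwards flush() and hands out the wrapped stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for FSDataOutputStream: holds writes back until flushed. */
class FakeHadoopStream extends OutputStream {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final ByteArrayOutputStream visible = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        pending.write(b);
    }

    @Override
    public void flush() throws IOException {
        hflush();
    }

    /** Mimics a Hadoop-specific call that the plain OutputStream API lacks. */
    public void hflush() throws IOException {
        pending.writeTo(visible);
        pending.reset();
    }

    int visibleBytes() {
        return visible.size();
    }
}

/** Hypothetical wrapper that forwards flush() and exposes the wrapped stream. */
class ForwardingWrapper extends OutputStream {
    private final FakeHadoopStream inner;

    ForwardingWrapper(FakeHadoopStream inner) {
        this.inner = inner;
    }

    @Override
    public void write(int b) throws IOException {
        inner.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        inner.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        inner.flush(); // forwarded instead of inheriting the no-op
    }

    @Override
    public void close() throws IOException {
        inner.close();
    }

    /** Escape hatch for calls that OutputStream does not declare. */
    public FakeHadoopStream getWrappedStream() {
        return inner;
    }
}

class ForwardingWrapperDemo {
    /** Returns the visible byte counts after flush() and after hflush(). */
    static int[] run() {
        try {
            FakeHadoopStream hadoop = new FakeHadoopStream();
            ForwardingWrapper out = new ForwardingWrapper(hadoop);
            out.write(new byte[] {1, 2});
            out.flush();
            int afterFlush = hadoop.visibleBytes();
            out.write(3);
            out.getWrappedStream().hflush();
            int afterHflush = hadoop.visibleBytes();
            out.close();
            return new int[] {afterFlush, afterHflush};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Exposing the wrapped stream keeps the wrapper's OutputStream contract small while still letting callers who know they are on HDFS reach the Hadoop-only methods. The trade-off is that such callers become tied to Hadoop types.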

(For now, that prevents me from using the Flink FileSystem object; I use 
Hadoop’s one directly.)

Regards,
Arnaud






