Re: changes to compression interfaces in 0.15?

2008-02-21 Thread Arun C Murthy

Joydeep,

On Feb 20, 2008, at 5:06 PM, Joydeep Sen Sarma wrote:


> Hi developers,
>
> In migrating to 0.15, I am noticing that the compression interfaces
> have changed:
>
> -  the compression type for SequenceFile outputs used to be set by:
>    SequenceFile.setCompressionType()
>
> -  now it seems to be set using:
>    SequenceFileOutputFormat.setOutputCompressionType()




Yes, we added SequenceFileOutputFormat.setOutputCompressionType and  
deprecated the old api. (HADOOP-1851)




> The change is for the better - but would it be possible to:
>
> -  remove old/dead interfaces. That would have been a
>    straightforward hint for applications to look for new interfaces.
>    (hadoop-default.xml also still has a setting for the old conf variable:
>    io.seqfile.compression.type)



To maintain backward compat, we cannot remove old apis - the standard  
procedure is to deprecate them for the next release and remove them  
in subsequent releases.



> -  if possible - document changed interfaces in the release
>    notes (there's no way we can find this out by looking at the long list
>    of JIRAs).



Please look at the INCOMPATIBLE CHANGES section of CHANGES.txt;
HADOOP-1851 is listed there. Admittedly we can do better, but that is
a good place to look when upgrading to newer releases.


> I am not sure how up to date the wiki is on the compression stuff (my
> responsibility to update it) - but please do consider the impact of


Please use the Forrest-based docs (on the Hadoop website - e.g.
mapred_tutorial.html) rather than the wiki as the gold standard. The
reason we moved away from the wiki is precisely this: it is harder to
maintain docs per release there.



> changing interfaces on existing applications. (Maybe we should have a
> JIRA tag to mark out bugs that change interfaces.)




Again, CHANGES.txt and INCOMPATIBLE CHANGES section for now.

Arun




> As always - thanks for all the fish (err .. working code),
>
> Joydeep







RE: changes to compression interfaces in 0.15?

2008-02-21 Thread Joydeep Sen Sarma
> To maintain backward compat, we cannot remove old apis - the standard
> procedure is to deprecate them for the next release and remove them
> in subsequent releases.

You've got to be kidding.

We didn't maintain backwards compatibility. My app broke, simple and
straightforward. And the old interfaces are not deprecated (to quote 0.15.3 on
a 'deprecated' interface:

  /**
   * Set the compression type for sequence files.
   * @param job the configuration to modify
   * @param val the new compression type (none, block, record)
   */
  static public void setCompressionType(Configuration job,
                                        CompressionType val) {
)

I (and, I suspect, any average user willing to recompile code) would much
rather that we broke backwards compatibility immediately than carry over
defunct APIs that insidiously break application behavior.

And of course - this does not address the point that the option strings
themselves are deprecated. (Remember - people set options explicitly from XML
files and streaming. Not everyone goes through the Java APIs.)
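
To make the failure mode concrete, here is a minimal, self-contained sketch.
It is not Hadoop code: a java.util.Properties object stands in for the job
configuration, the class and the effectiveOutputCompression helper are
hypothetical, and the new key name mapred.output.compression.type is an
assumption (only io.seqfile.compression.type is named in this thread). It
models an old setter that still compiles and still writes the old key, while
the output path consults only the new one.

```java
import java.util.Properties;

public class SilentConfigDrift {
    // Old key as named in this thread.
    public static final String OLD_KEY = "io.seqfile.compression.type";
    // ASSUMPTION: name of the key read by the new output path.
    public static final String NEW_KEY = "mapred.output.compression.type";

    public enum CompressionType { NONE, RECORD, BLOCK }

    // Stand-in for the old setter: still present, still writes the old key.
    public static void setCompressionType(Properties conf, CompressionType val) {
        conf.setProperty(OLD_KEY, val.name());
    }

    // Stand-in for the new setter: writes the new key.
    public static void setOutputCompressionType(Properties conf, CompressionType val) {
        conf.setProperty(NEW_KEY, val.name());
    }

    // Hypothetical stand-in for the output path, which reads only the new key
    // and falls back to NONE when it is absent.
    public static CompressionType effectiveOutputCompression(Properties conf) {
        String value = conf.getProperty(NEW_KEY, CompressionType.NONE.name());
        return CompressionType.valueOf(value);
    }

    public static void main(String[] args) {
        // Unchanged application code: compiles, runs, reports no error.
        Properties legacyJob = new Properties();
        setCompressionType(legacyJob, CompressionType.BLOCK);
        System.out.println("legacy job writes: " + effectiveOutputCompression(legacyJob));

        // Migrated application code.
        Properties migratedJob = new Properties();
        setOutputCompressionType(migratedJob, CompressionType.BLOCK);
        System.out.println("migrated job writes: " + effectiveOutputCompression(migratedJob));
    }
}
```

In this model, a job configured through the old setter (or through the old
key in an XML file) silently falls back to NONE, which matches the symptom
described here: output grows and nothing fails.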

--

As one of my dear professors once said - put yourself in the other person's
shoes. Consider that you were in my position and that a production app suddenly
went from consuming 100G to 1TB, and everything slowed down drastically, and it
did not give any sign that anything was amiss. Everything looked golden on the
outside. What would be your reaction if you found out after a week that the
system was full and numerous processes had to be re-run? How would you have
figured out that this was going to happen by looking at the INCOMPATIBLE
section (which, by the way, I read carefully before sending my mail)?

(Fortunately I escaped the worst case - but I think this is a real call to
action.)









Re: changes to compression interfaces in 0.15?

2008-02-21 Thread Arun C Murthy


On Feb 21, 2008, at 12:20 PM, Joydeep Sen Sarma wrote:


>> To maintain backward compat, we cannot remove old apis - the standard
>> procedure is to deprecate them for the next release and remove them
>> in subsequent releases.
>
> You've got to be kidding.
>
> We didn't maintain backwards compatibility. My app broke, simple
> and straightforward. And the old interfaces are not deprecated (to
> quote 0.15.3 on a 'deprecated' interface:




You are right, HADOOP-1851 didn't fix it right. I've filed HADOOP-2869.

We do need to be more diligent about listing config changes in  
CHANGES.txt for starters, and that point is taken. However, we can't  
start pulling out apis without deprecating them first.
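
As a sketch of what deprecating first could look like (this is not the actual
change in HADOOP-1851 or HADOOP-2869): the old setter carries both the
@Deprecated annotation and a @deprecated Javadoc tag, and delegates to the new
setter so that existing callers keep their behavior while getting a compiler
warning. The class name, the Properties stand-in for the job configuration,
and the key name are assumptions for illustration only.

```java
import java.util.Properties;

public class DeprecatedSetterSketch {
    // ASSUMPTION: key name chosen for illustration.
    public static final String NEW_KEY = "mapred.output.compression.type";

    public enum CompressionType { NONE, RECORD, BLOCK }

    /** Set the compression type for the job's sequence file output. */
    public static void setOutputCompressionType(Properties conf, CompressionType val) {
        conf.setProperty(NEW_KEY, val.name());
    }

    /**
     * Set the compression type for sequence files.
     *
     * @deprecated Use {@link #setOutputCompressionType(Properties, CompressionType)}.
     */
    @Deprecated
    public static void setCompressionType(Properties conf, CompressionType val) {
        // Delegate so that old callers still get the behavior they asked for.
        setOutputCompressionType(conf, val);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setCompressionType(conf, CompressionType.BLOCK);
        System.out.println(NEW_KEY + " = " + conf.getProperty(NEW_KEY));
    }
}
```

Code outside this class that calls setCompressionType gets a deprecation
warning from javac at compile time. Options set directly in XML files or via
streaming never pass through the compiler, so they would still need separate
handling.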


Arun













Re: changes to compression interfaces in 0.15?

2008-02-21 Thread Pete Wyckoff

If the API semantics are changing under you, you have to change your code
whether or not the API is pulled or deprecated.  Pulling it makes it more
obvious that the user has to change his/her code.

-- pete

