[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

2012-10-24 Thread Dave Beech (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483096#comment-13483096
 ] 

Dave Beech commented on HBASE-7024:
---

Thanks Ted, Stack. 

Stack - you are right that keys and values have to be serializable, but they 
don't have to be Serializable in the Java interface sense. The Job/JobConf 
classes in Hadoop accept any class. Map tasks use Hadoop's 
SerializationFactory to work out which serializer class to use 
(WritableSerialization is the default, but you can specify custom ones, such 
as AvroSerialization, through the io.serializations job setting).

The point is that Hadoop doesn't care at all what type your map output key and 
value classes are, so long as you have provided a serializer which works with 
them. If you haven't, the job dies horribly (no surprise there).
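
To make the lookup concrete, here is a minimal, self-contained sketch of the 
idea: each registered serialization is asked whether it accepts the class, and 
the first one that does is used. Every type below (SketchSerialization, 
SketchWritable, SketchAvroKey, SketchText, SketchSerializationFactory) is a 
hypothetical stand-in written for illustration. It is not Hadoop's code and 
only models the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the role a pluggable serialization plays: it reports
// whether it can handle a given class. NOT a Hadoop interface.
interface SketchSerialization {
    boolean accept(Class<?> c);
    String name();
}

// Stand-in marker interface; NOT org.apache.hadoop.io.Writable.
interface SketchWritable {}

// A key type in the classic "Writable" style.
final class SketchText implements SketchWritable {}

// Hypothetical wrapper in the style of AvroKey: it does not implement
// SketchWritable, so only a custom serialization can handle it.
final class SketchAvroKey {}

final class SketchSerializationFactory {
    private final List<SketchSerialization> registered = new ArrayList<>();

    void register(SketchSerialization s) {
        registered.add(s);
    }

    // The first registered serialization that accepts the class wins.
    // If none does, the lookup fails at runtime.
    String serializationFor(Class<?> c) {
        for (SketchSerialization s : registered) {
            if (s.accept(c)) {
                return s.name();
            }
        }
        throw new IllegalStateException(
            "no serialization registered for " + c.getName());
    }

    // Accepts anything implementing the stand-in Writable interface.
    static SketchSerialization writableStyle() {
        return new SketchSerialization() {
            public boolean accept(Class<?> c) {
                return SketchWritable.class.isAssignableFrom(c);
            }
            public String name() {
                return "writable";
            }
        };
    }

    // Accepts only the hypothetical Avro-style wrapper.
    static SketchSerialization avroStyle() {
        return new SketchSerialization() {
            public boolean accept(Class<?> c) {
                return c == SketchAvroKey.class;
            }
            public String name() {
                return "avro";
            }
        };
    }
}
```

With only writableStyle() registered, looking up SketchAvroKey throws. Once 
avroStyle() is registered as well, the same lookup succeeds, and nothing in 
the lookup itself required the key class to implement a particular interface.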

I haven't tested with Hadoop 2 yet, no, but I'd be very surprised if this patch 
broke anything. If they'd changed this behaviour in Hadoop I'm sure there'd be 
tons of regression problems with mapreduce jobs that need custom serializers.  


> TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of 
> outputKeyClass and outputValueClass
> ---
>
> Key: HBASE-7024
> URL: https://issues.apache.org/jira/browse/HBASE-7024
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Dave Beech
>Priority: Minor
> Attachments: HBASE-7024.patch
>
>
> The various initTableMapperJob methods in TableMapReduceUtil take 
> outputKeyClass and outputValueClass parameters which need to extend 
> WritableComparable and Writable respectively. 
> Because of this, it is not convenient to use an alternative serialization 
> like Avro. (I wanted to set these parameters to AvroKey and AvroValue). 
> The methods in the MapReduce API to set map output key and value types do not 
> impose this restriction, so is there a reason to do it here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

2012-10-23 Thread Dave Beech (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Beech updated HBASE-7024:
--

Issue Type: Improvement  (was: Bug)




[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

2012-10-23 Thread Dave Beech (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Beech updated HBASE-7024:
--

Status: Patch Available  (was: Open)




[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

2012-10-23 Thread Dave Beech (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Beech updated HBASE-7024:
--

Attachment: HBASE-7024.patch

OK, thanks. Can I propose this patch? It simply removes the "extends 
WritableComparable" and "extends Writable" bounds from the outputKeyClass and 
outputValueClass parameters.
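
A rough illustration of what removing the bounds means, using stand-in types 
only. DemoWritable, DemoWritableComparable, DemoRowKey, DemoAvroKey and 
DemoInit are hypothetical names invented for this sketch; they are not the 
Hadoop or HBase classes, and the method bodies are placeholders. The 
unbounded Class<?> parameter is one assumed shape for the relaxed signature, 
not a copy of the attached patch.

```java
// Stand-ins for the two interfaces named in the issue; these are NOT
// the real org.apache.hadoop.io types.
interface DemoWritable {}

interface DemoWritableComparable<T> extends DemoWritable, Comparable<T> {}

// A key type in the Writable style: satisfies the bounded signature.
final class DemoRowKey implements DemoWritableComparable<DemoRowKey> {
    @Override
    public int compareTo(DemoRowKey other) {
        return 0; // ordering is irrelevant to this sketch
    }
}

// Hypothetical wrapper in the style of AvroKey: implements neither
// stand-in interface.
final class DemoAvroKey {}

final class DemoInit {
    // Shape of the restriction described in the issue: the key class
    // must extend the comparable-writable interface.
    static String initBounded(
            Class<? extends DemoWritableComparable<?>> outputKeyClass) {
        return outputKeyClass.getSimpleName();
    }

    // Assumed relaxed shape: any class is accepted at compile time, and
    // it is left to the job's configured serializations to handle it.
    static String initRelaxed(Class<?> outputKeyClass) {
        return outputKeyClass.getSimpleName();
    }
}

// DemoInit.initBounded(DemoAvroKey.class);  // would not compile
// DemoInit.initRelaxed(DemoAvroKey.class);  // compiles
```

Existing callers that pass a Writable-style class keep compiling against the 
relaxed form, since every class matches Class<?>.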




[jira] [Created] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

2012-10-22 Thread Dave Beech (JIRA)

Dave Beech created HBASE-7024:
-

     Summary: TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass
         Key: HBASE-7024
         URL: https://issues.apache.org/jira/browse/HBASE-7024
     Project: HBase
  Issue Type: Bug
  Components: mapreduce
    Reporter: Dave Beech
    Priority: Minor


The various initTableMapperJob methods in TableMapReduceUtil take 
outputKeyClass and outputValueClass parameters which need to extend 
WritableComparable and Writable respectively. 

Because of this, it is not convenient to use an alternative serialization like 
Avro. (I wanted to set these parameters to AvroKey and AvroValue). 

The methods in the MapReduce API to set map output key and value types do not 
impose this restriction, so is there a reason to do it here?

