[jira] [Created] (HBASE-12321) Delete#deleteColumn seems not to work with bulkload

2014-10-22 Thread Jan Lukavsky (JIRA)
Jan Lukavsky created HBASE-12321:


 Summary: Delete#deleteColumn seems not to work with bulkload
 Key: HBASE-12321
 URL: https://issues.apache.org/jira/browse/HBASE-12321
 Project: HBase
  Issue Type: Bug
  Components: Deletes, HFile, mapreduce
Affects Versions: 0.94.6
Reporter: Jan Lukavsky
Priority: Minor


When using a call to {{Delete#deleteColumn(byte[], byte[])}} to produce KeyValues 
that are subsequently written to HFileOutputFormat and bulk loaded into HBase, 
the Delete seems to be ignored. The likely reason is the missing timestamp 
(defaulting to HConstants.LATEST_TIMESTAMP) in the KeyValue with type 
{{KeyValue.Type.Delete}}. I think the RegionServer then cannot delete the 
contents of the column due to the mismatch in the timestamp.

When using {{Delete#deleteColumns}} everything works fine, because of the 
different type {{KeyValue.Type.DeleteColumn}}.






[jira] [Commented] (HBASE-12321) Delete#deleteColumn seems not to work with bulkload

2014-10-22 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179892#comment-14179892
 ] 

Jan Lukavsky commented on HBASE-12321:
--

I can think of two solutions:
 # in the RecordWriter, let the user know they are doing something that is not 
supposed to work, or
 # in the RegionServer, make the Delete delete the same data as a call to 
{{HTable#delete}} would

The second solution seems tricky, because it would require knowing *when* the 
Delete was created and *when* each Put was issued to HBase, since it is 
possible to write data with timestamps other than 'now'.

The solution in the RecordWriter could either increment a counter or throw an 
exception, as sketched below.
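
A sketch of what such a check in the RecordWriter could look like (purely 
illustrative, not an actual patch):

{noformat}
// inside HFileOutputFormat's RecordWriter#write, before writing kv
if (kv.getType() == KeyValue.Type.Delete.getCode()
    && kv.getTimestamp() == HConstants.LATEST_TIMESTAMP) {
  // option 1: increment a job counter here to surface the problem, or
  // option 2: fail fast
  throw new IOException(
      "KeyValue of type Delete with LATEST_TIMESTAMP will never match "
      + "a stored cell after bulk load: " + kv);
}
{noformat}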

What would be a better solution? Or is there any third option?





[jira] [Commented] (HBASE-12321) Delete#deleteColumn seems not to work with bulkload

2014-10-22 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179936#comment-14179936
 ] 

Jan Lukavsky commented on HBASE-12321:
--

Basically, I want to delete the latest version of the column. Don't get me 
wrong, I *know* that the usage is wrong on the client side (the correct usage 
is {{Delete#deleteColumns}}). What I see as a problem is that everything seems 
to be working just fine, except that no data gets deleted. The combination of 
KeyValue.Type.Delete, HConstants.LATEST_TIMESTAMP, and bulk load is IMHO wrong 
in all cases, and the client should be notified about it.
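
For comparison, the usage that works (illustrative snippet):

{noformat}
// emits a KeyValue of type DeleteColumn, which masks the column's versions
// without requiring an exact timestamp match
Delete delete = new Delete(row);
delete.deleteColumns(family, qualifier);
{noformat}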



[jira] [Commented] (HBASE-12321) Delete#deleteColumn seems not to work with bulkload

2014-10-22 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179964#comment-14179964
 ] 

Jan Lukavsky commented on HBASE-12321:
--

Yes, it seems better to me, too. And what should the record writer do? Throw 
an exception, increment a counter, or something else? I'd prefer throwing an 
exception, but this might break some client code (which is probably already 
broken anyway). On the other hand, throwing an exception is more explicit.



[jira] [Created] (HBASE-11674) LoadIncrementalHFiles should be more verbose after unrecoverable error

2014-08-05 Thread Jan Lukavsky (JIRA)
Jan Lukavsky created HBASE-11674:


 Summary: LoadIncrementalHFiles should be more verbose after 
unrecoverable error
 Key: HBASE-11674
 URL: https://issues.apache.org/jira/browse/HBASE-11674
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.98.5
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky


LoadIncrementalHFiles should give more information after a failure to load data 
to a region server. Currently, it logs only "Encountered unrecoverable error 
from region server", but doesn't say
 * which region server it talked to
 * which region failed to load the data

To help understand what is going on, the log should contain both pieces of 
information, as sketched below.
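
A hedged sketch of the kind of log message being asked for (variable names are 
illustrative; the actual change is in the attached patch):

{noformat}
} catch (IOException e) {
  LOG.error("Encountered unrecoverable error from region server"
      + ", region=" + region + ", server=" + serverName
      + ", first row of HFile batch=" + Bytes.toStringBinary(first), e);
  throw e;
}
{noformat}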






[jira] [Updated] (HBASE-11674) LoadIncrementalHFiles should be more verbose after unrecoverable error

2014-08-05 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-11674:
-

Status: Patch Available  (was: Open)

I did not find out how to get all of the information 
(RegionServerCallable#getLocation is protected), but I suppose that the 
following patch could do the job.



[jira] [Updated] (HBASE-11674) LoadIncrementalHFiles should be more verbose after unrecoverable error

2014-08-05 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-11674:
-

Attachment: HBASE-11674.patch



[jira] [Updated] (HBASE-11674) LoadIncrementalHFiles should be more verbose after unrecoverable error

2014-08-05 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-11674:
-

Attachment: HBASE-11674-ii.patch

I accidentally removed the logging of the exception; fixing that.



[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-5757:


Attachment: HBASE-5757-trunk-r1341041.patch

The commit of the patch for HBASE-6004 conflicted with this patch. I merged 
this patch over it; the new one should apply to revision 1341041.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, 
 HBASE-5757.patch


 Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
 the scanner in the mapred and mapreduce APIs. The patch to HBASE-4196 unified 
 this handling so that if an exception is caught, a reconnect is attempted 
 (without bothering the mapred client). After that, HBASE-4269 changed this 
 behavior back, but in both the mapred and mapreduce APIs. The question is, is 
 there any reason not to handle all errors that the input format can handle? 
 In other words, why not try to reissue the request after *any* IOException? I 
 see the following disadvantages of the current approach:
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if it fails to process all fetched data within the 
 timeout
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period
  * the timeout for tasks is already configured in mapred.task.timeout, so 
 this seems a bit redundant, because typically one needs to update both of 
 these parameters
  * I don't see any possibility to get rid of LeaseException (this is 
 configured on the server side)
 I think all of these issues would be gone if the DoNotRetryIOException were 
 not rethrown. -On the other hand, handling errors in the InputFormat has the 
 disadvantage that it may hide some inefficiency from the user. E.g. if I have 
 a very big scanner.caching and I manage to process only a few rows within the 
 timeout, I will end up with a single row being fetched many times (and will 
 not be explicitly notified about this). Could we solve this problem by adding 
 some counter to the InputFormat?-
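
A rough sketch of the proposed handling in the record reader (the helper names 
are hypothetical; the actual changes are in the attached patches):

{noformat}
try {
  value = scanner.next();
} catch (IOException e) {
  // covers LeaseException, ScannerTimeoutException, UnknownScannerException:
  // restarting the scanner issues a *new* request, so one retry can succeed
  restart(lastSuccessfulRow);  // hypothetical: re-open scanner at last row
  scanner.next();              // skip the row already handed to the mapper
  value = scanner.next();
  numRestarts.increment(1);    // counter surfacing the inefficiency
}
{noformat}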





[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-15 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-5757:


Attachment: HBASE-5757.patch

Attaching a patch including modified tests (they pass on my box) and a counter 
in the new API.





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-15 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275793#comment-13275793
 ] 

Jan Lukavsky commented on HBASE-5757:
-

{quote}Note that we've been able to set scanner caching on each individual 
scan since 0.20 (HBASE-1759) – setting it for that job may be more 
'correct'.{quote}

We are setting different caching for different jobs; the problem is that the 
rows may take a different time to process (depending on the job), and this 
cannot be told in advance. Currently, it is only possible to set the caching 
for the whole job, but even if it were possible to change the caching *during* 
the job, we would not know that we need to do so before getting the 
ScannerTimeoutException. So handling this error in the TableInputFormat seems 
like the right solution to me.
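
For reference, the per-scan caching mentioned above looks like this (table and 
mapper names are illustrative):

{noformat}
Scan scan = new Scan();
scan.setCaching(100);  // fewer rows per RPC, more time per fetched batch
TableMapReduceUtil.initTableMapperJob("mytable", scan,
    MyMapper.class, Text.class, Result.class, job);
{noformat}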





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-09 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271197#comment-13271197
 ] 

Jan Lukavsky commented on HBASE-5757:
-

Hi Jon,

I'm not sure, but IMO the purpose of DoNotRetryIOException is to instruct the 
HTable client not to retry the request. In TableInputFormat we are working at a 
higher level, so retrying is OK. DNRIOEx is there to distinguish exceptions 
that might be caused by, for instance, region reassignment, and that might 
disappear if the request is resent (possibly after dropping the cached region 
location and querying .META. again). UnknownScannerException, on the other 
hand, will not 'disappear' if the *same* request is sent by the HTable client. 
But in the InputFormat we can restart the scanner, and so we will not send the 
same request, hence it can succeed.

Retrying the request just once and then giving up avoids infinite cycles, and 
mostly it suffices to retry just once, because a typical cause of the 
UnknownScannerException or LeaseException is a too slow Mapper (there could be 
others, like scanning for a too sparse column, but that will not be solved by 
this issue :)). There is the possibility to lower scanner caching, but this 
might be inefficient (e.g. when 99.99% of the time the caching is just OK, and 
then there exist some strange records that take the Mapper longer to process). 
Lowering the caching globally just because of these few records doesn't sound 
like the 'correct' solution.







[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-04-25 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-5757:


Summary: TableInputFormat should handle as many errors as possible  (was: 
TableInputFormat should handle as much errors as possible)





[jira] [Commented] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-09-08 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100347#comment-13100347
 ] 

Jan Lukavsky commented on HBASE-4297:
-

Hi Stack,

I've tested the patch against cdh3u1 and it works fine for us. I haven't seen 
any negative side effects so far.

 TableMapReduceUtil overwrites user supplied options
 ---

 Key: HBASE-4297
 URL: https://issues.apache.org/jira/browse/HBASE-4297
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Jan Lukavsky
 Attachments: HBASE-4297.patch


 Job configuration is overwritten by hbase-default and hbase-site in 
 TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior 
 in the following code:
 {noformat}
 Configuration conf = HBaseConfiguration.create();
 // change keyvalue size
 conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
 Job job = new Job(conf, ...);
 TableMapReduceUtil.initTableMapperJob(...);
 // the job doesn't have the option changed; it uses the value from
 // hbase-site or hbase-default
 job.submit();
 {noformat}
 Although in this case it could be fixed by moving the set() after 
 initTableMapperJob(), in the case where the user wants to change some option 
 using GenericOptionsParser and -D this is impossible, making this cool 
 feature useless.
 In the 0.20.x era this code behaved as expected. The solution to this problem 
 should be that we don't overwrite the options, but only read them if they are 
 missing.





[jira] [Updated] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-08-30 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-4297:


Status: Patch Available  (was: Open)





[jira] [Updated] (HBASE-3578) TableInputFormat does not setup the configuration for HBase mapreduce jobs correctly

2011-08-29 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-3578:


Attachment: HBASE-3578.patch

 TableInputFormat does not setup the configuration for HBase mapreduce jobs 
 correctly
 

 Key: HBASE-3578
 URL: https://issues.apache.org/jira/browse/HBASE-3578
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.0, 0.90.1
Reporter: Dan Harvey
Assignee: Dan Harvey
 Fix For: 0.92.0

 Attachments: HBASE-3578.patch, mapreduce_configuration.patch


 In 0.20.x and earlier, TableMapReduceUtil (and other Input/OutputFormat 
 classes) used to set up the HTable with an HBaseConfiguration object. Now 
 that this has been deprecated in HBASE-2036, they are constructed with Hadoop 
 Configuration objects, which do not contain the configuration xml file 
 resources required to set up HBase. I think it is currently expected that 
 this is done when constructing the job, but as this needs to be done for 
 every HBase mapreduce job, it would be cleaner if the TableMapReduceUtil 
 class did this whilst setting up the TableInput/OutputFormat classes. 





[jira] [Commented] (HBASE-3578) TableInputFormat does not setup the configuration for HBase mapreduce jobs correctly

2011-08-29 Thread Jan Lukavsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092950#comment-13092950
 ] 

Jan Lukavsky commented on HBASE-3578:
-

Hi,

I think the solution to this issue causes problems when a job wants to change 
HBase-specific options, e.g.:

{noformat}
Configuration conf = HBaseConfiguration.create();

// change keyvalue size
conf.setInt("hbase.client.keyvalue.maxsize", 20971520);

Job job = new Job(conf, ...);

TableMapReduceUtil.initTableMapperJob(...);

// the job doesn't have the option changed; it uses the value from hbase-site
// or hbase-default
job.submit();
{noformat}

Although in this case it could be fixed by moving the set() after 
initTableMapperJob(), in the case where the user wants to change some option 
using GenericOptionsParser and -D this is impossible, making this cool feature 
useless.

In the 0.20.x era this code behaved as expected. The solution to this problem 
should be that we don't overwrite the options, but only read them if they are 
missing. I attached a patch that I think will fix this.








[jira] [Updated] (HBASE-3578) TableInputFormat does not setup the configuration for HBase mapreduce jobs correctly

2011-08-29 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-3578:


Attachment: HBASE-3578.patch
