[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2015-07-07 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617698#comment-14617698
 ] 

Ishan Chhabra commented on HBASE-6028:
--

[~esteban], are you working on this?

 Implement a cancel for in-progress compactions
 --

 Key: HBASE-6028
 URL: https://issues.apache.org/jira/browse/HBASE-6028
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Derek Wollenstein
Assignee: Esteban Gutierrez
Priority: Minor
  Labels: beginner

 Depending on current server load, it can be extremely expensive to run 
 periodic minor / major compactions.  It would be helpful to have a feature 
 where a user could use the shell or a client tool to explicitly cancel an 
 in-progress compaction. This would allow a system to recover when too many 
 regions become eligible for compaction at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11789) LoadIncrementalHFiles is not picking up the -D option

2014-08-21 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105762#comment-14105762
 ] 

Ishan Chhabra commented on HBASE-11789:
---

Can you apply this to 0.96 and 0.94 head as well?

 LoadIncrementalHFiles is not picking up the -D option 
 --

 Key: HBASE-11789
 URL: https://issues.apache.org/jira/browse/HBASE-11789
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 0.98.5, 2.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11789-v0.patch


 LoadIncrementalHFiles is not using the Tool class correctly, which prevents 
 the -D options from being picked up.
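
For context, the standard way to make -D options work is to run the tool through ToolRunner, which parses the generic options into the Configuration. A minimal sketch of that idiomatic wiring (not the actual patch; the class name is made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical tool illustrating the Tool/ToolRunner contract that the
// patch restores for LoadIncrementalHFiles.
public class MyLoadTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() now includes any -Dkey=value options parsed by ToolRunner.
    Configuration conf = getConf();
    // ... perform the load using conf ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options (-D, -conf, ...) before
    // passing the remaining args to run().
    System.exit(ToolRunner.run(new Configuration(), new MyLoadTool(), args));
  }
}
{code}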



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11642) EOL 0.96

2014-08-05 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085885#comment-14085885
 ] 

Ishan Chhabra commented on HBASE-11642:
---

[~lhofhansl], I agree with your reasoning for maintaining 0.94 a little longer 
(given the incompatibility and the stop-the-world upgrade process), but I am 
worried we might end up in a Python 2.x/3.x-like scenario. Declaring an EOL for 
0.94 might be a good way to trigger upgrades for the many clients who have not 
done it yet, and it would bring more focus to the community (I see many JIRAs 
marked won't-backport for 0.94 but with patches attached, and people will keep 
coming back to the mailing lists seeking help for these and other issues).

 EOL 0.96
 

 Key: HBASE-11642
 URL: https://issues.apache.org/jira/browse/HBASE-11642
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack

 Do the work to EOL 0.96.
 + No more patches on 0.96
 + Remove 0.96 from downloads.
 + If user has issue with 0.96 and needs fix, fix it in 0.98 and have the user 
 upgrade to get the fix.
 + Write email to user list stating 0.96 has been EOL'd September 1st? And add 
 notice to refguide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11642) EOL 0.96

2014-08-05 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085992#comment-14085992
 ] 

Ishan Chhabra commented on HBASE-11642:
---

Hmm, I agree and can relate to the pain associated with a coordinated upgrade, 
having gone through it myself. 

Thinking about it more, I agree a full EOL is not a good idea for 0.94, but is 
there a way to make its status clearer and more explicit (maybe extended 
maintenance, like Python 2.x) so that people running smaller clusters would 
consider upgrading and newer users don't adopt 0.94 accidentally (I see some of 
them doing that, mostly because they start with CDH 4)? It is easy for a casual 
observer to think that 0.94 has active releases, whereas the JIRAs make clear 
that new features should not be added to this branch. 

As a side note, this line should definitely be fixed on the download pages (e.g. 
http://www.carfab.com/apachesoftware/hbase/):

The 0.96.x series supercedes 0.94.x. We are leaving the 'stable' pointer on 
the latest 0.94.x for now while 0.96 is still 'fresh'.


 EOL 0.96
 

 Key: HBASE-11642
 URL: https://issues.apache.org/jira/browse/HBASE-11642
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack

 Do the work to EOL 0.96.
 + No more patches on 0.96
 + Remove 0.96 from downloads.
 + If user has issue with 0.96 and needs fix, fix it in 0.98 and have the user 
 upgrade to get the fix.
 + Write email to user list stating 0.96 has been EOL'd September 1st? And add 
 notice to refguide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11642) EOL 0.96

2014-08-04 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085154#comment-14085154
 ] 

Ishan Chhabra commented on HBASE-11642:
---

Is there a plan to EOL 0.94 soon as well?

 EOL 0.96
 

 Key: HBASE-11642
 URL: https://issues.apache.org/jira/browse/HBASE-11642
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack

 Do the work to EOL 0.96.
 + No more patches on 0.96
 + Remove 0.96 from downloads.
 + If user has issue with 0.96 and needs fix, fix it in 0.98 and have the user 
 upgrade to get the fix.
 + Write email to user list stating 0.96 has been EOL'd September 1st? And add 
 notice to refguide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11635) Deprecate TableMapReduceUtil.setScannerCaching

2014-08-03 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084206#comment-14084206
 ] 

Ishan Chhabra commented on HBASE-11635:
---

[~ndimiduk], the config is also used by the HBase client to decide the caching 
value when it is not set on the Scan object. Here is the relevant piece of code 
from ClientScanner.java:

{code}
  // Use the caching from the Scan. If not set, use the default cache
  // setting for this table.
  if (this.scan.getCaching() > 0) {
    this.caching = this.scan.getCaching();
  } else {
    this.caching = conf.getInt(
        HConstants.HBASE_CLIENT_SCANNER_CACHING,
        HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING);
  }
{code}

If we deprecate and remove the config, it also means that you cannot set 
caching at the client side using this config. Is that ok?
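
For illustration, a minimal sketch of the two client-side knobs under discussion (plain HBase client API; the table name is made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CachingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Option 1: client-wide default, picked up by ClientScanner whenever
    // the Scan itself carries no caching value.
    conf.setInt(HConstants.HBASE_CLIENT_SCANNER_CACHING, 500);

    // Option 2: per-scan setting; per the snippet above, this wins over
    // the config when it is > 0.
    Scan scan = new Scan();
    scan.setCaching(1000);

    HTable table = new HTable(conf, "my_table"); // table name is made up
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // process rows; each RPC fetches up to the effective caching count
    }
    scanner.close();
    table.close();
  }
}
{code}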

 Deprecate TableMapReduceUtil.setScannerCaching
 --

 Key: HBASE-11635
 URL: https://issues.apache.org/jira/browse/HBASE-11635
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 1.0.0, 0.98.4, 2.0.0
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra

 See discussion in HBASE-11558.
 Currently there are 2 ways to specify scanner caching when invoking a MR job 
 using TableMapReduceUtil.
 1. By setting the caching on the Scan Object.
 2. By setting the hbase.client.scanner.caching config using 
 TableMapReduceUtil.setScannerCaching.
 This JIRA attempts to deprecate the latter. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11635) Deprecate TableMapReduceUtil.setScannerCaching

2014-07-31 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081345#comment-14081345
 ] 

Ishan Chhabra commented on HBASE-11635:
---

[~ndimiduk], I might have read your comment wrong, but were you also saying 
that we should get rid of the config altogether? 

 Deprecate TableMapReduceUtil.setScannerCaching
 --

 Key: HBASE-11635
 URL: https://issues.apache.org/jira/browse/HBASE-11635
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 1.0.0, 0.98.4, 2.0.0
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra

 See discussion in HBASE-11558.
 Currently there are 2 ways to specify scanner caching when invoking a MR job 
 using TableMapReduceUtil.
 1. By setting the caching on the Scan Object.
 2. By setting the hbase.client.scanner.caching config using 
 TableMapReduceUtil.setScannerCaching.
 This JIRA attempts to deprecate the latter. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11635) Deprecate TableMapReduceUtil.setScannerCaching

2014-07-31 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-11635:
-

 Summary: Deprecate TableMapReduceUtil.setScannerCaching
 Key: HBASE-11635
 URL: https://issues.apache.org/jira/browse/HBASE-11635
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.98.4, 1.0.0, 2.0.0
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra


See discussion in HBASE-11558.

Currently there are 2 ways to specify scanner caching when invoking a MR job 
using TableMapReduceUtil.

1. By setting the caching on the Scan Object.
2. By setting the hbase.client.scanner.caching config using 
TableMapReduceUtil.setScannerCaching.

This JIRA attempts to deprecate the latter. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-30 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078995#comment-14078995
 ] 

Ishan Chhabra commented on HBASE-11558:
---

[~ndimiduk], can you +1 and commit?

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-30 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Release Note: 
TableMapReduceUtil now restores the option to set scanner caching by setting it 
on the Scan object that is passed in. The priority order for choosing the 
scanner caching is as follows:

1. Caching set on the scan object.
2. Caching specified via the config hbase.client.scanner.caching, which can 
either be set manually on the conf or via the helper method 
TableMapReduceUtil.setScannerCaching().
3. The default value HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING, which is 
set to 100 currently.

  was:
TableMapReduceUtil now restores the option to set scanner caching by setting it 
on the scanner object. The priority order for choosing the scanner caching is 
as follows:

1. Caching set on the scan object.
2. Caching specified via the config hbase.client.scanner.caching, which can 
either be set manually on the conf or via the helper method 
TableMapReduceUtil.setScannerCaching().
3. The default value HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING, which is 
set to 100 currently.


 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-30 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Release Note: 
TableMapReduceUtil now restores the option to set scanner caching by setting it 
on the scanner object. The priority order for choosing the scanner caching is 
as follows:

1. Caching set on the scan object.
2. Caching specified via the config hbase.client.scanner.caching, which can 
either be set manually on the conf or via the helper method 
TableMapReduceUtil.setScannerCaching().
3. The default value HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING, which is 
set to 100 currently.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-30 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080304#comment-14080304
 ] 

Ishan Chhabra commented on HBASE-11558:
---

Updated release notes. It makes sense to remove the second method. Do you 
propose to delete the method or mark it as deprecated for now? Which branches 
should get this patch? I can open a separate JIRA and put in the patch there 
once the answers are clear.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-29 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078381#comment-14078381
 ] 

Ishan Chhabra commented on HBASE-11558:
---

Test failures are due to HBASE-11316

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-29 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078385#comment-14078385
 ] 

Ishan Chhabra commented on HBASE-11558:
---

[~apurtell], how can I trigger the build for 0.96 and 0.98 patches? 

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch, HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-28 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Status: Open  (was: Patch Available)

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-28 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Attachment: HBASE_11558-0.98_v2.patch
HBASE_11558-0.96_v2.patch
HBASE_11558_v2.patch

Fixed ProtobufUtil test and enhanced it a bit. PTAL.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-28 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Status: Patch Available  (was: Open)

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch, 
 HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch, 
 HBASE_11558_v2.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-26 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Attachment: HBASE_11558.patch
HBASE_11558-0.98.patch
HBASE_11558-0.96.patch

Patch for trunk, 0.96 and 0.98.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.98.patch, 
 HBASE_11558.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-26 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Fix Version/s: 0.96.3
   Status: Patch Available  (was: Open)

Attached patch and added 0.96 as a fix version.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0

 Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.98.patch, 
 HBASE_11558.patch


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-24 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073049#comment-14073049
 ] 

Ishan Chhabra commented on HBASE-11558:
---

[~apurtell], if caching is set during a general scan (not MapReduce), it will 
be serialized and sent in the openScanner request even though it is not needed. 
However, it would just be 3-4 bytes of extra overhead, and only in the 
openScanner call, not in the subsequent next calls. 

If this is ok, I would be happy to put a patch up.

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Reporter: Ishan Chhabra
Assignee: Andrew Purtell
 Fix For: 0.99.0, 0.98.5, 2.0.0


 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11552) Read/Write requests count metric value is too short

2014-07-21 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068959#comment-14068959
 ] 

Ishan Chhabra commented on HBASE-11552:
---

This duplicates https://issues.apache.org/jira/browse/HBASE-10645

 Read/Write requests count metric value is too short
 ---

 Key: HBASE-11552
 URL: https://issues.apache.org/jira/browse/HBASE-11552
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.21
Reporter: Adrian Muraru
 Fix For: 0.94.22

 Attachments: HBASE-11552_0.94_v1.diff


 I am using the {{readRequestsCount}} and {{writeRequestsCount}} counters to plot 
 HBase activity in opentsdb and noticed that they are exported as an int value 
 although the underlying counter is backed by a {{long}}.
 The metric should be a {{long}} as well.
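
To make the truncation concrete, a plain-Java illustration (not HBase code):

{code}
public class IntTruncation {
  public static void main(String[] args) {
    long readRequestsCount = 3000000000L; // past Integer.MAX_VALUE (2147483647)
    int exported = (int) readRequestsCount; // what an int-typed metric reports
    System.out.println(exported);           // prints -1294967296
  }
}
{code}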



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-21 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-11558:
-

 Summary: Caching set on Scan object gets lost when using 
TableMapReduceUtil in 0.95+
 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.95.0
Reporter: Ishan Chhabra


0.94 and before, if one sets caching on the Scan object in the Job by calling 
scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly read 
and used by the mappers during a mapreduce job. This is because Scan.write 
respects and serializes caching, which is used internally by TableMapReduceUtil 
to serialize and transfer the scan object to the mappers.

0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect caching 
anymore as ClientProtos.Scan does not have the field caching. Caching is passed 
via the ScanRequest object to the server and so is not needed in the Scan 
object. However, this breaks application code that relies on the earlier 
behavior. This will lead to sudden degradation in Scan performance in 0.96+ for 
users relying on the old behavior.

There are 2 options here:
1. Add caching to Scan object, adding an extra int to the payload for the Scan 
object which is really not needed in the general case.
2. Document and preach that TableMapReduceUtil.setScannerCaching must be called 
by the client.
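
To make option 2 concrete, a sketch of a job driver using the existing workaround (the mapper and table name are placeholders, not part of this issue):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingJob {
  // Placeholder mapper; a real job would emit something in map().
  static class NoopMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
      // no-op
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-caching-example");

    Scan scan = new Scan();
    scan.setCaching(1000); // lost on the wire in 0.95+ (not in ClientProtos.Scan)

    // Workaround (option 2): set hbase.client.scanner.caching for the mappers.
    TableMapReduceUtil.setScannerCaching(job, 1000);

    TableMapReduceUtil.initTableMapperJob("my_table", scan, NoopMapper.class,
        NullWritable.class, NullWritable.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}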




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-21 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-11558:
--

Affects Version/s: 0.98.0
   0.96.0

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.98.0, 0.95.0, 0.96.0
Reporter: Ishan Chhabra

 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

2014-07-21 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069719#comment-14069719
 ] 

Ishan Chhabra commented on HBASE-11558:
---

Unfortunately our configuration has this value set to 1 (carried over from the 
default in 0.94) and we faced this issue. Another person on the mailing list 
was perplexed by this as well (not sure if he went from 5000 to 100 or from 
5000 to 1).

 Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
 ---

 Key: HBASE-11558
 URL: https://issues.apache.org/jira/browse/HBASE-11558
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, Scanners
Affects Versions: 0.98.0, 0.95.0, 0.96.0
Reporter: Ishan Chhabra

 0.94 and before, if one sets caching on the Scan object in the Job by calling 
 scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly 
 read and used by the mappers during a mapreduce job. This is because 
 Scan.write respects and serializes caching, which is used internally by 
 TableMapReduceUtil to serialize and transfer the scan object to the mappers.
 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect 
 caching anymore as ClientProtos.Scan does not have the field caching. Caching 
 is passed via the ScanRequest object to the server and so is not needed in 
 the Scan object. However, this breaks application code that relies on the 
 earlier behavior. This will lead to sudden degradation in Scan performance in 
 0.96+ for users relying on the old behavior.
 There are 2 options here:
 1. Add caching to Scan object, adding an extra int to the payload for the 
 Scan object which is really not needed in the general case.
 2. Document and preach that TableMapReduceUtil.setScannerCaching must be 
 called by the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10646) Enable security features by default for 1.0

2014-06-03 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017200#comment-14017200
 ] 

Ishan Chhabra commented on HBASE-10646:
---

Will the main RPCs like Get, Put, etc. (apart from the admin RPCs) also be 
secured after that change? Any extra overhead in these RPCs would be 
unacceptable in our use case. 

Also, +1 for a simple security = false option. I believe many users don't need 
security, and any extra overhead (in terms of deployment complexity or runtime 
cost) is undesirable. 

 Enable security features by default for 1.0
 ---

 Key: HBASE-10646
 URL: https://issues.apache.org/jira/browse/HBASE-10646
 Project: HBase
  Issue Type: Task
Affects Versions: 0.99.0
Reporter: Andrew Purtell

 As discussed in the last PMC meeting, we should enable security features by 
 default in 1.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10921) Port HBASE-10323 'Auto detect data block encoding in HFileOutputFormat' to 0.94 / 0.96

2014-04-08 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963297#comment-13963297
 ] 

Ishan Chhabra commented on HBASE-10921:
---

HBASE-10323 had a patch attached for 0.94. Is the one attached here the same, 
just rebased on top of the 0.94 head?

 Port HBASE-10323 'Auto detect data block encoding in HFileOutputFormat' to 
 0.94 / 0.96
 --

 Key: HBASE-10921
 URL: https://issues.apache.org/jira/browse/HBASE-10921
 Project: HBase
  Issue Type: Task
Affects Versions: 0.96.2, 0.94.18
Reporter: Ted Yu
Assignee: Kashif J S
 Fix For: 0.94.19, 0.96.3

 Attachments: HBASE-10921-0.94-v1.patch, HBASE-10921-0.96-v1.patch


 This issue is to backport auto detection of data block encoding in 
 HFileOutputFormat to 0.94 and 0.96 branches.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-22 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878402#comment-13878402
 ] 

Ishan Chhabra commented on HBASE-10380:
---

Sure. I didn't know about ParseFilter. I tried to build my own textual language 
initially, but it became complicated quickly.

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. Specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-10380:
-

 Summary: Add bytesBinary and filter options to CopyTable
 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10380:
--

Status: Patch Available  (was: Open)

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor

 Add options in CopyTable to:
 1. specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10380:
--

Description: 
Add options in CopyTable to:
1. specify the start and stop row in bytesBinary format 
2. Use filters

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor

 Add options in CopyTable to:
 1. specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10380:
--

Description: 
Add options in CopyTable to:
1. Specify the start and stop row in bytesBinary format 
2. Use filters

  was:
Add options in CopyTable to:
1. specify the start and stop row in bytesBinary format 
2. Use filters


 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. Specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10380:
--

Attachment: HBASE_10380_0.94-v1.patch

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876257#comment-13876257
 ] 

Ishan Chhabra commented on HBASE-10380:
---

For filters, the patch allows one to specify a file containing the filter in a 
serialized form. This seemed to be the only generic way to specify filters and 
allows complex filters (including filter lists).
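
As a rough illustration only (the patch's actual file format is not shown here; this assumes the 0.94 Writable-based filter serialization, and the file name is made up), producing such a file might look like:

{code}
import java.io.DataOutputStream;
import java.io.FileOutputStream;

import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteFilterFile {
  public static void main(String[] args) throws Exception {
    PrefixFilter filter = new PrefixFilter(Bytes.toBytes("row-prefix"));
    // In 0.94, Filter extends Writable, so it can be dumped to a file
    // and later deserialized by CopyTable.
    DataOutputStream out =
        new DataOutputStream(new FileOutputStream("filter.bin"));
    filter.write(out);
    out.close();
  }
}
{code}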

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. Specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876259#comment-13876259
 ] 

Ishan Chhabra commented on HBASE-10380:
---

If the approach looks good, then I can build a patch for trunk.

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. Specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10380) Add bytesBinary and filter options to CopyTable

2014-01-20 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876880#comment-13876880
 ] 

Ishan Chhabra commented on HBASE-10380:
---

Ok, I'll submit a trunk patch then. I tend to create 0.94 patches first since we 
are running that internally.

 Add bytesBinary and filter options to CopyTable
 ---

 Key: HBASE-10380
 URL: https://issues.apache.org/jira/browse/HBASE-10380
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
Priority: Minor
 Attachments: HBASE_10380_0.94-v1.patch


 Add options in CopyTable to:
 1. Specify the start and stop row in bytesBinary format 
 2. Use filters



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-19 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-trunk-v4.patch
HBASE_10323-0.94.15-v5.patch

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0

 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-0.94.15-v4.patch, HBASE_10323-0.94.15-v5.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch, 
 HBASE_10323-trunk-v3.patch, HBASE_10323-trunk-v4.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk load. 
 This option is easily missed, not documented, and also works differently from 
 compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support for auto detecting the data block 
 encoding, similar to the other parameters. 
 The current patch does the following:
 1. Automatically detects the data block encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the data block encoding
 around as a method to override auto detection.
 3. Moves string conf parsing to the start of the program so that it fails
 fast at startup instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for the code serializing and
 deserializing config parameters for bloom filter type, block size and
 data block encoding.
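
For illustration, the legacy manual override mentioned above would look something like this (config key from the description; the choice of FAST_DIFF is arbitrary):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class EncodingOverride {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Manual override kept by the patch; with auto detection in place,
    // this is only needed to force an encoding different from the table's.
    conf.set("hbase.mapreduce.hfileoutputformat.datablock.encoding",
        DataBlockEncoding.FAST_DIFF.toString());
  }
}
{code}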



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-19 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876054#comment-13876054
 ] 

Ishan Chhabra commented on HBASE-10323:
---

Added the @VisibleForTesting annotations where needed and fixed the '{' that was 
on its own line. I didn't make the constants package-private since no other 
class needs them at the moment. When some other class in the package or a test 
needs them, they can be made package-private then. 

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0

 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-0.94.15-v4.patch, HBASE_10323-0.94.15-v5.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch, 
 HBASE_10323-trunk-v3.patch, HBASE_10323-trunk-v4.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk load. 
 This option is easily missed, not documented, and also works differently from 
 compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support for auto detecting the data block 
 encoding, similar to the other parameters. 
 The current patch does the following:
 1. Automatically detects the data block encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the data block encoding
 around as a method to override auto detection.
 3. Moves string conf parsing to the start of the program so that it fails
 fast at startup instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for the code serializing and
 deserializing config parameters for bloom filter type, block size and
 data block encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-15 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873071#comment-13873071
 ] 

Ishan Chhabra commented on HBASE-10323:
---

Can someone else take a look and +1? [~lhofhansl]?

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0

 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-0.94.15-v4.patch, HBASE_10323-trunk-v1.patch, 
 HBASE_10323-trunk-v2.patch, HBASE_10323-trunk-v3.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulkload 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-14 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-trunk-v3.patch
HBASE_10323-0.94.15-v4.patch

Changed the trunk patch to work directly with DataBlockEncoding instead of 
HFileDataBlockEncoder.

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0

 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-0.94.15-v4.patch, HBASE_10323-trunk-v1.patch, 
 HBASE_10323-trunk-v2.patch, HBASE_10323-trunk-v3.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-10323:
-

 Summary: Auto detect data block encoding in HFileOutputFormat
 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra


Currently, one has to specify the data block encoding of the table explicitly 
using the config parameter 
hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
load. This option is easily missed, not documented and also works differently 
than compression, block size and bloom filter type, which are auto detected. 

The solution would be to add support to auto detect datablock encoding similar 
to other parameters. 

The current patch does the following:
1. Automatically detects datablock encoding in HFileOutputFormat.
2. Keeps the legacy option of manually specifying the datablock encoding
around as a method to override auto detections.
3. Moves string conf parsing to the start of the program so that it fails
fast during starting up instead of failing during record writes. It also
makes the internals of the program type safe.
4. Adds missing doc strings and unit tests for code serializing and
deserializing config parameters for bloom filter type, block size and
datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Status: Patch Available  (was: Open)

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-0.94.15-v1.patch

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-trunk-v1.patch
HBASE_10323-0.94.15-v2.patch

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-trunk-v1.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869184#comment-13869184
 ] 

Ishan Chhabra commented on HBASE-10323:
---

Added javadoc for the parameters and uploaded a patch for trunk.

[~lhofhansl], what else should be auto detected? I can add that as part of 
this or in a separate JIRA.

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-trunk-v1.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-trunk-v2.patch
HBASE_10323-0.94.15-v3.patch

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: (was: HBASE_10323-trunk-v2.patch)

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-trunk-v1.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-10323:
--

Attachment: HBASE_10323-trunk-v2.patch

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-12 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869325#comment-13869325
 ] 

Ishan Chhabra commented on HBASE-10323:
---

I was able to run the Maven site build successfully on my box, but I can't 
figure out from the console output why it is failing here. Can somebody help?

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch


 Currently, one has to specify the data block encoding of the table explicitly 
 using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented and also works differently 
 than compression, block size and bloom filter type, which are auto detected. 
 The solution would be to add support to auto detect datablock encoding 
 similar to other parameters. 
 The current patch does the following:
 1. Automatically detects datablock encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the datablock encoding
 around as a method to override auto detections.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during starting up instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for code serializing and
 deserializing config parameters for bloom filter type, block size and 
 datablock encoding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9934) Support not forwarding edits in replication

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819329#comment-13819329
 ] 

Ishan Chhabra commented on HBASE-9934:
--

I agree. 

Let me set up a graph of HBase clusters for further discussion. Let's say we have 
6 clusters, C1..C6.
C1..C3 want to replicate to each other in the NxN fashion (i.e. with no 
forwarding), and the following master-master replications are set up: C4 - C3, 
C5 - C3 and C6 - C5

C4 - C3 - C5 - C6
    /  \
  C1 -- C2

1. The source cluster sets the replication scope for the column family that is 
being replicated to SINGLE_HOP_ONLY, and the target clusters do not forward 
KeyValues whose families have their scope set to this value. The decision on 
whether a write should be forwarded onward thus lies with the source cluster in 
this case. This would work if we just had C1..C3, but will *not* work in the 
above example, since a write from C4 will not have the correct scope set and 
will circulate in C1..C3 more than needed. We could say that such a mixed 
setting is not supported, but it is hard to prevent someone from shooting 
themselves in the foot.
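
For illustration, a minimal sketch of proposal 1's sink-side check. SINGLE_HOP_ONLY 
is an invented scope value here; today HConstants only defines 
REPLICATION_SCOPE_LOCAL (0) and REPLICATION_SCOPE_GLOBAL (1):

{code}
public class SingleHopScopeSketch {
  static final int REPLICATION_SCOPE_LOCAL = 0;  // exists today
  static final int REPLICATION_SCOPE_GLOBAL = 1; // exists today
  static final int SINGLE_HOP_ONLY = 2;          // invented for this sketch

  /** Should a sink re-ship a KeyValue it received via replication? */
  static boolean shouldForward(int familyScope) {
    // GLOBAL keeps today's multi-hop behavior; SINGLE_HOP_ONLY stops
    // the edit after its first hop away from the source cluster.
    return familyScope == REPLICATION_SCOPE_GLOBAL;
  }
}
{code}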

2. Having thought about this more, it looks like a property of the link 
(whether it is part of a mesh network or a standard master-slave / 
master-master link). If a link is part of the mesh network, then *all* writes 
coming over that link (including ones that originated at a different 
cluster) should not be forwarded. To support this setup, we would have to add 
this as a property of the peer (in the ZooKeeper state?), and then, when an 
edit is sent across such a link, it would be marked as "do not forward". This 
could be part of the WALEdit key, or we could add support for tags to WALEdit 
(similar to cells) and carry it there.
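
A rough sketch of proposal 2's forwarding decision; PeerConfig, isMeshPeer, and 
shouldForward are all invented names for this discussion, not existing HBase API:

{code}
import java.util.UUID;

public class MeshForwardingSketch {

  /** Hypothetical peer-level flag, e.g. kept with the peer state in ZooKeeper. */
  interface PeerConfig {
    boolean isMeshPeer(String peerId);
  }

  /**
   * Decide whether an edit received from {@code sourcePeerId} should be
   * forwarded to downstream peers. Locally originated edits are always
   * shipped; edits that arrived over a mesh link are not, since every mesh
   * member already ships its own edits to all the others.
   */
  static boolean shouldForward(UUID localClusterId, UUID originClusterId,
      String sourcePeerId, PeerConfig peers) {
    if (localClusterId.equals(originClusterId)) {
      return true; // our own write: replicate to all peers
    }
    // Replicated edit: forward only if it came over a non-mesh link.
    return !peers.isMeshPeer(sourcePeerId);
  }
}
{code}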

Thoughts?

 Support not forwarding edits in replication
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to set up NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9934) Mesh replication (a.k.a. multi master replication)

2013-11-11 Thread Ishan Chhabra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chhabra updated HBASE-9934:
-

Summary: Mesh replication (a.k.a. multi master replication)  (was: Support 
not forwarding edits in replication)

 Mesh replication (a.k.a. multi master replication)
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to set up NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9950) Row level replication

2013-11-11 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-9950:


 Summary: Row level replication
 Key: HBASE-9950
 URL: https://issues.apache.org/jira/browse/HBASE-9950
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor


We have a replication setup with the same table and column family present 
in multiple data centers. Currently, all of the clusters have exactly the same 
data, but each cluster doesn't need all of it: rows need to be present in only 
x out of the total y clusters. This information varies at the row level, and 
thus more granular replication cannot be achieved by setting up cluster-level 
replication. 

Adding row-level replication should solve this.
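
To make the idea concrete, here is one hypothetical shape such a hook could take; 
no such interface exists in HBase today, and all names are invented:

{code}
import java.util.Set;

/**
 * Hypothetical pluggable policy for row-level replication: given an edit's
 * table and row, return the ids of the peer clusters that should receive it.
 */
public interface RowReplicationPolicy {
  Set<String> peersForRow(byte[] tableName, byte[] row);
}
{code}

An implementation could, for example, derive the target set from a row key 
prefix, keeping the routing decision out of the core replication code.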





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9934) Mesh replication (a.k.a. multi master replication)

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819517#comment-13819517
 ] 

Ishan Chhabra commented on HBASE-9934:
--

Changed the title to Mesh replication, as that describes the feature request 
better. Multi-master replication may not be the best term, but it is the one 
the DBA community uses for such setups in MySQL and other RDBMSs.

 Mesh replication (a.k.a. multi master replication)
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to set up NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9951) Tags for HLogKey

2013-11-11 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-9951:


 Summary: Tags for HLogKey
 Key: HBASE-9951
 URL: https://issues.apache.org/jira/browse/HBASE-9951
 Project: HBase
  Issue Type: New Feature
  Components: wal
Reporter: Ishan Chhabra


Similar to the Cell interface, adding tags to HLogKey could be useful in 
multiple scenarios. 

My primary use cases are driven by replication though:
1. To record whether a WALEdit should be forwarded further to other clusters 
(see [#HBASE-9934])
2. To record which clusters the WALEdit should be forwarded to (for row-level 
replication)
3. To mark a record as not to be replicated (these are some special cases in our 
usage and cannot be handled using a separate column family)
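
As an illustration only, mirroring the type-plus-payload shape of Cell tags; 
none of this exists on HLogKey today, and the type constants are invented:

{code}
/** Hypothetical tag carried on a WAL key, mirroring Cell tags. */
public class WALKeyTag {
  // Illustrative tag types for the three use cases above.
  public static final byte DO_NOT_FORWARD = 1;   // mesh replication (HBASE-9934)
  public static final byte TARGET_CLUSTERS = 2;  // row-level replication
  public static final byte DO_NOT_REPLICATE = 3; // skip replication entirely

  private final byte type;
  private final byte[] value; // e.g. an encoded list of cluster ids for type 2

  public WALKeyTag(byte type, byte[] value) {
    this.type = type;
    this.value = value;
  }

  public byte getType() { return type; }
  public byte[] getValue() { return value; }
}
{code}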



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9934) Mesh replication (a.k.a. multi master replication)

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819608#comment-13819608
 ] 

Ishan Chhabra commented on HBASE-9934:
--

That is the first proposal. It will not work where people have a mixed setup. 
The second one, where a (source, sink) link pair is specified as belonging to a 
mesh network, should work better, but might be more dev work.

Also, when you were cleaning my comment, the diagram got changed. Below is the 
fixed version.

C4 - C3 - C5 - C6
    /  \
  C1 -- C2

 Mesh replication (a.k.a. multi master replication)
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to set up NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9934) Mesh replication (a.k.a. multi master replication)

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819610#comment-13819610
 ] 

Ishan Chhabra commented on HBASE-9934:
--

OK, the above one is not right either: the spaces are removed when I save, and 
the weird strikethrough comes in. C1 and C2 are connected to C3, not C4.

 Mesh replication (a.k.a. multi master replication)
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to set up NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9950) Row level replication

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819629#comment-13819629
 ] 

Ishan Chhabra commented on HBASE-9950:
--

[~stack] Not sure yet. Since there is no notion of row-level data in HBase 
storage, I would have to create some special KVs stored for the row, 
which sounds very hacky.

[~apurtell] Replication scope is defined at the CF level, so I don't think I'll 
be able to use it. I do need to plug in a custom replication policy, though, if 
this is not a core feature. There are no observers for replication, are there?

 Row level replication
 -

 Key: HBASE-9950
 URL: https://issues.apache.org/jira/browse/HBASE-9950
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 We have a replication setup with the same table and column family present 
 in multiple data centers. Currently, all of the clusters have exactly the 
 same data, but each cluster doesn't need all of it: rows need to be 
 present in only x out of the total y clusters. This information varies at the 
 row level, and thus more granular replication cannot be achieved by setting up 
 cluster-level replication. 
 Adding row-level replication should solve this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9934) Mesh replication (a.k.a. multi master replication)

2013-11-11 Thread Ishan Chhabra (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819887#comment-13819887
 ] 

Ishan Chhabra commented on HBASE-9934:
--

[~nidmhbase] Lars' understanding is correct. That is why this should be 
specified at the peer level and not the CF level. Also, when per-CF peer 
definitions are supported, this would fit in automatically at the peer level. 

[~lhofhansl] I'll give this a shot. Can you assign this to me?

 Mesh replication (a.k.a. multi master replication)
 --

 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor

 This is to setup NxN replication.
 See background discussion here: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E
 We can add a new mode in replication to not forward edits from other 
 clusters. Not sure what should be done when some clusters are configured with 
 this setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9934) Support not forwarding edits in replication

2013-11-08 Thread Ishan Chhabra (JIRA)
Ishan Chhabra created HBASE-9934:


 Summary: Support not forwarding edits in replication
 Key: HBASE-9934
 URL: https://issues.apache.org/jira/browse/HBASE-9934
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Ishan Chhabra
Priority: Minor


This is to set up NxN replication.

See background discussion here: 
http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3CCAOiuM-4UMmLA7UHMp4hhjpLWUrHDxg1t4tN4aWvnZUMcTxG%2BKQ%40mail.gmail.com%3E

We can add a new mode in replication to not forward edits from other clusters. 
Not sure what should be done when some clusters are configured with this 
setting and some aren't.



--
This message was sent by Atlassian JIRA
(v6.1#6144)