[jira] [Commented] (HBASE-4060) Making region assignment more robust

2011-07-24 Thread Eran Kutner (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070154#comment-13070154
 ] 

Eran Kutner commented on HBASE-4060:


I will try to elaborate a bit on what I had in mind; I think it is not very far 
from what Andrew suggested earlier.
First, I should say that I am not familiar enough with the current 
implementation, so my understanding may not be accurate. However, based on 
what I understand, the current implementation doesn't seem robust enough, 
because it relies on active communication between the master and the RSs, 
which leaves room for timeouts and failures.
My suggestion is to monitor region assignment more proactively and to let the 
RSs themselves know which regions are assigned to them at any time.
I suggest adding a new znode tree in ZK that lists the regions and their 
assignments. It can be something like /hbase/regions/table/region, so each 
region has its own znode. Under that will be a child znode for the assigned RS.
When the master assigns a region to an RS, it should delete the old owner 
record from the region's znode and add the new one.
When an RS gets an assignment command from the master, it should list the 
children of the znode corresponding to the assigned region and set a watcher 
on it. The RS should verify that it is indeed the owner registered in ZK. If 
it is not, it should immediately refuse the region assignment command.
If a watcher that the RS has set fires, the RS should re-check that region's 
assignment and validate that it is still the owner of the region. If it is 
not, it should relinquish control of the region.
The process so far should guarantee that a region is never doubly assigned; 
however, it may create orphan regions that are not assigned to any RS. To 
resolve that, the master should periodically check for unassigned regions and 
reassign them.
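
The protocol described above can be sketched in memory as follows. This is 
only an illustration under stated assumptions: it does not use the ZooKeeper 
client API, a plain map stands in for the /hbase/regions/table/region znodes, 
and the names OwnershipRegistry, SketchRegionServer and SketchMaster are 
hypothetical, not existing HBase classes. Watches are modelled as one-shot, 
which is how ZooKeeper watches behave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the proposed per-region owner znodes. */
class OwnershipRegistry {
    interface Watcher { void ownerChanged(String region); }

    private final Map<String, String> owners = new HashMap<>();
    private final Map<String, List<Watcher>> watchers = new HashMap<>();

    /** Master side: replace the owner record (null clears it), then fire watches. */
    void setOwner(String region, String server) {
        if (server == null) owners.remove(region); else owners.put(region, server);
        // One-shot semantics: detach the watch list before notifying anyone.
        List<Watcher> fired = watchers.remove(region);
        if (fired != null) for (Watcher w : fired) w.ownerChanged(region);
    }

    /** Read the registered owner and leave a one-shot watch on the region. */
    String getOwnerAndWatch(String region, Watcher w) {
        watchers.computeIfAbsent(region, r -> new ArrayList<>()).add(w);
        return owners.get(region);
    }

    String getOwner(String region) { return owners.get(region); }
}

/** Hypothetical region server that trusts the registry over the master's command. */
class SketchRegionServer implements OwnershipRegistry.Watcher {
    final String name;
    final Set<String> served = new HashSet<>();
    private final OwnershipRegistry registry;

    SketchRegionServer(String name, OwnershipRegistry registry) {
        this.name = name;
        this.registry = registry;
    }

    /** Assignment command: accept only if the registry names this server as owner. */
    boolean open(String region) {
        boolean mine = name.equals(registry.getOwnerAndWatch(region, this));
        if (mine) served.add(region);
        return mine;
    }

    /** Watch fired: re-check ownership, re-arm the watch, relinquish if lost. */
    @Override
    public void ownerChanged(String region) {
        if (!served.contains(region)) return;
        if (name.equals(registry.getOwnerAndWatch(region, this))) return;
        served.remove(region);
    }
}

/** Hypothetical periodic master task that finds orphan regions. */
class SketchMaster {
    static List<String> findUnassigned(OwnershipRegistry registry, List<String> allRegions) {
        List<String> orphans = new ArrayList<>();
        for (String region : allRegions) {
            if (registry.getOwner(region) == null) orphans.add(region);
        }
        return orphans;
    }
}
```

In this sketch a server that loses ownership drops the region as soon as its 
watch fires, and a region whose owner record was cleared shows up in 
findUnassigned() until the master writes a new owner.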



 Making region assignment more robust
 

 Key: HBASE-4060
 URL: https://issues.apache.org/jira/browse/HBASE-4060
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Fix For: 0.92.0


 From Eran Kutner:
 My concern is that the region allocation process seems to rely too much on
 timing considerations and doesn't seem to take enough measures to guarantee
 that conflicts do not occur. I understand that in a distributed environment,
 when you don't get a timely response from a remote machine, you can't know
 for sure whether it received the request. However, there are things that
 can be done to mitigate this and reduce the conflict time significantly. For
 example, when I run hbck it knows that some regions are multiply assigned;
 the master could do the same and try to resolve the conflict. Another
 approach would be to handle late responses: even if the response from the
 remote machine arrives after it was assumed to be dead, the master should
 have enough information to know it had created a conflict by assigning the
 region to another server. An even better solution, I think, is for the RS to
 periodically test that it is indeed the rightful owner of every region it
 holds and relinquish control over the region if it's not.
 Obviously a state where two RSs hold the same region is pathological and can
 lead to data loss, as demonstrated in my case. The system should be able to
 actively protect itself against such a scenario. It probably doesn't need
 saying but there is really nothing worse for a data storage system than data
 loss.
 In my case the problem didn't happen in the initial phase but after
 disabling and enabling a table with about 12K regions.
 For more background information, see 'Errors after major compaction' 
 discussion on u...@hbase.apache.org
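
The idea in the description that the master could detect multiply assigned 
regions itself can be sketched as below. This is a minimal illustration, not 
how hbck or the master is implemented: the class name DoubleAssignmentCheck 
and the shape of the input (a map from server name to the regions that server 
reports holding) are assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper that inverts per-server reports to find conflicts. */
class DoubleAssignmentCheck {
    /**
     * @param regionsByServer what each region server reports it is holding
     * @return region name -> holders, only for regions held by two or more servers
     */
    static Map<String, Set<String>> findMultiplyAssigned(Map<String, Set<String>> regionsByServer) {
        Map<String, Set<String>> holders = new TreeMap<>();
        for (Map.Entry<String, Set<String>> report : regionsByServer.entrySet()) {
            for (String region : report.getValue()) {
                holders.computeIfAbsent(region, r -> new TreeSet<>()).add(report.getKey());
            }
        }
        // Keep only the conflicts: regions with a single holder are healthy.
        holders.values().removeIf(servers -> servers.size() < 2);
        return holders;
    }
}
```

Once a conflict is found, the master still has to decide which holder keeps 
the region; that policy is outside the scope of this sketch.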

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-07-13 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormat.patch)

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.90.4


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-07-13 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: TestMultiTableInputFormat.java.patch)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-07-13 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: TestMultiTableInputFormat.java.patch
MultiTableInputFormat.patch

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.90.4

 Attachments: MultiTableInputFormat.patch, 
 TestMultiTableInputFormat.java.patch


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-07-13 Thread Eran Kutner (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064813#comment-13064813
 ] 

Eran Kutner commented on HBASE-3996:


Cleaned up the patch as much as I could; hopefully I didn't mess it up.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-23 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormat.patch)





[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-23 Thread Eran Kutner (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053681#comment-13053681
 ] 

Eran Kutner commented on HBASE-3996:


Thanks stack.

I hope I finally got Eclipse to properly manage the tabs and line lengths (I'm 
not really a Java developer so this is all new to me).

{quote}In TableSplit you create an HTable instance. Do you need to? And when 
you create it, though I believe it will be less of a problem going forward, can 
you use the constructor that takes a Configuration and table name? Is there a 
close in Split interface? If so, you might want to call close of your HTable in 
there. (Where is it used? Each split needs its own HTable?) Use the constructor 
that takes a Configuration here too...{quote}

There are actually two issues here. I added the configuration and closed the 
table in getSplits(); that's the easy one.
An HTable per split is needed because the cluster nodes use it to read the 
data from the split while the job is running. However, in order to support 
passing the configuration, I moved the HTable creation out of TableSplit and 
into MultiTableInputFormatBase. I also modified TableRecordReaderImpl to close 
the table after reading all the records in the split. I believe this is OK, 
and the tests are passing fine, but it wasn't like that in the existing 
single-table implementation, so I hope I'm not missing (or messing up) 
anything.
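
The "close the table after the last record" behaviour described above can be 
illustrated with the sketch below. It is not the code from the patch and does 
not use the HBase or Hadoop APIs: SketchTable and SketchRecordReader are 
hypothetical stand-ins, and the method names nextKeyValue() and 
getCurrentValue() are only loosely modelled on Hadoop's RecordReader.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for an HTable: just enough to show the lifecycle. */
class SketchTable implements AutoCloseable {
    private final List<String> rows;
    boolean closed = false;

    SketchTable(String... rows) { this.rows = Arrays.asList(rows); }

    Iterator<String> scan() { return rows.iterator(); }

    @Override
    public void close() { closed = true; }
}

/** Hypothetical reader that owns its table for the duration of one split. */
class SketchRecordReader implements AutoCloseable {
    private final SketchTable table;
    private final Iterator<String> scanner;
    private String current;

    SketchRecordReader(SketchTable table) {
        this.table = table;
        this.scanner = table.scan();
    }

    /** Advances to the next row; closes the table once the split is exhausted. */
    boolean nextKeyValue() {
        if (scanner.hasNext()) {
            current = scanner.next();
            return true;
        }
        close();
        return false;
    }

    String getCurrentValue() { return current; }

    /** Safe to call more than once, e.g. if the task stops before the last row. */
    @Override
    public void close() { table.close(); }
}
```

One design point the sketch makes visible: closing only when the rows run out 
would leak the table if a task stops early, so the reader's own close() also 
closes the table.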

{quote}You don't need the e.printStackTrace in below{quote}
Right, removed and fixed the spelling in the warning.

{quote}By any chance is the code here in MultiTableInputFormatBase where we are 
checking start and end rows copied from elsewhere?{quote}
It's copied from TableInputFormatBase; as I said, my code is closely based on 
the single-table code.

{quote}You remove the hashCode in TableSplit. Should it have one?{quote}
I actually don't know if it needs one or not (it does seem to work fine without 
it), but I didn't remove it intentionally. I wrote my original code based on 
the 0.90.3 branch, and when I copied it to trunk I missed this change. It's 
back now.

{quote}Otherwise patch looks great. Test too.{quote}
Thanks!

Hope that's it.






[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-23 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormat.patch)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-23 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputFormat.patch





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormat.java)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: TableSplit.java)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: TableMapReduceUtil.java)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputCollection.java)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormatBase.java)





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputFormat.patch





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: TestMultiTableInputFormat.java.patch





[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052764#comment-13052764
 ] 

Eran Kutner commented on HBASE-3996:


Should be better now. 
Cleaned up the javadocs and added a unit test based on the original 
TableInputFormat test.
Let me know if there is anything I missed.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputFormat.patch





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-21 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: (was: MultiTableInputFormat.patch)





[jira] [Created] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)
Support multiple tables and scanners as input to the mapper in map/reduce jobs
--

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.90.4


It seems that in many cases feeding data from multiple tables or multiple 
scanners on a single table can save a lot of time when running map/reduce jobs.
I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputCollection.java

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.90.4

 Attachments: MultiTableInputCollection.java, 
 MultiTableInputFormat.java


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputFormat.java





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: TableSplit.java

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.90.4

 Attachments: MultiTableInputCollection.java, 
 MultiTableInputFormat.java, MultiTableInputFormatBase.java, 
 TableMapReduceUtil.java, TableSplit.java


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: TableMapReduceUtil.java





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eran Kutner updated HBASE-3996:
---

Attachment: MultiTableInputFormatBase.java





[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2011-06-16 Thread Eran Kutner (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050368#comment-13050368
 ] 

Eran Kutner commented on HBASE-3996:


I've added three new classes:
MultiTableInputCollection is a collection of table+scanner pairs that should be 
used as input to the mapper.
MultiTableInputFormatBase and MultiTableInputFormat are closely based on the 
non-multi version with required adaptations.
I've also updated TableMapReduceUtil and TableSplit to support these new 
classes.

Usage example:
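// One Scan per input; the start/end row keys are byte[] values supplied by 
// the caller.
Scan scan1 = new Scan();
scan1.setStartRow(start1);
scan1.setStopRow(end1);

Scan scan2 = new Scan();
scan2.setStartRow(start2);
scan2.setStopRow(end2);

// Pair each table name with the scan that should be run against it.
MultiTableInputCollection mtic = new MultiTableInputCollection();
mtic.Add(tableName1, scan1);
mtic.Add(tableName2, scan2);

// initTableMapperJob variant from the updated TableMapReduceUtil: it takes 
// the collection in place of a single table name and scan.
TableMapReduceUtil.initTableMapperJob(mtic, TestTableMapper.class, 
    Text.class, IntWritable.class, job1);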


