[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2014-05-13 Thread Ian Friedman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997042#comment-13997042
 ] 

Ian Friedman commented on HBASE-10076:
--

Never mind, I see it was in the parent ticket. I should probably read before I 
comment.

> Backport MapReduce over snapshot files [0.94]
> -
>
> Key: HBASE-10076
> URL: https://issues.apache.org/jira/browse/HBASE-10076
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Attachments: hbase-10076-v0.patch
>
>
> MapReduce over Snapshots would be valuable on 0.94.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2014-05-13 Thread Ian Friedman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997041#comment-13997041
 ] 

Ian Friedman commented on HBASE-10076:
--

Hey [~lhofhansl], [~bryanck], is this the latest version of the patch for 0.94? 
If not, does anyone know where I can get the latest version?



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-17 Thread Bryan Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850710#comment-13850710
 ] 

Bryan Keller commented on HBASE-10076:
--

In some simple tests, I saw a 4-5x speed improvement, which is similar to what 
you are seeing. In production, we saw a lower 3x improvement in our main jobs, 
but that is because we are more bound by CPU now than before. I did some quick 
profiling and it appears there is room for some optimization in HRegion to 
reduce CPU usage, which would further improve performance for more CPU 
intensive jobs.




[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849932#comment-13849932
 ] 

Enis Soztutar commented on HBASE-10076:
---

bq. I'm curious what kind of performance improvement you are seeing with the 
snapshot scan
In my tests, I was able to get 50-60 MB/s per region, versus 10-11 MB/s for 
regular scans (see 
https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=13805748&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13805748).

Bryan, can you also share your performance numbers, if you can quantify them?

> Backport MapReduce over snapshot files [0.94]
> -
>
> Key: HBASE-10076
> URL: https://issues.apache.org/jira/browse/HBASE-10076
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.15
>
> Attachments: hbase-10076-v0.patch
>
>
> MapReduce over Snapshots would be valuable on 0.94.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-16 Thread Bryan Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849882#comment-13849882
 ] 

Bryan Keller commented on HBASE-10076:
--

I should say the code is the same, except for the TableMapReduceUtil code I 
needed to cut-n-paste...



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-16 Thread Bryan Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849880#comment-13849880
 ] 

Bryan Keller commented on HBASE-10076:
--

No, the map-reduce code from the v5 patch is the same as I'm using now in our 
production servers. I created some utilities for creating and cleaning up 
snapshots seamlessly, but they are orthogonal to this.

I'm curious what kind of performance improvement you are seeing with the 
snapshot scan (if you have tested it)?




[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849848#comment-13849848
 ] 

Lars Hofhansl commented on HBASE-10076:
---

[~bryanck], did you find anything worthwhile to include?



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848455#comment-13848455
 ] 

Lars Hofhansl commented on HBASE-10076:
---

Thanks Bryan. Much appreciated!



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-14 Thread Bryan Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848405#comment-13848405
 ] 

Bryan Keller commented on HBASE-10076:
--

BTW I have made a few minor enhancements since submitting that patch. I'll 
check this weekend to see if it is anything worthwhile.



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-14 Thread Bryan Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848404#comment-13848404
 ] 

Bryan Keller commented on HBASE-10076:
--

Yes, there is some minor cut-n-pasting involved. We could easily package this 
to be available outside of the distribution if that is deemed necessary.




[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848276#comment-13848276
 ] 

Lars Hofhansl commented on HBASE-10076:
---

With a little bit of cut'n'paste (TableMapReduceUtil's convertScanToString and 
convertStringToScan), this can indeed be done with [~bryanck]'s latest 0.94 
patch on HBASE-8369 without any additional code in HBase.
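
A minimal sketch of the kind of helper pair that would need to be copied, assuming the usual approach of serializing the scan to bytes and Base64-encoding it so it fits in a job configuration string. ScanSpec is a hypothetical stand-in, not the HBase Scan class, and the method bodies are an illustration of the technique rather than the 0.94 source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical stand-in for a scan definition; NOT org.apache.hadoop.hbase.client.Scan.
final class ScanSpec {
    final byte[] startRow;
    final byte[] stopRow;
    final int caching;

    ScanSpec(byte[] startRow, byte[] stopRow, int caching) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.caching = caching;
    }

    // Length-prefixed binary form, in the spirit of a Writable.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(startRow.length);
        out.write(startRow);
        out.writeInt(stopRow.length);
        out.write(stopRow);
        out.writeInt(caching);
    }

    static ScanSpec read(DataInputStream in) throws IOException {
        byte[] start = new byte[in.readInt()];
        in.readFully(start);
        byte[] stop = new byte[in.readInt()];
        in.readFully(stop);
        return new ScanSpec(start, stop, in.readInt());
    }
}

final class ScanStringCodec {
    // Serialize the scan and Base64-encode it so it can be stored as a config value.
    static String convertScanToString(ScanSpec scan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            scan.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of convertScanToString: decode the string and rebuild the scan.
    static ScanSpec convertStringToScan(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return ScanSpec.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ScanSpec original = new ScanSpec(
            "row-a".getBytes(StandardCharsets.UTF_8),
            "row-z".getBytes(StandardCharsets.UTF_8),
            500);
        String encoded = convertScanToString(original);
        ScanSpec decoded = convertStringToScan(encoded);
        System.out.println("round trip ok: "
            + (Arrays.equals(original.startRow, decoded.startRow)
               && Arrays.equals(original.stopRow, decoded.stopRow)
               && original.caching == decoded.caching));
    }
}
```

An external input format would call the first helper when configuring the job and the second when the record reader starts, which is why both directions have to be available outside HBase.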




[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847935#comment-13847935
 ] 

Lars Hofhansl commented on HBASE-10076:
---

Based on recent discussion we might just add the necessary hooks to 0.94, in 
order to allow implementing the M/R outside of HBase.



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847210#comment-13847210
 ] 

Enis Soztutar commented on HBASE-10076:
---

bq. Looks like in 0.94 it's just added based on the config
In trunk, initTableSnapshotMapper() calls: 
{code}
initTableMapperJob(snapshotName, scan, mapper, outputKeyClass,
outputValueClass, job, addDependencyJars, false, 
TableSnapshotInputFormat.class);
{code}
The false is the initCredentials argument to the overloaded function, no? I 
don't have Eclipse open, so I cannot check : ) 
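
A minimal sketch of the overload pattern in question, using a hypothetical stand-in class rather than the real TableMapReduceUtil. The method names echo the ones discussed here, but the parameter lists and bodies are assumptions made only to show how a boolean flag on the full overload lets a snapshot entry point skip credential setup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; it only records which setup steps ran.
final class JobSetupSketch {
    final List<String> actions = new ArrayList<>();

    // Convenience overload: no flag, so credentials are always initialized.
    void initTableMapperJob(String table) {
        initTableMapperJob(table, true);
    }

    // Full overload: the caller decides whether to fetch tokens from a live cluster.
    void initTableMapperJob(String table, boolean initCredentials) {
        actions.add("configure:" + table);
        if (initCredentials) {
            actions.add("obtainToken");
        }
    }

    // Snapshot entry point: passes false because no running cluster is required.
    void initTableSnapshotMapperJob(String snapshotName) {
        initTableMapperJob(snapshotName, false);
    }

    public static void main(String[] args) {
        JobSetupSketch online = new JobSetupSketch();
        online.initTableMapperJob("usertable");
        System.out.println("online job: " + online.actions);

        JobSetupSketch offline = new JobSetupSketch();
        offline.initTableSnapshotMapperJob("usertable-snapshot");
        System.out.println("snapshot job: " + offline.actions);
    }
}
```

If the snapshot entry point delegated to the convenience overload instead, it would request tokens even when the cluster is offline, which is the failure mode raised in the review.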

bq. how can we test that the locality selection is correct? It's not really 
covered anywhere in this patch or the original
It is using the HDFSBlocksDistribution, which I thought is tested on its own. 
I did not check whether there is actual coverage, though. It should be possible 
to mock that up, I guess. 
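
One way such a test could look, sketched under the assumption that locality selection boils down to ranking hosts by how many bytes of the region's files they store. The LocalitySketch class and its topHosts method are hypothetical; they stand in for the selection logic so that a hand-built distribution can be fed in without a running HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for locality selection over a block distribution.
final class LocalitySketch {
    // Return up to `limit` hosts, heaviest first; ties are broken by host name.
    static List<String> topHosts(Map<String, Long> bytesPerHost, int limit) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        entries.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, Long> entry : entries) {
            if (hosts.size() == limit) {
                break;
            }
            hosts.add(entry.getKey());
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hand-built distribution playing the role of a mocked block report.
        Map<String, Long> distribution = new LinkedHashMap<>();
        distribution.put("host-a", 100L);
        distribution.put("host-b", 300L);
        distribution.put("host-c", 200L);
        System.out.println(topHosts(distribution, 2));
    }
}
```

A unit test would build a distribution like the one in main and assert on the returned host order, which checks the selection without touching the file system.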



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847018#comment-13847018
 ] 

Jesse Yates commented on HBASE-10076:
-

[~enis] another question Lars and I had offline: how can we test that the 
locality selection is correct? It's not really covered anywhere in this patch 
or the original.



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847017#comment-13847017
 ] 

Jesse Yates commented on HBASE-10076:
-

bq. Any interest in bringing in the new test 
TestCellUtil.testOverlappingKeys()? Not needed that much, just checking

Done - good call.

bq. Not sure about the change in TableInputFormatBase. Is this needed? Let's 
leave this out otherwise.

Not needed, but this is a cleaner, better implementation and it is good to 
reuse the util, now that we have it. Agreed, it's a little extra and a bit 
unnecessary, but trivially so.

bq. In the original patch, TableMapReduceUtil.initTableMapperJob() now accepts 
an initCredentials param, because we do not want to get tokens from HBase at 
all. Otherwise, if hbase is used with security, offline clusters won't work.

Looks like in 0.94 it's just added based on the config. In trunk, it looks like 
TableMapReduceUtil.initTableSnapshotMapperJob just calls initTableMapperJob 
without any parameter, which always initializes the credentials. Maybe a bug in 
trunk? Seems like you would need a more sweeping change to make it configurable 
as well.



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846971#comment-13846971
 ] 

Jesse Yates commented on HBASE-10076:
-

Thanks for the feedback, Enis! I'll update the patch, in the event that someone 
wants it for their installation, or to have it ready if we can get it into 0.96 
(as per discussion on HBASE-8369).



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846960#comment-13846960
 ] 

Andrew Purtell commented on HBASE-10076:


See 
https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=13846957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846957



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846900#comment-13846900
 ] 

Enis Soztutar commented on HBASE-10076:
---

Looks like all the pieces are here :) 
- Remove System.out.println("In restore!");
- We should remove the scanMetrics from ClientScanner from the original patch:
{code}
 - protected ScanMetrics scanMetrics = null;
{code}
- Any interest in bringing in the new test 
TestCellUtil.testOverlappingKeys()? Not needed that much, just checking
- It would be good to have IntegrationTestTableSnapshotInputFormat in the same 
package (mapreduce) 
- Not sure about the change in TableInputFormatBase. Is this needed? Let's 
leave this out otherwise. 
- In the original patch, TableMapReduceUtil.initTableMapperJob() now accepts 
an initCredentials param, because we do not want to get tokens from HBase at 
all. Otherwise, if hbase is used with security, offline clusters won't work. 




[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846786#comment-13846786
 ] 

Lars Hofhansl commented on HBASE-10076:
---

Looks good upon first inspection.



[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]

2013-12-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838586#comment-13838586
 ] 

Lars Hofhansl commented on HBASE-10076:
---

[~jesse_yates], FYI.

> Backport MapReduce over snapshot files [0.94]
> -
>
> Key: HBASE-10076
> URL: https://issues.apache.org/jira/browse/HBASE-10076
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Fix For: 0.94.15
>
>
> MapReduce over Snapshots would be valuable on 0.94.



--
This message was sent by Atlassian JIRA
(v6.1#6144)