[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997042#comment-13997042 ]
Ian Friedman commented on HBASE-10076:

Never mind, I see it was in the parent ticket. I should probably read before I comment.

> Backport MapReduce over snapshot files [0.94]
> ---------------------------------------------
>
> Key: HBASE-10076
> URL: https://issues.apache.org/jira/browse/HBASE-10076
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Jesse Yates
> Attachments: hbase-10076-v0.patch
>
> MapReduce over Snapshots would be valuable on 0.94.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997041#comment-13997041 ]
Ian Friedman commented on HBASE-10076:

Hey [~lhofhansl], [~bryanck], is this the latest version of the patch for 0.94? If not, does anyone know where I can get the latest version?
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850710#comment-13850710 ]
Bryan Keller commented on HBASE-10076:

In some simple tests, I saw a 4-5x speed improvement, which is similar to what you are seeing. In production, we saw a smaller 3x improvement in our main jobs, but that is because we are now more CPU-bound than before. I did some quick profiling, and it appears there is room for optimization in HRegion to reduce CPU usage, which would further improve performance for more CPU-intensive jobs.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849932#comment-13849932 ]
Enis Soztutar commented on HBASE-10076:

bq. I'm curious what kind of performance improvement you are seeing with the snapshot scan

In my tests, I was able to get 50-60 MB/s per region, versus 10-11 MB/s for regular scans (see https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=13805748&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13805748). Bryan, can you also share your performance numbers, if you can quantify them?
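Taking the per-region figures above at face value, the implied speedup of the snapshot scan over a regular scan falls roughly between 4.5x and 6x (pairing the low end of one range with the high end of the other):

```latex
\frac{50\ \text{MB/s}}{11\ \text{MB/s}} \approx 4.5, \qquad \frac{60\ \text{MB/s}}{10\ \text{MB/s}} = 6
```

That bracket is in line with the 4-5x Bryan reports from his own simple tests.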
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849882#comment-13849882 ]
Bryan Keller commented on HBASE-10076:

I should say the code is the same, except for the TableMapReduceUtil code I needed to cut-n-paste...
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849880#comment-13849880 ]
Bryan Keller commented on HBASE-10076:

No, the map-reduce code from the v5 patch is the same as what I'm using now on our production servers. I created some utilities for creating and cleaning up snapshots seamlessly, but they are orthogonal to this. I'm curious what kind of performance improvement you are seeing with the snapshot scan (if you have tested it)?
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849848#comment-13849848 ]
Lars Hofhansl commented on HBASE-10076:

[~bryanck], did you find anything worthwhile to include?
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848455#comment-13848455 ]
Lars Hofhansl commented on HBASE-10076:

Thanks Bryan. Much appreciated!
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848405#comment-13848405 ]
Bryan Keller commented on HBASE-10076:

BTW, I have made a few minor enhancements since submitting that patch. I'll check this weekend to see whether any of them are worthwhile.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848404#comment-13848404 ]
Bryan Keller commented on HBASE-10076:

Yes, there is some minor cut-n-pasting involved. We could easily package this to be available outside of the distribution if that is deemed necessary.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848276#comment-13848276 ]
Lars Hofhansl commented on HBASE-10076:

With a little bit of cut'n'paste (TableMapReduceUtil's convertScanToString and convertStringToScan), this can indeed be done with [~bryanck]'s latest 0.94 patch on HBASE-8369, without any additional code in HBase.
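The cut'n'paste Lars mentions is presumably needed because those two helpers are not publicly callable from outside the package in 0.94. A minimal sketch of what such a copy does, assuming (as we believe the 0.94 code does) that the Scan is written with its Writable-style serialization and then Base64-encoded so it can travel through the job configuration as a string. `SimpleScan` and `ScanCodec` are hypothetical stand-ins that carry only a row range, so the example runs without HBase on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical stand-in for org.apache.hadoop.hbase.client.Scan: it only carries a row range.
final class SimpleScan {
    final byte[] startRow;
    final byte[] stopRow;

    SimpleScan(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    // Writable-style serialization: each row key is written as a length prefix plus its bytes.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(startRow.length);
        out.write(startRow);
        out.writeInt(stopRow.length);
        out.write(stopRow);
    }

    // Reads the fields back in exactly the order write() emitted them.
    static SimpleScan read(DataInputStream in) throws IOException {
        byte[] start = new byte[in.readInt()];
        in.readFully(start);
        byte[] stop = new byte[in.readInt()];
        in.readFully(stop);
        return new SimpleScan(start, stop);
    }
}

// Hypothetical copies of the two helpers: the serialized bytes are turned into
// a Base64 string (safe to store as a configuration value) and back again.
final class ScanCodec {
    static String convertScanToString(SimpleScan scan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            scan.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static SimpleScan convertStringToScan(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return SimpleScan.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The only real requirement on a copied version is that the encoding written at job-setup time matches what the task side decodes; a round-trip unit test on the copied pair is a cheap way to guard that.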
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847935#comment-13847935 ]
Lars Hofhansl commented on HBASE-10076:

Based on recent discussion, we might just add the necessary hooks to 0.94 to allow implementing the M/R outside of HBase.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847210#comment-13847210 ]
Enis Soztutar commented on HBASE-10076:

bq. Looks like in 0.94 it's just added based on the config

In trunk, initTableSnapshotMapperJob() calls:
{code}
initTableMapperJob(snapshotName, scan, mapper, outputKeyClass, outputValueClass, job, addDependencyJars, false, TableSnapshotInputFormat.class);
{code}
The false is the initCredentials argument to the overloaded function, no? I don't have Eclipse open, so I cannot check : )

bq. how can we test that the locality selection is correct? It's not really covered anywhere in this patch or the original

It is using the HDFSBlocksDistribution, which I thought is tested on its own. I did not check whether there is actual coverage, though. It should be possible to mock that up, I guess.
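One way to make locality selection testable, along the lines of mocking it up, is to keep the selection a pure function over a host-to-bytes map (the kind of per-host weights HDFSBlocksDistribution reports for a region's files) and assert on its output. The sketch below is hypothetical: the `LocalitySketch.bestLocations` name and the 0.8 cutoff multiplier (keep every host holding at least 80% of the top host's bytes) are assumptions for illustration, not code from the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LocalitySketch {
    // Assumed cutoff: a host qualifies if it holds at least this fraction of the best host's bytes.
    static final double DEFAULT_CUTOFF_MULTIPLIER = 0.8;

    /**
     * Picks preferred hosts for a split from a host -> locally stored bytes map.
     * Hosts are returned heaviest first; ties are broken by host name so the
     * result is deterministic and easy to assert on in a unit test.
     */
    static List<String> bestLocations(Map<String, Long> hostWeights, double cutoffMultiplier) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(hostWeights.entrySet());
        entries.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue()); // descending by weight
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });

        List<String> hosts = new ArrayList<>();
        if (entries.isEmpty()) {
            return hosts; // no block information: no locality preference
        }
        long top = entries.get(0).getValue();
        for (Map.Entry<String, Long> e : entries) {
            if (e.getValue() < top * cutoffMultiplier) {
                break; // list is sorted descending, so all remaining hosts are below the cutoff
            }
            hosts.add(e.getKey());
        }
        return hosts;
    }
}
```

With selection isolated like this, a test can cover a single dominant host, near-ties, and an empty distribution without starting HDFS.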
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847018#comment-13847018 ]
Jesse Yates commented on HBASE-10076:

[~enis], another question Lars and I had offline: how can we test that the locality selection is correct? It's not really covered anywhere in this patch or the original.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847017#comment-13847017 ]
Jesse Yates commented on HBASE-10076:

bq. Any interest in bringing in the new test TestCellUtil.testOverlappingKeys()? Not needed that much, just checking

Done, good call.

bq. Not sure about the change in TableInputFormatBase. Is this needed? Let's leave this out otherwise.

Not needed, but it is a cleaner implementation, and it's good to reuse the util now that we have it. Agreed that it's a little extra and a bit unnecessary, but only trivially so.

bq. In the original patch, TableMapReduceUtil.initTableMapperJob() now accepts an initCredentials param, because we do not want to get tokens from HBase at all. Otherwise, if HBase is used with security, offline clusters won't work.

Looks like in 0.94 it's just added based on the config. In trunk, it looks like TableMapReduceUtil.initTableSnapshotMapperJob just calls initTableMapperJob without an initCredentials parameter, which always initializes the credentials. Maybe a bug in trunk? It seems you would need a more sweeping change to make it configurable as well.
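The point under discussion is whether job setup should ask HBase for a delegation token at all. A minimal sketch of the two behaviours, assuming a hypothetical `initJob` helper and `TokenFetcher` callback: an explicit `initCredentials=false` (the offline snapshot case) skips the token unconditionally, while `true` falls back to a config-driven check like the one described for 0.94. We assume security is detected via `hbase.security.authentication=kerberos`; the real 0.94 method names and signatures may differ:

```java
import java.util.Map;

final class CredentialsSketch {
    /** Hypothetical stand-in for the step that obtains an HBase delegation token. */
    interface TokenFetcher {
        void obtainToken();
    }

    /**
     * Hypothetical job-setup helper. When initCredentials is false (e.g. a snapshot
     * scan against an offline cluster) no token is requested, whatever the config says.
     * When true, a token is requested only if the config says security is enabled.
     */
    static void initJob(Map<String, String> conf, boolean initCredentials, TokenFetcher fetcher) {
        if (!initCredentials) {
            return; // never contact HBase for credentials
        }
        // Assumed security check; equalsIgnoreCase(null) is false, so a missing key means "not secure".
        boolean secure = "kerberos".equalsIgnoreCase(conf.get("hbase.security.authentication"));
        if (secure) {
            fetcher.obtainToken();
        }
    }
}
```

Passing the flag explicitly, rather than deriving it only from the config, is what lets a snapshot job run against a secure configuration without a live HBase cluster to issue tokens.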
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846971#comment-13846971 ]
Jesse Yates commented on HBASE-10076:

Thanks for the feedback, Enis! I'll update the patch in case someone wants it for their installation, or to have it ready if we can get it into 0.96 (as per the discussion on HBASE-8369).
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846960#comment-13846960 ]
Andrew Purtell commented on HBASE-10076:

See https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=13846957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846957
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846900#comment-13846900 ]
Enis Soztutar commented on HBASE-10076:

Looks like all the pieces are here :)
- remove System.out.println("In restore!");
- we should remove the scanMetrics from ClientScanner from the original patch:
{code}
- protected ScanMetrics scanMetrics = null;
{code}
- Any interest in bringing in the new test TestCellUtil.testOverlappingKeys()? Not needed that much, just checking.
- It would be good to have IntegrationTestTableSnapshotInputFormat in the same package (mapreduce).
- Not sure about the change in TableInputFormatBase. Is this needed? Let's leave this out otherwise.
- In the original patch, TableMapReduceUtil.initTableMapperJob() now accepts an initCredentials param, because we do not want to get tokens from HBase at all. Otherwise, if HBase is used with security, offline clusters won't work.
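The test mentioned above exercises a row-key range overlap check, presumably the one a snapshot input format uses to skip regions that fall outside the Scan's start/stop rows. A standalone sketch of such a check, assuming exclusive end keys, empty arrays meaning "unbounded" on that side, and unsigned lexicographic byte ordering; `KeyRanges` is a hypothetical name and this is not the HBase source:

```java
final class KeyRanges {
    /** Unsigned lexicographic comparison of row keys; a shorter prefix sorts first. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff); // treat bytes as 0..255
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * True if [start1, end1) and [start2, end2) share at least one key.
     * An empty start or end array means the range is unbounded on that side.
     */
    static boolean overlappingKeys(byte[] start1, byte[] end1, byte[] start2, byte[] end2) {
        boolean start1BeforeEnd2 = end2.length == 0 || compare(start1, end2) < 0;
        boolean start2BeforeEnd1 = end1.length == 0 || compare(start2, end1) < 0;
        return start1BeforeEnd2 && start2BeforeEnd1;
    }
}
```

Because end keys are exclusive, two ranges that merely touch (one ends exactly where the other starts) do not overlap, which is the boundary case most worth covering in a test.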
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846786#comment-13846786 ]
Lars Hofhansl commented on HBASE-10076:

Looks good upon first inspection.
[jira] [Commented] (HBASE-10076) Backport MapReduce over snapshot files [0.94]
[ https://issues.apache.org/jira/browse/HBASE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838586#comment-13838586 ]
Lars Hofhansl commented on HBASE-10076:

[~jesse_yates], FYI.