[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031068#comment-13031068 ]

Dmitriy V. Ryaboy commented on PIG-1825:

Cool. At this point I don't think we need testStoreToHBase_2_no_WAL()? HBase itself doesn't actually test noWAL directly. I'm OK with not testing the full path, just testing that we are using the HBase API correctly.

I do almost want to make it "-noSafety" just to be clear about what one is doing when invoking this "optimization".

> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>         Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, PIG-1825_2.patch
>
> Added an option to allow a caller of HBaseStorage to turn off the
> WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wikipage:
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can
> use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.
> The -noWAL options is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1
> mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to
> do.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031024#comment-13031024 ]

Dmitriy V. Ryaboy commented on PIG-1825:

Btw, skipping the WAL is a BAD idea, because it means no WAL even during *normal* operation. WALs are useful for many things, recovery being only one of them.

From the HBase book at http://hbase.apache.org/book.html#perf.hbase.client.putwal :

13.7.7. Turn off WAL on Puts

A frequently discussed option for increasing throughput on Puts is to call writeToWAL(false). Turning this off means that the RegionServer will not write the Put to the Write Ahead Log, only into the memstore, HOWEVER the consequence is that if there is a RegionServer failure there will be data loss. If writeToWAL(false) is used, do so with extreme caution. You may find in actuality that it makes little difference if your load is well distributed across the cluster.

In general, it is best to use WAL for Puts, and where loading throughput is a concern to use bulk loading techniques instead.

> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>         Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch
[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031012#comment-13031012 ]

Dmitriy V. Ryaboy commented on PIG-1825:

The patch is really straightforward, and the test doesn't actually test the patch, except to make sure the argument doesn't break parsing. WAL behavior is not actually verified.

Two things we can do here:

1) make a createPut() method in HBaseStorage, call it from putNext(), and in a test create our own HBaseStorage, call createPut(), and check that put.getWriteToWAL() returns the right value
2) ignore the trivial test.

Option 1 is the right thing to do; 2 I can probably be convinced of. As is, we shouldn't commit, since the test just adds extra time to unit tests without doing much useful work.
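A rough sketch of what option 1 might look like, for illustration only. Nothing here is taken from the attached patches: FakePut is a hypothetical stand-in that models just the WAL flag of HBase's Put (so the example runs without HBase on the classpath), StorageSketch is a hypothetical, trimmed-down substitute for HBaseStorage, and the whitespace-split option parsing is an assumption rather than the way the real class reads its options.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for HBase's Put; it models only the WAL flag. */
final class FakePut {
    private final String rowKey;
    private boolean writeToWAL = true; // assumed default: the WAL is written

    FakePut(String rowKey) { this.rowKey = rowKey; }

    void setWriteToWAL(boolean write) { this.writeToWAL = write; }
    boolean getWriteToWAL() { return writeToWAL; }
    String getRowKey() { return rowKey; }
}

/** Hypothetical, trimmed-down storage class; NOT the real HBaseStorage. */
final class StorageSketch {
    private final boolean noWAL;

    StorageSketch(String optString) {
        // Assumption: options arrive as one whitespace-separated string.
        List<String> opts = Arrays.asList(optString.trim().split("\\s+"));
        this.noWAL = opts.contains("-noWAL");
    }

    /** Extracted factory method, so a test can inspect the Put without a cluster. */
    FakePut createPut(String rowKey) {
        FakePut put = new FakePut(rowKey);
        put.setWriteToWAL(!noWAL);
        return put;
    }

    /** putNext() would build its Put through createPut() and then send it. */
    FakePut putNext(String rowKey) {
        return createPut(rowKey); // sending to the table is omitted in this sketch
    }
}

class CreatePutSketch {
    public static void main(String[] args) {
        FakePut withWal = new StorageSketch("").createPut("row1");
        FakePut withoutWal = new StorageSketch("-noWAL").createPut("row1");
        System.out.println("default writeToWAL: " + withWal.getWriteToWAL());
        System.out.println("-noWAL  writeToWAL: " + withoutWal.getWriteToWAL());
    }
}
```

The point of the extraction is that a unit test only has to construct the storage object and look at the flag on the returned Put, which checks that the HBase API is used correctly without starting a mini-cluster.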
[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990759#comment-12990759 ]

Alan Gates commented on PIG-1825:

Unit tests pass. The output of test-patch:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

As this points out, the functionality isn't tested. Before we can check it in we'll need a test added to the hbase unit tests that shows that you can write to hbase with this option set.

> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: HBaseStorage_noWAL.patch
[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990274#comment-12990274 ]

Alan Gates commented on PIG-1825:

Starting unit tests and test-patch
[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989436#comment-12989436 ]

Dmitriy V. Ryaboy commented on PIG-1825:

Sounds fine to me (though I haven't read the patch yet). HBase 0.90 has significant speed improvements, but I imagine it still writes a WAL and you can still turn it off.
[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989330#comment-12989330 ]

Alan Gates commented on PIG-1825:

Dmitriy, is this something we should check in? You seemed to indicate that this was no longer necessary after we moved to HBase 0.89 or above.