[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031068#comment-13031068
 ] 

Dmitriy V. Ryaboy commented on PIG-1825:


Cool. At this point I don't think we need testStoreToHBase_2_no_WAL()?

HBase itself doesn't actually test noWAL directly. I'm OK with not testing the 
full path, just testing that we are using the HBase API correctly.

I do almost want to make it "-noSafety" just to be clear about what one is 
doing when invoking this "optimization".


> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>         Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, PIG-1825_2.patch
>
>
> Added an option to allow a caller of HBaseStorage to turn off the 
> write-ahead log (WAL) while doing bulk loads into HBase.
> From the performance tuning wiki page: 
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can 
> use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.
> The -noWAL option is passed in just like other options for HBaseStorage:
> STORE myalias INTO 'MyTable' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 mycolumnfamily:field2', '-noWAL');
> This would be my first patch, so please educate me on any steps I need to 
> take.
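
For illustration, the way a flag such as -noWAL might be picked out of the
option string can be sketched as follows. This is a minimal sketch, not the
code in the attached patches: the class OptionSketch, the method
noWalRequested, and the flag -otherFlag are hypothetical names, and the real
HBaseStorage option parsing may work differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not the actual HBaseStorage option parsing.
class OptionSketch {
    // Returns true when the whitespace-separated option string contains -noWAL.
    static boolean noWalRequested(String optString) {
        if (optString == null || optString.trim().isEmpty()) {
            return false;
        }
        Set<String> flags = new HashSet<>(Arrays.asList(optString.trim().split("\\s+")));
        return flags.contains("-noWAL");
    }

    public static void main(String[] args) {
        System.out.println(noWalRequested("-noWAL"));            // true
        System.out.println(noWalRequested("-otherFlag -noWAL")); // true
        System.out.println(noWalRequested(""));                  // false
    }
}
```

The flag is matched as a whole token, so a misspelling such as -noWALs would
not silently disable the log in this sketch.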

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-05-09 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031024#comment-13031024
 ] 

Dmitriy V. Ryaboy commented on PIG-1825:


Btw, skipping the WAL is a BAD idea, because it means there is no WAL at all, 
even during *normal* operation. WALs are useful for many things, recovery 
being only one of them.

From the HBase book at 
http://hbase.apache.org/book.html#perf.hbase.client.putwal :

13.7.7. Turn off WAL on Puts
A frequently discussed option for increasing throughput on Puts is to call 
writeToWAL(false). Turning this off means that the RegionServer will not write 
the Put to the Write Ahead Log, only into the memstore, HOWEVER the consequence 
is that if there is a RegionServer failure there will be data loss. If 
writeToWAL(false) is used, do so with extreme caution. You may find in 
actuality that it makes little difference if your load is well distributed 
across the cluster.

In general, it is best to use WAL for Puts, and where loading throughput is a 
concern to use bulk loading techniques instead.
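
To make the trade-off in the quoted passage concrete, here is a toy model. It
assumes a server that keeps edits in memory and replays a log after a crash.
ToyRegionServer and its methods are hypothetical stand-ins, not HBase classes,
and the model ignores memstore flushes to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: nothing here is an HBase class.
class WalSketch {
    static class ToyRegionServer {
        // Durable log; in this model it survives a crash.
        private final List<String[]> wal = new ArrayList<>();
        // In-memory store; in this model it is lost on a crash.
        private Map<String, String> memstore = new HashMap<>();

        void put(String row, String value, boolean writeToWal) {
            if (writeToWal) {
                wal.add(new String[] {row, value});
            }
            memstore.put(row, value);
        }

        // Simulate a crash and recovery: wipe memory, then replay the log.
        void crashAndRecover() {
            memstore = new HashMap<>();
            for (String[] edit : wal) {
                memstore.put(edit[0], edit[1]);
            }
        }

        String get(String row) {
            return memstore.get(row);
        }
    }

    public static void main(String[] args) {
        ToyRegionServer server = new ToyRegionServer();
        server.put("row1", "logged", true);
        server.put("row2", "unlogged", false);
        server.crashAndRecover();
        System.out.println("row1 = " + server.get("row1")); // recovered from the log
        System.out.println("row2 = " + server.get("row2")); // null: never logged
    }
}
```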

> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>         Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch


[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-05-09 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031012#comment-13031012
 ] 

Dmitriy V. Ryaboy commented on PIG-1825:


The patch is really straightforward and the test doesn't actually test the 
patch, except to make sure the argument doesn't break parsing.  WAL behavior is 
not actually verified.

Two things we can do here: 
1) make a createPut() method in HBaseStorage, call it from putNext(), and in a 
test create our own HBaseStorage, call createPut(), and check that 
put.getWriteToWAL() returns the right value
2) ignore the trivial test.

Option 1 is the right thing to do; I can probably be convinced of option 2. 
As is, we shouldn't commit, since the test just adds extra time to unit tests 
without doing much useful work.
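
Option 1 could look roughly like the sketch below. It is only an illustration
under stated assumptions: FakePut and FakeStorage are hypothetical stand-ins 
so that the example runs without HBase on the classpath. The real test would 
construct an HBaseStorage and inspect a real Put.

```java
// Hypothetical stand-ins so the sketch runs without HBase on the classpath.
class CreatePutSketch {
    // Stand-in for HBase's Put; it only carries the row key and the WAL flag.
    static class FakePut {
        final String rowKey;
        private boolean writeToWal = true; // assume the WAL is on unless disabled

        FakePut(String rowKey) {
            this.rowKey = rowKey;
        }

        void setWriteToWal(boolean writeToWal) {
            this.writeToWal = writeToWal;
        }

        boolean getWriteToWal() {
            return writeToWal;
        }
    }

    // Stand-in for the storage function, configured with or without -noWAL.
    static class FakeStorage {
        private final boolean noWal;

        FakeStorage(boolean noWal) {
            this.noWal = noWal;
        }

        // The factored-out method: builds the Put and applies the WAL setting.
        FakePut createPut(String rowKey) {
            FakePut put = new FakePut(rowKey);
            put.setWriteToWal(!noWal);
            return put;
        }
    }

    public static void main(String[] args) {
        FakePut withWal = new FakeStorage(false).createPut("row1");
        FakePut withoutWal = new FakeStorage(true).createPut("row1");
        System.out.println("default writes to WAL: " + withWal.getWriteToWal());
        System.out.println("-noWAL writes to WAL:  " + withoutWal.getWriteToWal());
    }
}
```

Because the check only inspects the object returned by createPut(), it needs 
no running cluster, which addresses the concern about adding time to the unit 
tests.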



[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990759#comment-12990759
 ] 

Alan Gates commented on PIG-1825:

Unit tests pass.  The output of test-patch:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

As this points out, the functionality isn't tested.  Before we can check it in, 
we'll need a test added to the HBase unit tests that shows that you can write 
to HBase with this option set.

> ability to turn off the write ahead log for pig's HBaseStorage
> --------------------------------------------------------------
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: HBaseStorage_noWAL.patch


[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-02-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990274#comment-12990274
 ] 

Alan Gates commented on PIG-1825:

Starting unit tests and test-patch.



[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-02-01 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989436#comment-12989436
 ] 

Dmitriy V. Ryaboy commented on PIG-1825:


Sounds fine to me (though I haven't read the patch yet).
HBase 0.90 has significant speed improvements but I imagine it still writes a 
WAL and you can still turn it off.



[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-02-01 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989330#comment-12989330
 ] 

Alan Gates commented on PIG-1825:

Dmitriy, is this something we should check in?  You seemed to indicate that 
this was no longer necessary after we moved to HBase 0.89 or above.
