Hey Dmitriy, I took a quick look.

Your files are missing a copyright notice?

I like your use of BinaryComparator and the lte/gte options in skipRegion for setting up filters.

Regarding:

    // No way to know max.. just return 0. Sorry, reporting on the last slice is janky.
    // So is reporting on the first slice, by the way -- it will start out too high, possibly at 100%.
    if (endRow_.length==0) return 0;

...if your keys are kinda regular, you might be able to do better within a slice. Look in Bytes, where there are methods that do BigDecimal math. You can ask them to divide the slice. Might work. Then you could report progress. (Looks like you are doing some of that later in the file -- does it work?)

Try to use the same instance of

    HBaseConfiguration conf = new HBaseConfiguration();

throughout rather than creating a new one each time. Creating one can be costly.

What's this?

    if (counterHelper_ == null) counterHelper_ = new PigCounterHelper();

A Pig counter? You don't want to use HBase counters?

What's the lzo stuff about? It seems to be for loading files. Are you lzo'ing your HBase content? Oh man ... base64'ing....

There are two files that mention hbase, is that right?

St.Ack

On Mon, May 3, 2010 at 12:23 PM, Dmitriy Ryaboy <dmit...@twitter.com> wrote:
> Hi folks,
> I recently rewrote the Pig HBase loader to work with binary data, push down
> filters, and do other things that make it more versatile.
> If you use, or plan to use, both Pig and HBase, please try it out, take a
> look at the code, let me know what you think. I am just starting to learn
> about HBase, so I am especially interested to learn if there are HBase
> capabilities I am not using and should be.
>
> The code is part of our "ElephantBird" project, here:
>
> http://github.com/kevinweil/elephant-bird/
> and more specifically:
> http://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load/
>
> Thanks,
> -Dmitriy
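
P.S. On the progress-reporting point in the reply: here is a rough sketch of what "divide the slice" could look like. It is not taken from the loader or from HBase. The class name SliceProgress and the method estimate are made up for illustration, and it uses plain java.math rather than any Bytes helper. It assumes row keys are reasonably evenly spread when read as unsigned big-endian numbers, and it still returns 0 for an open-ended last slice, as the quoted code does.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.Arrays;

// Hypothetical helper: estimates how far a scan has moved through a slice
// by treating row keys as unsigned big-endian numbers.
final class SliceProgress {

    // Right-pad the key with zero bytes to a common width, then read it as
    // a non-negative number so keys of different lengths are comparable.
    private static BigInteger toNumber(byte[] key, int width) {
        return new BigInteger(1, Arrays.copyOf(key, width));
    }

    // Returns a value in [0, 1]; 0 when the slice has no known end row.
    static float estimate(byte[] startRow, byte[] endRow, byte[] currentRow) {
        if (endRow.length == 0) {
            return 0f; // open-ended last slice: no maximum to measure against
        }
        int width = Math.max(startRow.length,
                Math.max(endRow.length, currentRow.length));
        BigInteger start = toNumber(startRow, width);
        BigInteger end = toNumber(endRow, width);
        BigInteger current = toNumber(currentRow, width);

        BigInteger range = end.subtract(start);
        if (range.signum() <= 0) {
            return 0f; // empty or inverted range: nothing sensible to report
        }
        BigDecimal fraction = new BigDecimal(current.subtract(start))
                .divide(new BigDecimal(range), MathContext.DECIMAL32);
        // Clamp in case the current key falls outside the slice boundaries.
        return Math.max(0f, Math.min(1f, fraction.floatValue()));
    }
}
```

An empty start row pads to all zeros, so the first slice is measured from the lowest possible key. Whether that gives a useful number depends entirely on how the real keys are distributed; with skewed keys the estimate will still jump around.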