HBaseStorage is broken in Pig 0.5.0; see https://issues.apache.org/jira/browse/PIG-970.
The fix has been checked into trunk. You can either check out trunk
and build to get it, or check out the 0.5.0 branch and apply the
PIG-970 patches to that code base. If you choose the latter, let me
know and I'll help you figure out which files in the JIRA you
actually need.
Alan.
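If it helps, applying the PIG-970 fix to the 0.5.0 branch might look roughly like the following. This is an untested sketch: the repository URL and the patch filename are assumptions, so check the JIRA itself for the actual attachment names.

```
# check out the 0.5.0 branch (Pig was an Apache Hadoop subproject at
# the time; verify the repository URL before relying on it)
svn checkout https://svn.apache.org/repos/asf/hadoop/pig/branches/branch-0.5 pig-0.5
cd pig-0.5

# apply the patch attached to PIG-970 (filename is illustrative)
patch -p0 < PIG-970.patch

# rebuild pig.jar
ant jar
```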
On Nov 20, 2009, at 8:35 AM, joris lops wrote:
Hi all,
We use Pig 0.5.0 from a release distribution (no trunk build), with HBase 0.20.1.
The hbase-0.20.1/conf/hbase-site.xml is included in the CLASSPATH.
The hbase-site.xml content (is this correct?):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
<description>The directory shared by region servers.
</description>
</property>
</configuration>
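One thing that may be worth checking: HBase 0.20.x clients locate the master through ZooKeeper, so the client-side hbase-site.xml often needs the quorum address as well. A minimal sketch, assuming the pseudo-distributed setup described below (everything on localhost):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
  <description>Comma-separated list of ZooKeeper quorum hosts the
  client should connect to.</description>
</property>
```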
Is my URL to the table correct ('hbase://test'), and which Pig
execution mode should I use (mapreduce or local)?
I now get different results and a stack trace from the commands
below. Maybe this is helpful: the bold part of the stack trace
suggests a wrong URL to the database table. Is that a correct
interpretation?
pig -x mapreduce
grunt> B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;

Still the same result:
Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
pig-0.5.0/bin/pig -x local
grunt> B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;

2009-11-20 17:15:01,725 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias B
Details at logfile: /Users/jorislops/Desktop/pig_1258733695965.log
Pig Stack Trace
---------------
ERROR 1002: Unable to store alias B

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
        at org.apache.pig.PigServer.openIterator(PigServer.java:475)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:363)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
        at org.apache.pig.PigServer.store(PigServer.java:530)
        at org.apache.pig.PigServer.openIterator(PigServer.java:458)
        ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Wrong FS: hbase://test, expected: file:///
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:184)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
        at org.apache.pig.PigServer.store(PigServer.java:522)
        ... 7 more
*Caused by: java.lang.IllegalArgumentException: Wrong FS: hbase://test, expected: file:///*
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
        at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:532)
        at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:346)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:103)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:131)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
        ... 9 more
Thanks,
Joris
2009/11/19 Jeff Zhang <zjf...@gmail.com>

Hi Morris,

Are you using the Pig from trunk?
If you want to use HBase, you should put the HBase configuration in
hbase-site.xml and put that file on your classpath.

Jeff Zhang
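Jeff's classpath suggestion can be sketched as an environment setup before launching grunt. This is a minimal sketch, assuming HBase is installed under /opt/hbase-0.20.1 and that your bin/pig script honors PIG_CLASSPATH; the install path and jar names are assumptions, so adjust them to your layout.

```shell
# assumed install location; adjust to your layout
HBASE_HOME=/opt/hbase-0.20.1

# bin/pig appends PIG_CLASSPATH to the JVM classpath, making the conf
# dir (which contains hbase-site.xml) and the HBase/ZooKeeper jars
# visible to HBaseStorage
export PIG_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.20.1.jar:$HBASE_HOME/lib/zookeeper-3.2.1.jar"
echo "$PIG_CLASSPATH"
```

After this, `pig -x mapreduce` should be able to read the HBase client configuration from the conf directory.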
On Thu, Nov 19, 2009 at 8:20 AM, Morris Swertz <m.a.swe...@rug.nl> wrote:

Hi all,

I'm trying to load data from HBase into Pig with HBaseStorage.
Something is going wrong, because no data from the HBase 'test' table
shows up in Pig, only errors. I configured Hadoop and HBase in
pseudo-distributed mode.
What follows are the commands I ran and the output they produced.
//try with pig in remote mode!
pig -x mapreduce
B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;
output:
2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-11-19 13:56:04,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-11-19 13:56:04,729 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-19 13:56:04,739 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-11-19 13:56:05,024 [Thread-5] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test
2009-11-19 13:56:05,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).
//port 60000 is used by a java program
pig -x local
B = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;
output:
2009-11-19 13:53:18,425 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 0
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!

//there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it's empty)
I tried different paths to the HBase table: 'hbase://test', 'test',
and 'hbase://localhost:60000/test'.

Here is how I started the system (Hadoop + HBase); I verified that it
is working as I expected.
bin/hadoop namenode -format
bin/start-all.sh
//both NameNode and JobTracker are running, verified via http://localhost:50070 and http://localhost:50030
bin/start-hbase.sh
//both master and regionserver are running, checked via localhost:60010, localhost:60020, and localhost:60030
//also the ZooKeeper quorum is started at localhost:2181
//fill a test table in hbase
hbase-0.20.1/bin/hbase shell
create 'test', 'data'
put 'test', 'row1', 'data', 'value1'
scan 'test'
//localhost:60010 shows that the test table is in HBase.
I hope someone knows the solution.
Thanks,
Joris