Hi all,

We use Pig 0.5.0 from a release distribution (no trunk), with HBase 0.20.1.

The hbase-0.20.1/conf/hbase-site.xml is included in the CLASSPATH. Its content is (is this correct?):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>

Is my URL to the database table correct ('hbase://test'), and which Pig execution mode should I use (mapreduce or local)?

I now get different results and a stack trace from the commands below. Maybe this is helpful: the bold part of the stack trace suggests a wrong URL to the database table -- is that a correct interpretation?

pig -x mapreduce

grunt> B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;

Still the same result:

Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).

pig-0.5.0/bin/pig -x local

grunt> B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;

2009-11-20 17:15:01,725 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias B
Details at logfile: /Users/jorislops/Desktop/pig_1258733695965.log

Pig Stack Trace
---------------
ERROR 1002: Unable to store alias B

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
        at org.apache.pig.PigServer.openIterator(PigServer.java:475)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:363)
Caused by:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
        at org.apache.pig.PigServer.store(PigServer.java:530)
        at org.apache.pig.PigServer.openIterator(PigServer.java:458)
        ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Wrong FS: hbase://test, expected: file:///
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:184)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
        at org.apache.pig.PigServer.store(PigServer.java:522)
        ... 7 more
*Caused by: java.lang.IllegalArgumentException: Wrong FS: hbase://test, expected: file:///*
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
        at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:532)
        at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:346)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:103)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:131)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
        ... 9 more

Thanks,
Joris

2009/11/19 Jeff Zhang <zjf...@gmail.com>

> Hi Morris,
>
> Do you use the Pig in trunk?
> If you want to use HBase, you should put the HBase configuration in
> hbase-site.xml, and put this file on your classpath.
>
> Jeff Zhang
>
> On Thu, Nov 19, 2009 at 8:20 AM, Morris Swertz <m.a.swe...@rug.nl> wrote:
>
> > Hi all,
> >
> > I am trying to load data from HBase into Pig with HBaseStorage. Something
> > is going wrong, because no data from HBase (the test table) shows up in
> > Pig; only errors.
> >
> > I configured Hadoop and HBase in pseudo-distributed operation mode.
> >
> > What follows are the commands I ran and the output they produced.
> >
> > // try with pig in remote mode!
> >
> > pig -x mapreduce
> >
> > B = load 'hbase://test' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
> >
> > dump B;
> >
> > Output:
> >
> > 2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> > 2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> > 2009-11-19 13:56:04,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> > 2009-11-19 13:56:04,729 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > 2009-11-19 13:56:04,739 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 2009-11-19 13:56:05,024 [Thread-5] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test
> > 2009-11-19 13:56:05,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> > 2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
> > 2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).
> >
> > // port 60000 is in use by a Java program
> >
> > pig -x local
> >
> > B = load 'test' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
> >
> > dump B;
> >
> > Output:
> >
> > 2009-11-19 13:53:18,425 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"
> > 2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 0
> > 2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
> > 2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> > 2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> >
> > // there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it is empty)
> >
> > I tried different paths to the HBase table: 'hbase://test', 'test',
> > 'hbase://localhost:60000/test'.
> >
> > Here is how I started the system (Hadoop + HBase); I verified that it is
> > working as I expected:
> >
> > bin/hadoop namenode -format
> > bin/start-all.sh
> > // both NameNode and JobTracker are running, verified via
> > // http://localhost:50070 and http://localhost:50030
> >
> > bin/start-hbase.sh
> > // both master and regionserver are running, checked via localhost:60010,
> > // localhost:60020 and localhost:60030
> > // the ZooKeeper quorum is also started, at localhost:2181
> >
> > // fill a test table in HBase
> > hbase-0.20.1/bin/hbase shell
> > create 'test', 'data'
> > put 'test', 'row1', 'data', 'value1'
> > scan 'test'
> > // localhost:60010 shows that the test table is in HBase.
> >
> > Hope that someone knows the solution.
> >
> > Thanks,
> >
> > Joris
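One detail in the mapreduce-mode output above stands out: HBaseStorage logs "tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test", i.e. it resolved the table name against the local filesystem, which suggests the HBase configuration was not visible on Pig's classpath at load time. Below is a minimal sketch of one way to put the HBase conf dir and jars in front of Pig before starting grunt. All paths and jar names are assumptions for this install, not taken from the thread, and it assumes the bin/pig wrapper script appends PIG_CLASSPATH to its own CLASSPATH -- adjust to the actual locations:

```shell
# Sketch only: HBASE_HOME and the jar names are assumptions; adjust them.
HBASE_HOME="$HOME/hbase-0.20.1"

# Pick up whatever ZooKeeper jar ships in HBase's lib dir, if present.
ZK_JAR=$(ls "$HBASE_HOME"/lib/zookeeper-*.jar 2>/dev/null | head -n 1)

# Conf dir first, so hbase-site.xml (with hbase.rootdir, and the
# ZooKeeper quorum if set) is visible to HBaseStorage at load time.
export PIG_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.20.1.jar${ZK_JAR:+:$ZK_JAR}"
```

With that exported, `pig -x mapreduce` could be launched from the same shell. If the client still falls back to connecting to localhost:60000, it may also be worth adding an hbase.zookeeper.quorum property to hbase-site.xml, since HBase 0.20 clients locate the master through ZooKeeper -- but that is a guess, not something verified in this thread.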