HBaseStorage is broken in Pig 0.5.0; see https://issues.apache.org/jira/browse/PIG-970. The fix has been checked into trunk. You can either check out trunk and build it, or check out the 0.5.0 branch and apply the patches from PIG-970 to that code base. If you choose the latter, let me know and I'll help you figure out which files in the JIRA you actually need.
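
As a rough sketch of those two routes: the Subversion base URL, the branch name (branch-0.5), the use of a plain `ant` build, and the `-p0` patch level are all assumptions that are not stated in this thread, so verify them before running anything. The patch file is whichever attachment you download from PIG-970; no specific file name is implied here.

```shell
#!/bin/sh
# ASSUMPTION: repository layout; override PIG_SVN_BASE if it differs.
PIG_SVN_BASE="${PIG_SVN_BASE:-http://svn.apache.org/repos/asf/hadoop/pig}"

# Route 1: check out trunk (which already contains the fix) and build it.
build_trunk() {
    svn checkout "$PIG_SVN_BASE/trunk" pig-trunk &&
    (cd pig-trunk && ant)
}

# Route 2: check out the 0.5.0 branch and apply a patch saved from PIG-970.
# $1 = path to the downloaded patch file.
build_patched_branch() {
    if [ -z "$1" ] || [ ! -f "$1" ]; then
        echo "usage: build_patched_branch /path/to/patch-file" >&2
        return 1
    fi
    # Resolve to an absolute path so it still works after the cd below.
    patch_file="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    # ASSUMPTION: branch name and patch level (-p0, as produced by svn diff).
    svn checkout "$PIG_SVN_BASE/branches/branch-0.5" pig-0.5 &&
    (cd pig-0.5 && patch -p0 < "$patch_file" && ant)
}
```

Neither function runs on its own; call `build_trunk`, or `build_patched_branch` with the path to the patch you saved, depending on which route you pick.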

Alan.

On Nov 20, 2009, at 8:35 AM, joris lops wrote:

Hi all,

We use Pig 0.5.0 from a release distribution (not trunk).

The hbase-0.20.1/conf/hbase-site.xml is included in CLASSPATH.
The hbase-site.xml content (is this correct?):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
   <name>hbase.rootdir</name>
   <value>hdfs://localhost:9000/hbase</value>
   <description>The directory shared by region servers.
   </description>
 </property>
</configuration>

Is my URL to the table correct ('hbase://test'), and which Pig execution mode
should I use (mapreduce or local)?
I now get different results and a stack trace from the commands below. Maybe this is helpful: the bold part of the stack trace seems to indicate a wrong URL to the database table. Is this a correct interpretation of the stack trace?

pig -x mapreduce
grunt> B = load 'hbase://test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;
still the same result:
Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).

pig-0.5.0/bin/pig -x local
grunt> B = load 'hbase://test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
grunt> dump B;
2009-11-20 17:15:01,725 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1002: Unable to store alias B
Details at logfile: /Users/jorislops/Desktop/pig_1258733695965.log

Pig Stack Trace
---------------
ERROR 1002: Unable to store alias B

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
   at org.apache.pig.PigServer.openIterator(PigServer.java:475)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
   at org.apache.pig.Main.main(Main.java:363)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
   at org.apache.pig.PigServer.store(PigServer.java:530)
   at org.apache.pig.PigServer.openIterator(PigServer.java:458)
   ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Wrong FS: hbase://test, expected: file:///
   at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:184)
   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
   at org.apache.pig.PigServer.store(PigServer.java:522)
   ... 7 more
*Caused by: java.lang.IllegalArgumentException: Wrong FS: hbase://test, expected: file:///*
   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
   at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
   at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
   at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
   at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
   at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:532)
   at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:346)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:103)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:131)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
   at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
   at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
   at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
   ... 9 more

Thanks,
Joris

2009/11/19 Jeff Zhang <zjf...@gmail.com>

Hi Morris,

Are you using Pig from trunk?
If you want to use HBase, you should put the HBase configuration in
hbase-site.xml and put this file on your classpath.


Jeff Zhang


On Thu, Nov 19, 2009 at 8:20 AM, Morris Swertz <m.a.swe...@rug.nl> wrote:


Hi all,

I am trying to load data from HBase into Pig with HBaseStorage. Something is going wrong, because no data from HBase (the test table) shows up in Pig, only errors.

I configured Hadoop and HBase in pseudo-distributed mode.

What follows are the commands I ran and the output they produced.


//try with pig in remote mode!

pig -x mapreduce

B = load 'hbase://test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);

dump B;

output:

2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2009-11-19 13:56:04,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2009-11-19 13:56:04,729 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2009-11-19 13:56:04,739 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

2009-11-19 13:56:05,024 [Thread-5] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test

2009-11-19 13:56:05,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).

2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).



//port 60000 is used by a java program



pig -x local

B = load 'test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);

dump B;

output:

2009-11-19 13:53:18,425 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 0

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!

2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!

//there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it's empty)



I tried different paths to the HBase table: 'hbase://test', 'test', and
'hbase://localhost:60000/test'.



Here is how I started the system (Hadoop + HBase) and verified that it is
working as I expected.



bin/hadoop namenode -format

bin/start-all.sh

//both NameNode and JobTracker are running, verified via
http://localhost:50070 and http://localhost:500040



bin/start-hbase.sh

//both master and regionserver are running, checked via localhost:60010
localhost:20 localhost:30

//the ZooKeeper quorum is also started, at localhost:2181



//fill a test table in hbase

hbase-0.20.1/bin/hbase shell

create 'test', 'data'

put 'test', 'row1', 'data', 'value1'

scan 'test'

//localhost:60010 shows that the test table is in HBase.



Hope that someone knows the solution.

Thanks,

Joris