[ https://issues.apache.org/jira/browse/PIG-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-2289.
-----------------------------

    Resolution: Invalid

> HBaseStorage do not care about delimiter in STORE
> -------------------------------------------------
>
>                 Key: PIG-2289
>                 URL: https://issues.apache.org/jira/browse/PIG-2289
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.9.1, 0.10
>         Environment: Hadoop, HBase, ZooKeeper from cdh3u1
> Pig from GitHub (version 0.9.1, then trunk: 0.10)
>            Reporter: Damien Hardy
>
> I want to store in HBase a set of tuples generated by Pig streaming (inspired
> by http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ ).
> Here is my script:
>
> set debug 'off'
> DEFINE iplookup `wrapper.sh GeoIP`
>   ship ('wrapper.sh')
>   cache('/GeoIP/GeoIPcity.dat#GeoIP');
> A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:201109161405 -lte=_f:squid_t:201109161410 -loadKey') AS (rowkey, data);
> B = LIMIT A 10;
> C = FOREACH B {
>   t = REGEX_EXTRACT(data,'([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ',1);
>   generate rowkey, t;
> }
> D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
> DESCRIBE D;
> -- DUMP D;
> STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city') ;
>
> The "DESCRIBE D;" shows:
>
> D: {rowkey: bytearray,ip: bytearray,country_code: bytearray,country: bytearray,state: bytearray,city: bytearray}
>
> as expected.
> STORE just takes the rowkey and puts the rest of the tuple in the first column
> (location:ip), as you can see:
>
> hbase(main):033:0> get 'geoip_pig', "_f:squid_t:20110916140500_b:squid_s:200-1VPVjbVwywTpNtLA4mHl+A=="
> COLUMN                   CELL
>  location:city           timestamp=1316180980265, value=
>  location:country        timestamp=1316180980265, value=
>  location:country_code   timestamp=1316180980265, value=
>  location:ip             timestamp=1316180980265, value=90.9.213.170,FR,France,A9,Llupia
>  location:state          timestamp=1316180980265, value=
> 5 row(s) in 0.0150 seconds
>
> I also tried the option '-delim=,', with no effect.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
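
The resolution above gives no reason for "Invalid", so the following is only one reading that happens to match the HBase row shown in the report. Pig's default streaming deserializer splits each output line on tabs. If the streaming command emitted the rowkey, a tab, and then the GeoIP fields joined by commas (an assumption; the output of wrapper.sh is not shown in the report), relation D would already hold the whole comma-joined string in `ip` before STORE ran. The Python sketch below models that split; `split_stream_line` and the sample line are hypothetical and are not part of Pig.

```python
# Hypothetical illustration of tab-delimited parsing of one streaming output line.
# The line format is an assumption; the report does not show what wrapper.sh emits.
FIELDS = ["rowkey", "ip", "country_code", "country", "state", "city"]


def split_stream_line(line, delimiter="\t"):
    """Split one output line on `delimiter` and pad it to the declared schema."""
    parts = line.rstrip("\n").split(delimiter)
    # Fields missing from the line are modelled here as None.
    parts += [None] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


# Rowkey and value copied from the HBase shell output in the report.
line = (
    "_f:squid_t:20110916140500_b:squid_s:200-1VPVjbVwywTpNtLA4mHl+A=="
    "\t90.9.213.170,FR,France,A9,Llupia\n"
)
row = split_stream_line(line)

for name in FIELDS:
    print(name, "=", row[name])
```

Under that assumption, `ip` receives the whole comma-joined string and the four remaining fields stay empty, which is the shape of the stored row. A `DUMP D;` (commented out in the script) would show whether D already looks like this before the STORE.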