blob handling in hive
Hi,

I am using Sqoop to export data from MySQL to Hive. I noticed that Hive doesn't have a blob data type yet. Is there any way I can get Hive to store blobs?

Jimmy
Re: blob handling in hive
One way is to store the blob in HBase and use the HBaseHandler to access it.

On Tue, Oct 12, 2010 at 2:14 PM, Jinsong Hu wrote:
> I noticed that Hive doesn't have a blob data type yet. Is there any way I can get Hive to store blobs?
Re: blob handling in hive
Storing the blob in HBase is too costly; HBase compaction uses a lot of CPU. All I want is to read the byte array out of a SequenceFile and map it to a Hive column. I can write a SerDe for this purpose.

I tried to define the data as array<tinyint> and then write a custom SerDe. After I get the byte array off disk I need to map it, so I wrote this code:

  columnTypes = TypeInfoUtils.getTypeInfosFromTypeString("int,string,array<tinyint>");

But then how do I convert the data in the row.set() method? I tried this:

  byte[] bContent = ev.get_content() == null ? null :
      (ev.get_content().getData() == null ? null : ev.get_content().getData());
  org.apache.hadoop.hive.serde2.io.ByteWritable tContent = bContent == null
      ? new org.apache.hadoop.hive.serde2.io.ByteWritable()
      : new org.apache.hadoop.hive.serde2.io.ByteWritable(bContent[0]);
  row.set(2, tContent);

This works for a single byte, but doesn't work for a byte array. Any way I can get the byte array returned in SQL is appreciated.

Jimmy

From: "Ted Yu"
Sent: Tuesday, October 12, 2010 2:19 PM
Subject: Re: blob handling in hive

> One way is to store blob in HBase and use HBaseHandler to access your blob.
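Hive's standard ObjectInspectors represent a list-typed column such as array<tinyint> as a java.util.List rather than a single ByteWritable, which is why setting one ByteWritable only ever yields one byte. A minimal sketch of the conversion (the class and method names here are hypothetical, not from Hive's API):

```java
import java.util.ArrayList;
import java.util.List;

public class ByteArrayColumn {
    // Convert a raw byte[] (possibly null) into the List<Byte> shape that a
    // standard list ObjectInspector expects for an array<tinyint> column.
    public static List<Byte> toTinyintList(byte[] content) {
        if (content == null) {
            return null;  // null blob maps to a NULL column value
        }
        List<Byte> out = new ArrayList<Byte>(content.length);
        for (byte b : content) {
            out.add(b);   // one list element per byte
        }
        return out;
    }
}
```

The SerDe's deserialize() could then do `row.set(2, toTinyintList(bContent))` instead of wrapping a single byte, assuming the column's inspector is a standard list inspector over tinyint.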
Re: blob handling in hive
How about creating an org.apache.hadoop.hive.serde2.io.BytesWritable that wraps byte[]?

On Tue, Oct 12, 2010 at 3:49 PM, Jinsong Hu wrote:
> This works for a single byte, but doesn't work for a byte array. Any way I can get the byte array returned in SQL is appreciated.
Re: blob handling in hive
I thought about that too, but then I would need to write a bytes inspector and add it to Hive's inspector factory, and we would also need to create a new data type, such as blob, among Hive's supported data types. Adding a new supported data type to Hive is a non-trivial task, as more code needs to be touched.

I am just wondering if it is possible to get what I want without such a big change.

Jimmy

From: "Ted Yu"
Sent: Tuesday, October 12, 2010 4:12 PM
Subject: Re: blob handling in hive

> How about creating an org.apache.hadoop.hive.serde2.io.BytesWritable that wraps byte[]?
Re: blob handling in hive
How about UTF-8 encoding your blob and storing it in Hive as a string?

On Tue, Oct 12, 2010 at 4:20 PM, Jinsong Hu wrote:
> I am just wondering if it is possible to get what I want without such a big change.
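One caveat with treating arbitrary binary as UTF-8 text: not every byte sequence is valid UTF-8, and Java's decoder replaces invalid sequences with U+FFFD, so decoding a blob as a string and re-encoding it does not round-trip. A small demonstration (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lossy {
    // Decode arbitrary bytes as UTF-8 and re-encode them. For bytes that
    // are not valid UTF-8, the decoder substitutes U+FFFD (EF BF BD),
    // silently corrupting the data.
    public static byte[] roundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

This is why the base64 approach mentioned in the next message is the safer variant of "store it as a string".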
Re: blob handling in hive
Yes, tentatively that is what I have to do. Another way is to convert the data to a base64-encoded string; after the client receives the data, it decodes it back to binary. This is a hack, but it works. If Hive supported a byte array as a native data type, the solution would be much more elegant.

Jimmy

From: "Ted Yu"
Sent: Tuesday, October 12, 2010 4:33 PM
Subject: Re: blob handling in hive

> How about UTF-8 encoding your blob and storing it in Hive as a string?
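The base64 workaround described above can be sketched with the JDK's java.util.Base64 (Java 8+; a 2010-era deployment would need a different codec, e.g. from Apache Commons Codec — the BlobCodec class name here is hypothetical):

```java
import java.util.Base64;

public class BlobCodec {
    // Encode a binary blob to an ASCII-safe string that a Hive STRING
    // column can hold without corruption.
    public static String encode(byte[] blob) {
        return Base64.getEncoder().encodeToString(blob);
    }

    // Decode on the client after the query result comes back.
    public static byte[] decode(String s) {
        return Base64.getDecoder().decode(s);
    }
}
```

Unlike raw UTF-8 decoding, base64 round-trips every possible byte value, at the cost of roughly 33% size overhead in the stored string.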