blob handling in hive

2010-10-12 Thread Jinsong Hu

Hi,
 I am using Sqoop to export data from MySQL to Hive. I noticed that Hive 
doesn't have a blob data type yet. Is there any way I can make Hive store 
blobs?


Jimmy 



Re: blob handling in hive

2010-10-12 Thread Ted Yu
One way is to store the blob in HBase and use the HBase storage handler to access it.



Re: blob handling in hive

2010-10-12 Thread Jinsong Hu
Storing the blob in HBase is too costly; HBase compaction uses a lot of 
CPU. All I want is to be able to read the byte array out of a sequence 
file and map that byte array to a Hive column.

I can write a SerDe for this purpose.

I tried to define the data to be array<tinyint>. Then, in my custom SerDe, 
after I get the byte array off disk, I need to map it, so I wrote:

columnTypes =
    TypeInfoUtils.getTypeInfosFromTypeString("int,string,array<tinyint>");

but then how do I convert the data in the row.set() method?

I tried this:

    byte[] bContent = ev.get_content() == null ? null
        : (ev.get_content().getData() == null ? null : ev.get_content().getData());
    org.apache.hadoop.hive.serde2.io.ByteWritable tContent = bContent == null
        ? new org.apache.hadoop.hive.serde2.io.ByteWritable()
        : new org.apache.hadoop.hive.serde2.io.ByteWritable(bContent[0]);
    row.set(2, tContent);

This works for a single byte, but doesn't work for a byte array.
Any way I can get the byte array back in SQL is appreciated.

Jimmy
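A sketch of the missing conversion step, assuming the column is declared array<tinyint>: ByteWritable carries exactly one byte (hence only bContent[0] survives above), while a Hive list column's row object is generally a java.util.List with one element per item, so the whole array has to be unpacked. The helper below is a dependency-free illustration, not an existing Hive API:

```java
import java.util.ArrayList;
import java.util.List;

public class BlobColumn {
    // Convert a raw byte[] into the List form a Hive array<tinyint>
    // column object typically takes (one element per byte). Returns
    // null for null input so the column becomes NULL, not an empty list.
    public static List<Byte> toTinyintList(byte[] content) {
        if (content == null) {
            return null;
        }
        List<Byte> out = new ArrayList<>(content.length);
        for (byte b : content) {
            out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toTinyintList(new byte[] {1, 2, 3})); // [1, 2, 3]
        System.out.println(toTinyintList(null));                 // null
    }
}
```

The row would then carry `row.set(2, toTinyintList(bContent))` instead of a single-byte ByteWritable.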


Re: blob handling in hive

2010-10-12 Thread Ted Yu
How about creating an org.apache.hadoop.hive.serde2.io.BytesWritable that
wraps byte[]?
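For what it's worth, Hadoop itself already ships org.apache.hadoop.io.BytesWritable, which wraps a byte[] behind a 4-byte length prefix, so a serde2.io variant could borrow the same wire layout. Below is a dependency-free sketch of that layout; SimpleBytesHolder is a made-up name, and plain java.io stands in for the Writable interface:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal stand-in for a BytesWritable-style wrapper: a length-prefixed
// byte[] payload (4-byte big-endian length followed by the raw bytes).
public class SimpleBytesHolder {
    private final byte[] bytes;

    public SimpleBytesHolder(byte[] bytes) { this.bytes = bytes; }

    public byte[] get() { return bytes; }

    public byte[] serialize() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(bytes.length);   // length prefix
        out.write(bytes);             // payload
        return bos.toByteArray();
    }

    public static SimpleBytesHolder deserialize(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new SimpleBytesHolder(payload);
    }
}
```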



Re: blob handling in hive

2010-10-12 Thread Jinsong Hu
I thought about that too, but then I would need to write a bytes inspector 
and stick it into Hive's inspector factory. We would also need to add a new 
data type, such as blob, to Hive's supported data types. Adding a new 
supported data type to Hive is a non-trivial task, as more code would need 
to be touched.

I am just wondering whether it is possible to get what I want without such 
a big change.



Jimmy.

Re: blob handling in hive

2010-10-12 Thread Ted Yu
How about UTF-8-encoding your blob and storing it in Hive as a string?



Re: blob handling in hive

2010-10-12 Thread Jinsong Hu
Yes, tentatively that is what I have to do. Another way is to convert the 
data to a Base64-encoded string; after the client receives the data, it 
decodes it back to binary. This is a hack, but it works.

If Hive supported byte arrays as a native data type, the solution would be 
much more elegant.


Jimmy.
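A minimal sketch of the Base64 hack using the JDK's built-in codec (java.util.Base64); the class and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.Base64;

public class BlobAsString {
    // Encode raw bytes to an ASCII-safe string that Hive can store
    // in a STRING column.
    public static String encode(byte[] blob) {
        return Base64.getEncoder().encodeToString(blob);
    }

    // Client side: decode the string column back to the original bytes.
    public static byte[] decode(String column) {
        return Base64.getDecoder().decode(column);
    }

    public static void main(String[] args) {
        byte[] blob = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        String stored = encode(blob);
        System.out.println(stored);                           // 3q2+7w==
        System.out.println(Arrays.equals(decode(stored), blob)); // true
    }
}
```

Unlike raw UTF-8 encoding, which is not defined for arbitrary binary, the Base64 round trip is lossless at the cost of roughly 33% size overhead.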
