Re: Hive Mapred local task distribution

2014-09-06 Thread Xuefu Zhang
You might be able to control which tasks run locally. However, once they
run locally, they have to do so on the HiveServer2 host.

It's possible to run the local tasks in separate JVMs. Still, it's the same host.
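
For reference, the child-JVM behavior Xuefu mentions can be toggled with Hive session settings; this is a sketch, and the property names and defaults should be checked against your Hive version:

```sql
-- Run map-join local tasks in a child JVM rather than inside the
-- HiveServer2 process; they still execute on the HiveServer2 host.
SET hive.exec.submit.local.task.via.child=true;
-- Cap the fraction of memory a local task may use before Hive aborts it.
SET hive.mapjoin.localtask.max.memory.usage=0.9;
```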

--Xuefu


On Sat, Sep 6, 2014 at 7:51 AM, Abhilash L L wrote:

> Hi Xuefu,
>
> Yeah, currently we have only one HiveServer2 host where the MapReduce
> local tasks run.
>
>    Is there any solution other than load-balancing it?
>
>
> Regards,
> Abhilash L L
> Capillary Technologies
> M:919886208262
> abhil...@capillarytech.com | www.capillarytech.com
>
> Email from people at capillarytech.com may not represent official policy
> of  Capillary Technologies unless explicitly stated. Please see our
> Corporate-Email-Policy
> 
> for details. Contents of this email are confidential. Please contact the
> Sender if you have received this email in error.
>
>
>
> On Sat, Sep 6, 2014 at 7:53 PM, Xuefu Zhang  wrote:
>
>> By "same host", don't you mean your HiveServer2 host? One solution is to
>> have multiple HiveServer2 instances and do load balance among them.
>>
>> --Xuefu
>>
>>
>> On Fri, Sep 5, 2014 at 11:37 PM, Abhilash L L wrote:
>>
>>> Hello,
>>>
>>>We are using Hive 0.11, connecting to it via HiveServer2 (Thrift).
>>>
>>>A lot of our queries launch MapReduce local tasks, which is good and
>>> expected. Since we fire queries in parallel, these tasks all start on the
>>> same host and consume a lot of resources.
>>>
>>>Is there a way to distribute these across different nodes?
>>>
>>> Or is the only option to handle it in the app layer and load-balance
>>> across a few Thrift servers?
>>>
>>> Please let me know if I should share any more information about the
>>> setup.
>>>
>>> Regards,
>>> Abhilash L L
>>> Capillary Technologies
>>> M:919886208262
>>> abhil...@capillarytech.com | www.capillarytech.com
>>>
>>>
>>
>>
>
>


Re: Parquet Binary Column Support

2014-09-06 Thread Xuefu Zhang
I don't think there is any issue holding it back. The only issue is
resources. We welcome effort from the community to move it forward, and I'm
willing to coach/review it.

--Xuefu


On Sat, Sep 6, 2014 at 8:18 AM, John Omernik  wrote:

> Greetings all -
>
> We really want to look into the Parquet file format more; however, without
> support for all the Hive column types, we are hesitant to dive in.
>
> Currently, it looks like it's just the BINARY column type (which I use)
> that is unsupported, based on the JIRA below. There hasn't been any
> movement on that in a few months, so I am curious about the group's
> thoughts on what issues are keeping Parquet from supporting BINARY, and
> what sort of timeline we may have for Parquet to fully support all Hive
> column types.
>
> Thanks!
>
> John
>
>
>
> https://issues.apache.org/jira/browse/HIVE-6384
>


Hive Index and ORC

2014-09-06 Thread Alain Petrus
Hello,

I am wondering whether it is possible to use a Hive index with the ORC
format. Does it make sense?
Also, is Hive indexing a mature feature? What are your experiences with
Hive indexing?

Thanks,
Alain
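
For context, Hive's compact-index DDL can be applied to an ORC table; the table and column names below are hypothetical, and whether indexing adds value on top of ORC's built-in statistics should be verified for your Hive version:

```sql
-- Hypothetical ORC table with Snappy compression.
CREATE TABLE sales (id INT, amount DOUBLE)
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Compact index; WITH DEFERRED REBUILD means the index is only
-- populated when ALTER INDEX ... REBUILD is run explicitly.
CREATE INDEX sales_id_idx ON TABLE sales (id)
AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX sales_id_idx ON sales REBUILD;
```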

Hive Indexing and ORC

2014-09-06 Thread Alain Petrus
Hello,

Is it possible to create an index on a table stored as ORC and compressed
with Snappy?
Does it make sense? And is Hive indexing a mature feature?

Thanks,
Alain


Parquet Binary Column Support

2014-09-06 Thread John Omernik
Greetings all -

We really want to look into the Parquet file format more; however, without
support for all the Hive column types, we are hesitant to dive in.

Currently, it looks like it's just the BINARY column type (which I use)
that is unsupported, based on the JIRA below. There hasn't been any
movement on that in a few months, so I am curious about the group's
thoughts on what issues are keeping Parquet from supporting BINARY, and
what sort of timeline we may have for Parquet to fully support all Hive
column types.

Thanks!

John



https://issues.apache.org/jira/browse/HIVE-6384


Re: Hive Mapred local task distribution

2014-09-06 Thread Abhilash L L
Hi Xuefu,

Yeah, currently we have only one HiveServer2 host where the MapReduce
local tasks run.

   Is there any solution other than load-balancing it?


Regards,
Abhilash L L
Capillary Technologies
M:919886208262
abhil...@capillarytech.com | www.capillarytech.com




On Sat, Sep 6, 2014 at 7:53 PM, Xuefu Zhang  wrote:

> By "same host", don't you mean your HiveServer2 host? One solution is to
> have multiple HiveServer2 instances and do load balance among them.
>
> --Xuefu
>
>
> On Fri, Sep 5, 2014 at 11:37 PM, Abhilash L L wrote:
>
>> Hello,
>>
>>We are using Hive 0.11, connecting to it via HiveServer2 (Thrift).
>>
>>A lot of our queries launch MapReduce local tasks, which is good and
>> expected. Since we fire queries in parallel, these tasks all start on the
>> same host and consume a lot of resources.
>>
>>Is there a way to distribute these across different nodes?
>>
>> Or is the only option to handle it in the app layer and load-balance
>> across a few Thrift servers?
>>
>> Please let me know if I should share any more information about the
>> setup.
>>
>> Regards,
>> Abhilash L L
>> Capillary Technologies
>> M:919886208262
>> abhil...@capillarytech.com | www.capillarytech.com
>>
>>
>
>



Re: Hive Mapred local task distribution

2014-09-06 Thread Xuefu Zhang
By "same host", don't you mean your HiveServer2 host? One solution is to
have multiple HiveServer2 instances and do load balance among them.
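
The load-balancing approach suggested here can be sketched as an HAProxy TCP frontend in front of several HiveServer2 instances; the hostnames and backend names below are hypothetical. Because each JDBC/Thrift connection is balanced at connect time and then stays on one backend, session state remains on a single server:

```
# Sketch of an haproxy.cfg section that round-robins Thrift connections
# across two hypothetical HiveServer2 instances (default port 10000).
listen hiveserver2
    bind *:10000
    mode tcp
    balance roundrobin
    server hs2-a hive-node-a:10000 check
    server hs2-b hive-node-b:10000 check
```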

--Xuefu


On Fri, Sep 5, 2014 at 11:37 PM, Abhilash L L wrote:

> Hello,
>
>We are using Hive 0.11, connecting to it via HiveServer2 (Thrift).
>
>A lot of our queries launch MapReduce local tasks, which is good and
> expected. Since we fire queries in parallel, these tasks all start on the
> same host and consume a lot of resources.
>
>Is there a way to distribute these across different nodes?
>
> Or is the only option to handle it in the app layer and load-balance
> across a few Thrift servers?
>
> Please let me know if I should share any more information about the
> setup.
>
> Regards,
> Abhilash L L
> Capillary Technologies
> M:919886208262
> abhil...@capillarytech.com | www.capillarytech.com
>
>


Re: Mysql - Hive Sync

2014-09-06 Thread Stephen Sprague
Interesting. Thanks, Muthu.

A colleague of mine pointed out this one too: LinkedIn's Databus (
https://github.com/linkedin/databus/wiki). This one looks extremely
heavyweight, and again I'm not sure it's worth the headache.

I like the idea of a trigger on the MySQL table that then broadcasts the
data to another app via a UDP message.

cf. https://code.google.com/p/mysql-message-api/

The thing is, you'll need to batch the records over, say, 5 minutes (or
whatever) and then write each batch as one file to HDFS.

This seems infinitely simpler and more maintainable to me. :)
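
The batching step Stephen describes can be sketched in Python. The HDFS write itself is left as a callback (`flush_fn` is a hypothetical hook), since the actual sink (WebHDFS, `hdfs dfs -put`, etc.) is deployment-specific:

```python
import time

class RecordBatcher:
    """Buffer incoming records and flush them as one batch every
    `interval` seconds, or sooner when `max_records` is reached."""

    def __init__(self, flush_fn, interval=300, max_records=10000):
        self.flush_fn = flush_fn      # e.g. writes one file to HDFS
        self.interval = interval
        self.max_records = max_records
        self.buffer = []
        self.last_flush = time.time()

    def add(self, record):
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_records
                or time.time() - self.last_flush >= self.interval):
            self.flush()

    def flush(self):
        # Hand the whole buffer to the sink as one batch, then reset.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
        self.last_flush = time.time()

# Demo with a list standing in for the HDFS sink.
batches = []
b = RecordBatcher(batches.append, interval=300, max_records=3)
for row in ["r1", "r2", "r3", "r4"]:
    b.add(row)
b.flush()
print(batches)  # [['r1', 'r2', 'r3'], ['r4']]
```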




On Fri, Sep 5, 2014 at 11:53 PM, Muthu Pandi  wrote:

> Yeah, installing the MySQL Hadoop Applier took a lot of time (building and
> installing GCC 4.6). It's working, but it's not serving the exact purpose,
> so now I am trying my own Python scripting.
>
> The idea is to read insert queries from the binlog, save them as a table
> under the Hive warehouse, and query from there.
>
>
>
> *Regards, Muthupandi.K*
>
>
>
>
> On Sat, Sep 6, 2014 at 4:47 AM, Stephen Sprague 
> wrote:
>
>> Great find, Muthu. I would be interested in hearing about any successes
>> or failures using this adapter. It almost sounds too good to be true.
>>
>> After reading the blog (
>> http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html)
>> about it, I see it comes with caveats and looks a little rough around the
>> edges to install. Not sure I'd bet the farm on this product, but YMMV.
>>
>> Anyway, curious to know how it works out for you.
>>
>>
>>
>> On Tue, Sep 2, 2014 at 11:03 PM, Muthu Pandi  wrote:
>>
>>> This can't be done, since insert/update/delete are not supported in Hive.
>>>
>>> The MySQL Applier for Hadoop package serves the same purpose as the
>>> prototype tool which I intended to develop.
>>>
>>> link for "Mysql Applier for Hadoop"
>>> http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html
>>>
>>>
>>>
>>> *Regards Muthupandi.K*
>>>
>>>
>>>
>>>
>>> On Wed, Sep 3, 2014 at 10:35 AM, Muthu Pandi 
>>> wrote:
>>>
 Yeah, but we can't make it work in near real time. Also, my table
 doesn't have a column like 'ID' to use for --check-column; that's why I
 opted out of Sqoop.
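
For reference, Sqoop's incremental import also has a timestamp-based mode (`--incremental lastmodified`) that keys on a last-modified column rather than an auto-increment ID, assuming such a column exists; the connection string and table/column names below are hypothetical:

```
# Hypothetical incremental import keyed on a timestamp column.
sqoop import \
  --connect jdbc:mysql://db-host/sales \
  --table orders \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2014-09-01 00:00:00" \
  --target-dir /user/hive/warehouse/orders
```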



 *Regards Muthupandi.K*




 On Wed, Sep 3, 2014 at 10:28 AM, Nitin Pawar 
 wrote:

> Have you looked at Sqoop?
>
>
> On Wed, Sep 3, 2014 at 10:15 AM, Muthu Pandi 
> wrote:
>
>> Dear All
>>
>>  I am developing a prototype for syncing tables from MySQL to Hive
>> using Python and JDBC. Is JDBC a good choice for this purpose?
>>
>> My use case is generating sales reports in Hive from data pulled from
>> MySQL by the prototype tool. My data will be around 2 GB/day.
>>
>>
>>
>> *Regards Muthupandi.K*
>>
>>
>>
>
>
> --
> Nitin Pawar
>


>>>
>>
>