Re: issue regarding importing hive tables from one cluster to another.

2012-09-08 Thread Jagat Singh
Hive's table structure information lives in the metastore, which by default is a
Derby database (which I doubt you are using) or in MySQL or similar.

Point your Hive at the MySQL metastore and try.
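
A minimal hive-site.xml sketch of what "pointing Hive at MySQL" means; the
host, database name, and credentials below are placeholders, and the MySQL
JDBC driver jar must be on Hive's classpath:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>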

---
Sent from Mobile, short and crisp.
On 09-Sep-2012 5:29 AM, "yogesh dhari"  wrote:

>  Hi all,
>
>  I have switched from an old HDFS cluster to a new cluster (the machines
> in the old cluster are not connected to the new cluster in any way).
>
> I brought the edits and fsimage files (including dfs.name.dir and
> dfs.data.dir) from the old cluster and put them on the new cluster, and
> all files and data are showing up on the new cluster.
>
> Is there any way I can bring all the tables I created, and their
> structure, in Hive from the old cluster to the new cluster?
>
> Thanks & regards
> Yogesh Kumar
>
>
>
>


issue regarding importing hive tables from one cluster to another.

2012-09-08 Thread yogesh dhari

Hi all,

I have switched from an old HDFS cluster to a new cluster (the machines in the
old cluster are not connected to the new cluster in any way).

I brought the edits and fsimage files (including dfs.name.dir and dfs.data.dir)
from the old cluster and put them on the new cluster, and all files and data
are showing up on the new cluster.

Is there any way I can bring all the tables I created, and their structure, in
Hive from the old cluster to the new cluster?

Thanks & regards
Yogesh Kumar
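
If the old cluster's metastore is in MySQL, a rough sketch of one way to carry
the table definitions over; the host and database names are placeholders, and
the HDFS location URIs stored in the metastore tables must be rewritten to
point at the new namenode:

# on the old metastore host: dump the metastore database
$ mysqldump -u hive -p hive_metastore > metastore.sql

# rewrite HDFS URIs to the new namenode (hypothetical host names)
$ sed -i 's|hdfs://old-namenode:8020|hdfs://new-namenode:8020|g' metastore.sql

# on the new metastore host: load the dump into the new metastore database
$ mysql -u hive -p hive_metastore < metastore.sql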



  

Re: How to load csv data into HIVE

2012-09-08 Thread praveenesh kumar
Yup, Bejoy is correct :-) Just use Hadoop streaming for what it does best:
cleaning, transformations, and validations, in a few simple steps.

Regards,
Praveenesh

On Sat, Sep 8, 2012 at 6:03 PM, Bejoy KS  wrote:

> Hi Chuck
>
> I believe Praveenesh was adding his thought to the discussion on
> preprocessing the data using MapReduce itself. If you go with Hadoop
> streaming, you can use the Python script as the mapper, and that will do
> the preprocessing in parallel on large volumes of data. Then this
> preprocessed data can be loaded into the Hive table.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> --
> *From: * "Connell, Chuck" 
> *Date: *Sat, 8 Sep 2012 12:18:33 +
> *To: *user@hive.apache.org
> *ReplyTo: * user@hive.apache.org
> *Subject: *RE: How to load csv data into HIVE
>
> I would like to hear more about this "hadoop streaming to Hive" idea. I
> have used streaming jobs as mappers, with a python script as map.py. Are
> you saying that such a streaming mapper can load its output into Hive? Can
> you send some example code? Hive wants to load "files", not individual
> lines/records. How would you do this?
>
> Thanks very much,
> Chuck
>
>
>  --
> *From:* praveenesh kumar [praveen...@gmail.com]
> *Sent:* Saturday, September 08, 2012 7:54 AM
> *To:* user@hive.apache.org
> *Subject:* Re: How to load csv data into HIVE
>
>  You can use Hadoop streaming; that would be much faster... Just run your
> cleaning shell script logic in the map phase and it will be done in just a
> few minutes. That will keep the data in HDFS.
>
> Regards,
> Praveenesh
>
> On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P <
> sandeepreddy.3...@gmail.com> wrote:
>
>> Hi,
> Thank you all for your help. I'll try both ways and I'll get back to you.
>>
>>
>> On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq wrote:
>>
>>> I said this assuming that a Hadoop cluster is available since Sandeep is
>>> planning to use Hive. If that is the case then MapReduce would be faster
>>> for such large files.
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>>
>>> On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck wrote:
>>>
  I cannot promise which is faster. A lot depends on how clever your
 scripts are.


 *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
 *Sent:* Friday, September 07, 2012 10:42 AM
 *To:* user@hive.apache.org
 *Subject:* Re: How to load csv data into HIVE


 Hi,
 I wrote a shell script to clean the csv data, but when I run that script on
 a 12GB csv it takes a long time. If I run a Python script, will that be
 faster?

 On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <
 chuck.conn...@nuance.com> wrote:

 How about a Python script that changes it into plain tab-separated
 text? So it would look like this…

 174969274	14-mar-2006	3522876		14-mar-2006	50308	65	1
 etc…

 Tab-separated with newlines is easy to read and works perfectly on
 import.

 Chuck Connell

 Nuance R&D Data Team

 Burlington, MA

 781-565-4611

 *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
 *Subject:* How to load csv data into HIVE

 Hi,
 Here is the sample data

 "174969274","14-mar-2006","3522876","","14-mar-2006","50308","65","1"|

 "174969275","19-jul-2006","3523154","","19-jul-2006","50308","65","1"|

 "174969276","31-dec-2005","3530333","","31-dec-2005","50308","65","1"|

 "174969277","14-apr-2005","3531470","","14-apr-2005","50308","65","1"|

 How to load this kind of data into HIVE?
 I'm using a shell script to get rid of the double quotes and '|', but it's
 taking a very long time on each csv, which are 12GB each. What is the best
 way to do this?




 --
 Thanks,
 sandeep

>>>
>>>
>>
>>
>>  --
>> Thanks,
>> sandeep
>>
>>
>


Re: How to load csv data into HIVE

2012-09-08 Thread Bejoy KS
Hi Chuck

I believe Praveenesh was adding his thought to the discussion on preprocessing
the data using MapReduce itself. If you go with Hadoop streaming, you can use
the Python script as the mapper, and that will do the preprocessing in parallel
on large volumes of data. Then this preprocessed data can be loaded into the
Hive table.
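
A rough end-to-end sketch of that pipeline, assuming a hypothetical cleaning
script clean.py that reads raw lines on stdin and writes cleaned, tab-separated
lines on stdout; the jar path, HDFS paths, and column names below are
placeholders:

$ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -D mapred.reduce.tasks=0 \
    -input /data/raw_csv \
    -output /data/clean_csv \
    -mapper 'python clean.py' \
    -file clean.py

hive> create table sample (id string, d1 string, n1 string, n2 string,
    >   d2 string, c1 string, c2 string, c3 string)
    > row format delimited fields terminated by '\t';
hive> load data inpath '/data/clean_csv' into table sample;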



Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: "Connell, Chuck" 
Date: Sat, 8 Sep 2012 12:18:33 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: RE: How to load csv data into HIVE

I would like to hear more about this "hadoop streaming to Hive" idea. I have 
used streaming jobs as mappers, with a python script as map.py. Are you saying 
that such a streaming mapper can load its output into Hive? Can you send some 
example code? Hive wants to load "files", not individual lines/records. How 
would you do this?

Thanks very much,
Chuck



From: praveenesh kumar [praveen...@gmail.com]
Sent: Saturday, September 08, 2012 7:54 AM
To: user@hive.apache.org
Subject: Re: How to load csv data into HIVE

You can use Hadoop streaming; that would be much faster... Just run your
cleaning shell script logic in the map phase and it will be done in just a few
minutes. That will keep the data in HDFS.

Regards,
Praveenesh

On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P
<sandeepreddy.3...@gmail.com> wrote:
Hi,
Thank you all for your help. I'll try both ways and I'll get back to you.


On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq <donta...@gmail.com> wrote:
I said this assuming that a Hadoop cluster is available since Sandeep is 
planning to use Hive. If that is the case then MapReduce would be faster for 
such large files.

Regards,
Mohammad Tariq



On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck
<chuck.conn...@nuance.com> wrote:
I cannot promise which is faster. A lot depends on how clever your scripts are.



From: Sandeep Reddy P 
[mailto:sandeepreddy.3...@gmail.com]
Sent: Friday, September 07, 2012 10:42 AM
To: user@hive.apache.org
Subject: Re: How to load csv data into HIVE

Hi,
I wrote a shell script to clean the csv data, but when I run that script on a
12GB csv it takes a long time. If I run a Python script, will that be faster?
On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck
<chuck.conn...@nuance.com> wrote:
How about a Python script that changes it into plain tab-separated text? So it 
would look like this…

174969274	14-mar-2006	3522876		14-mar-2006	50308	65	1
etc…

Tab-separated with newlines is easy to read and works perfectly on import.
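
A minimal sketch of such a converter, assuming the rows really are
quote-wrapped, comma-separated fields with a trailing '|' as in the sample
below; it reads stdin and writes tab-separated stdout, so the same script
could also serve as the streaming mapper discussed above (clean.py here is a
hypothetical name):

#!/usr/bin/env python
# hypothetical clean.py: drop the trailing '|', let the csv module strip
# the quoting, then re-join the fields with tabs
import csv
import sys

rows = csv.reader(line.strip().rstrip('|') for line in sys.stdin)
for fields in rows:
    sys.stdout.write('\t'.join(fields) + '\n')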

Chuck Connell
Nuance R&D Data Team
Burlington, MA
781-565-4611

From: Sandeep Reddy P 
[mailto:sandeepreddy.3...@gmail.com]
Subject: How to load csv data into HIVE

Hi,
Here is the sample data
"174969274","14-mar-2006","
3522876","","14-mar-2006","50308","65","1"|
"174969275","19-jul-2006","3523154","","19-jul-2006","50308","65","1"|
"174969276","31-dec-2005","3530333","","31-dec-2005","50308","65","1"|
"174969277","14-apr-2005","3531470","","14-apr-2005","50308","65","1"|

How to load this kind of data into HIVE?
I'm using a shell script to get rid of the double quotes and '|', but it's
taking a very long time on each csv, which are 12GB each. What is the best way
to do this?




--
Thanks,
sandeep




--
Thanks,
sandeep





RE: Handling arrays returned by json_tuple ??

2012-09-08 Thread Connell, Chuck
Something else... If json_tuple cannot select elements in an array, that means 
that JSON objects within an array are essentially "frozen" within their array. 
So if I had

{"text1" : "smith", "array1" : [{json-object},{json-object}]}
{"text1" : "jones", "array1" : [{json-object},{json-object}]}

I could extract only the top-level value array1, but could not "open up" that
array to do anything with its embedded elements, which are valid JSON objects!
Is this true?

Chuck



From: Connell, Chuck
Sent: Friday, September 07, 2012 3:27 PM
To: user@hive.apache.org
Subject: Handling arrays returned by json_tuple ??

I am using the json_tuple lateral view function. It works fine. But I am 
wondering how to select individual elements from a returned array.

Here is an example...

$ cat array1.json

{"text1" : "smith", "array1" : [6,5,4]}
{"text1" : "jones", "array1" : [1,2,3]}
{"text1" : "white", "array1" : [9,8,7]}
{"text1" : "black", "array1" : [10,11]}

hive> create table t7 (json string);

hive> load data inpath '/tmp/array1.json' overwrite into table t7;

hive> select ar1 from t7 lateral view json_tuple(t7.json, 'text1', 'array1') 
view1 as t1, ar1;

[6,5,4]
[1,2,3]
[9,8,7]
[10,11]

Notice that the answer is correct; these are the arrays within the JSON array1 
field.

But how can I get just one of the values out of the query, such as ar1[1]? I
want the answers

5
2
8
11

I have tried every syntax I can think of, including the explode() function. No 
luck. Is this possible?

TIA,
Chuck
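
One workaround, offered as an untested sketch: get_json_object's JSON path
syntax accepts array subscripts, so individual elements can be pulled straight
from the raw JSON string rather than from the json_tuple output:

hive> select get_json_object(t7.json, '$.array1[1]') from t7;

5
2
8
11

For objects embedded in an array, the same path syntax should reach inside
them, e.g. get_json_object(t7.json, '$.array1[0].text1') for a (hypothetical)
field of the first embedded object.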



RE: How to load csv data into HIVE

2012-09-08 Thread Connell, Chuck
I would like to hear more about this "hadoop streaming to Hive" idea. I have 
used streaming jobs as mappers, with a python script as map.py. Are you saying 
that such a streaming mapper can load its output into Hive? Can you send some 
example code? Hive wants to load "files", not individual lines/records. How 
would you do this?

Thanks very much,
Chuck



From: praveenesh kumar [praveen...@gmail.com]
Sent: Saturday, September 08, 2012 7:54 AM
To: user@hive.apache.org
Subject: Re: How to load csv data into HIVE

You can use Hadoop streaming; that would be much faster... Just run your
cleaning shell script logic in the map phase and it will be done in just a few
minutes. That will keep the data in HDFS.

Regards,
Praveenesh

On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P
<sandeepreddy.3...@gmail.com> wrote:
Hi,
Thank you all for your help. I'll try both ways and I'll get back to you.


On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq <donta...@gmail.com> wrote:
I said this assuming that a Hadoop cluster is available since Sandeep is 
planning to use Hive. If that is the case then MapReduce would be faster for 
such large files.

Regards,
Mohammad Tariq



On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck
<chuck.conn...@nuance.com> wrote:
I cannot promise which is faster. A lot depends on how clever your scripts are.



From: Sandeep Reddy P 
[mailto:sandeepreddy.3...@gmail.com]
Sent: Friday, September 07, 2012 10:42 AM
To: user@hive.apache.org
Subject: Re: How to load csv data into HIVE

Hi,
I wrote a shell script to clean the csv data, but when I run that script on a
12GB csv it takes a long time. If I run a Python script, will that be faster?
On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck
<chuck.conn...@nuance.com> wrote:
How about a Python script that changes it into plain tab-separated text? So it 
would look like this…

174969274	14-mar-2006	3522876		14-mar-2006	50308	65	1
etc…

Tab-separated with newlines is easy to read and works perfectly on import.

Chuck Connell
Nuance R&D Data Team
Burlington, MA
781-565-4611

From: Sandeep Reddy P 
[mailto:sandeepreddy.3...@gmail.com]
Subject: How to load csv data into HIVE

Hi,
Here is the sample data
"174969274","14-mar-2006","
3522876","","14-mar-2006","50308","65","1"|
"174969275","19-jul-2006","3523154","","19-jul-2006","50308","65","1"|
"174969276","31-dec-2005","3530333","","31-dec-2005","50308","65","1"|
"174969277","14-apr-2005","3531470","","14-apr-2005","50308","65","1"|

How to load this kind of data into HIVE?
I'm using a shell script to get rid of the double quotes and '|', but it's
taking a very long time on each csv, which are 12GB each. What is the best way
to do this?




--
Thanks,
sandeep




--
Thanks,
sandeep




Re: How to load csv data into HIVE

2012-09-08 Thread praveenesh kumar
You can use Hadoop streaming; that would be much faster... Just run your
cleaning shell script logic in the map phase and it will be done in just a few
minutes. That will keep the data in HDFS.

Regards,
Praveenesh

On Fri, Sep 7, 2012 at 8:37 PM, Sandeep Reddy P  wrote:

> Hi,
> Thank you all for your help. I'll try both ways and I'll get back to you.
>
>
> On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq wrote:
>
>> I said this assuming that a Hadoop cluster is available since Sandeep is
>> planning to use Hive. If that is the case then MapReduce would be faster
>> for such large files.
>>
>> Regards,
>> Mohammad Tariq
>>
>>
>>
>> On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck 
>> wrote:
>>
>>>  I cannot promise which is faster. A lot depends on how clever your
>>> scripts are.
>>>
>>> *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
>>> *Sent:* Friday, September 07, 2012 10:42 AM
>>> *To:* user@hive.apache.org
>>> *Subject:* Re: How to load csv data into HIVE
>>>
>>> Hi,
>>> I wrote a shell script to clean the csv data, but when I run that script
>>> on a 12GB csv it takes a long time. If I run a Python script, will that be
>>> faster?
>>>
>>> On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <
>>> chuck.conn...@nuance.com> wrote:
>>>
>>> How about a Python script that changes it into plain tab-separated text?
>>> So it would look like this…
>>>
>>> 174969274	14-mar-2006	3522876		14-mar-2006	50308	65	1
>>> etc…
>>>
>>> Tab-separated with newlines is easy to read and works perfectly on
>>> import.
>>>
>>> Chuck Connell
>>>
>>> Nuance R&D Data Team
>>>
>>> Burlington, MA
>>>
>>> 781-565-4611
>>>
>>> *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
>>> *Subject:* How to load csv data into HIVE
>>>
>>> Hi,
>>> Here is the sample data
>>> "174969274","14-mar-2006","
>>>
>>> 3522876","","14-mar-2006","50308","65","1"|
>>>
>>> "174969275","19-jul-2006","3523154","","19-jul-2006","50308","65","1"|
>>>
>>> "174969276","31-dec-2005","3530333","","31-dec-2005","50308","65","1"|
>>>
>>> "174969277","14-apr-2005","3531470","","14-apr-2005","50308","65","1"|
>>>
>>> How to load this kind of data into HIVE?
>>> I'm using a shell script to get rid of the double quotes and '|', but
>>> it's taking a very long time on each csv, which are 12GB each. What is
>>> the best way to do this?
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> sandeep
>>>
>>
>>
>
>
> --
> Thanks,
> sandeep
>
>