Re: how to create an array from two columns?

2016-03-12 Thread Chandeep Singh
Not sure if there is a function for that, but I wrote a UDF to do so:
https://github.com/chandeepsingh/Hive-UDFs

hive> ADD JAR hive-udfs-1.0-uber.jar;
Added [hive-udfs-1.0-uber.jar] to class path
Added resources: [hive-udfs-1.0-uber.jar]

hive> CREATE TEMPORARY FUNCTION array_dedup AS 'com.hive.udfs.UdfArrayDeDup';
OK
Time taken: 0.015 seconds

hive> SELECT array_dedup(array("blah","blah","blah")) from table1 limit 1;
OK
["blah"]
Time taken: 0.502 seconds, Fetched: 1 row(s)

Here is the code:

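package com.hive.udfs;

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;

/**
 * Removes duplicates from an array, keeping the first occurrence of each element.
 *
 * @author chandeepsingh
 */
public class UdfArrayDeDup extends GenericUDF {

    private ListObjectInspector arrayOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        // Expect exactly one argument, and it must be an array
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("array_dedup takes exactly one argument");
        }
        if (arguments[0].getCategory() != ObjectInspector.Category.LIST) {
            throw new UDFArgumentTypeException(0, "array_dedup expects an array argument");
        }
        arrayOI = (ListObjectInspector) arguments[0];
        // The result is an array with the same element type as the input
        return ObjectInspectorUtils.getStandardObjectInspector(arrayOI);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object input = arguments[0].get();
        if (input == null) {
            return null; // NULL array in, NULL out
        }
        // Copy into standard Java objects so elements can be hashed and compared
        List<?> elements =
                (List<?>) ObjectInspectorUtils.copyToStandardObject(input, arrayOI);
        // LinkedHashSet drops duplicates while preserving first-occurrence order
        Set<Object> unique = new LinkedHashSet<>(elements);
        return new ArrayList<>(unique);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "array_dedup(" + (children.length > 0 ? children[0] : "") + ")";
    }
}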
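The de-duplication step itself is plain Java collection logic, so it can be tried without a Hive install. The sketch below is a hypothetical stand-alone helper (`DedupSketch` is not part of the Hive-UDFs repository). It uses a `LinkedHashSet`, which, unlike a `HashSet`, keeps each element's first-occurrence order, so the result order is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper mirroring the de-duplication step of an array_dedup UDF. */
final class DedupSketch {

    private DedupSketch() {}

    /** Returns the distinct elements of input in first-seen order, or null for null input. */
    static <T> List<T> dedup(List<T> input) {
        if (input == null) {
            return null; // mirror Hive's convention: NULL in, NULL out
        }
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList("blah", "blah", "blah"))); // [blah]
        System.out.println(dedup(Arrays.asList(3, 1, 3, 2, 1)));          // [3, 1, 2]
    }
}
```

Given `["blah","blah","blah"]` it returns a single `blah`, matching the `array_dedup` output in the session above.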



> On Mar 13, 2016, at 1:30 AM, Rex X wrote:
> 
> For the first question, is there any way to use "set" instead of an "array" 
> to dedupe all elements?
> 
> "select array(1,1)" will return "[1,1]", not "[1]".
> 
> 
> 
> On Sat, Mar 12, 2016 at 5:26 PM, Rex X <dnsr...@gmail.com> wrote:
> Thank you, Chandeep. Yes, my first problem is solved. 
> How about the second one? Is there any way to append an element to an 
> existing array?
> 
> 
> 
> On Sat, Mar 12, 2016 at 5:10 PM, Chandeep Singh <c...@chandeep.com> wrote:
> If you only want the array while you’re querying table1, your example should 
> work. If you want to add AB to the table, you’ll probably need to create a new 
> table by selecting everything you need from table1.
> 
> hive> select * from table1 limit 1;
> OK
> temp1   temp2   temp3
> 
> hive> select f1, array(f2, f3) AS AB from table1 limit 1;
> OK
> temp1   ["temp2","temp3"]
> 
> 
>> On Mar 13, 2016, at 12:33 AM, Rex X <dnsr...@gmail.com> wrote:
>> 
>> How can I make the following work?
>> 
>> 1. Combine columns A and B into one array as a new column AB. Both columns
>> A and B are string types.
>> 
>>   select 
>> string_columnA, 
>> string_columnB, 
>> array(string_columnA, string_columnB) as AB
>> from Table1;
>> 
>> 2. Append column A to an existing array-type column B.
>> 
>> select
>> string_columnA,
>> array_columnB,
>> array_flatmerge(string_columnA, array_columnB) as AB
>> from Table2;
>> 
>> In fact, I should say "set" instead of "array" above, since I expect no 
>> duplicates.
>> 
>> Any idea?
>> 
> 
> 
> 



Re: how to create an array from two columns?

2016-03-12 Thread Chandeep Singh
Writing your own UDF is always an option :)
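If you go the UDF route for the second question, the per-row logic is small. Below is a plain-Java sketch of what a hypothetical `array_flatmerge(string_columnA, array_columnB)` might compute (the name comes from the question; it is not a built-in Hive function): the existing array's elements followed by the new string, with duplicates dropped since a set was wanted. Treating a NULL array as empty and skipping a NULL element are design choices made for this sketch, not Hive behavior. Wrapping it in a `GenericUDF` would follow the same pattern as `UdfArrayDeDup`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical core of an "array_flatmerge" UDF: append a value to an array, set-style. */
final class FlatMergeSketch {

    private FlatMergeSketch() {}

    /**
     * Returns the elements of array followed by element, keeping only the first
     * occurrence of each value. A null array is treated as empty; a null element is skipped.
     */
    static List<String> flatMerge(String element, List<String> array) {
        Set<String> merged = new LinkedHashSet<>();
        if (array != null) {
            merged.addAll(array);
        }
        if (element != null) {
            merged.add(element);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        System.out.println(flatMerge("blah", Arrays.asList("temp1", "temp2")));  // [temp1, temp2, blah]
        System.out.println(flatMerge("temp1", Arrays.asList("temp1", "temp2"))); // [temp1, temp2]
    }
}
```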

> On Mar 13, 2016, at 1:46 AM, Chandeep Singh wrote:
> 
> Since data is stored in HDFS, you have very limited scope to append directly. 
> 
> As a workaround, you could get the contents of the original array by index 
> and then create a new array. This only makes sense if you know the number 
> of elements in your array and it doesn’t change across rows.
> 
> select array(ab[0],ab[1],"blah") from table2;
> OK
> ["temp1","temp2","blah"]
> 
> 



Re: how to create an array from two columns?

2016-03-12 Thread Chandeep Singh
Since data is stored in HDFS, you have very limited scope to append directly. 

As a workaround, you could get the contents of the original array by index 
and then create a new array. This only makes sense if you know the number 
of elements in your array and it doesn’t change across rows.

select array(ab[0],ab[1],"blah") from table2;
OK
["temp1","temp2","blah"]
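To see why the index-based workaround only fits fixed-length arrays, here is a plain-Java simulation of `array(ab[0], ab[1], "blah")`. It assumes Hive's behavior that indexing past the end of an array yields NULL (modeled as Java `null`); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of the fixed-index workaround: select array(ab[0], ab[1], "blah"). */
final class IndexWorkaroundSketch {

    private IndexWorkaroundSketch() {}

    /** Models Hive's ab[i]: the element at i, or null when i is out of range (assumed semantics). */
    static <T> T at(List<T> ab, int i) {
        return (ab != null && i >= 0 && i < ab.size()) ? ab.get(i) : null;
    }

    /** Rebuilds the array from positions 0 and 1 only, then appends "blah". */
    static List<String> appendBlah(List<String> ab) {
        return Arrays.asList(at(ab, 0), at(ab, 1), "blah");
    }

    public static void main(String[] args) {
        System.out.println(appendBlah(Arrays.asList("temp1", "temp2")));          // [temp1, temp2, blah]
        System.out.println(appendBlah(Arrays.asList("temp1")));                   // [temp1, null, blah]
        System.out.println(appendBlah(Arrays.asList("temp1", "temp2", "temp3"))); // [temp1, temp2, blah]
    }
}
```

The one-element row gains a NULL and the three-element row silently loses `temp3`, which is why a UDF is the more general route when array lengths vary.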


> On Mar 13, 2016, at 1:26 AM, Rex X wrote:
> 
> Thank you, Chandeep. Yes, my first problem is solved. 
> How about the second one? Is there any way to append an element to an 
> existing array?
> 
> 



Re: how to create an array from two columns?

2016-03-12 Thread Chandeep Singh
If you only want the array while you’re querying table1, your example should 
work. If you want to add AB to the table, you’ll probably need to create a new 
table by selecting everything you need from table1.

hive> select * from table1 limit 1;
OK
temp1   temp2   temp3

hive> select f1, array(f2, f3) AS AB from table1 limit 1;
OK
temp1   ["temp2","temp3"]
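For intuition, the query above builds one array per row from two scalar columns. A plain-Java sketch of the same row transformation, assuming tab-separated input rows like the `select *` output above (the class, record, and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of: select f1, array(f2, f3) AS AB from table1 */
final class TwoColumnArraySketch {

    private TwoColumnArraySketch() {}

    /** One output row: f1 plus AB = array(f2, f3). */
    static final class Row {
        final String f1;
        final List<String> ab;

        Row(String f1, List<String> ab) {
            this.f1 = f1;
            this.ab = ab;
        }

        @Override
        public String toString() {
            return f1 + "\t" + ab;
        }
    }

    /** Splits a tab-separated line "f1, f2, f3" and wraps f2 and f3 into one list. */
    static Row toRow(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length < 3) {
            throw new IllegalArgumentException("expected at least 3 columns: " + line);
        }
        return new Row(cols[0], Arrays.asList(cols[1], cols[2]));
    }

    public static void main(String[] args) {
        // Prints temp1, a tab, then [temp2, temp3]
        System.out.println(toRow("temp1\ttemp2\ttemp3"));
    }
}
```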


> On Mar 13, 2016, at 12:33 AM, Rex X wrote:
> 
> How can I make the following work?
> 
> 1. Combine columns A and B into one array as a new column AB. Both columns
> A and B are string types.
> 
>   select 
> string_columnA, 
> string_columnB, 
> array(string_columnA, string_columnB) as AB
> from Table1;
> 
> 2. Append column A to an existing array-type column B.
> 
> select
> string_columnA,
> array_columnB,
> array_flatmerge(string_columnA, array_columnB) as AB
> from Table2;
> 
> In fact, I should say "set" instead of "array" above, since I expect no 
> duplicates.
> 
> Any idea?
> 



Re: Field delimiter in hive

2016-03-08 Thread Chandeep Singh
I’ve been pretty successful with two pipes (||) or two carets (^^), depending on 
my dataset, even though they aren’t special Unicode control characters.
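Whichever delimiter you pick, it is worth checking a sample of the data before committing to it. A minimal sketch, assuming sample records can be read as Java strings; the candidate list mirrors the options in this thread and the class name is hypothetical. One caveat worth verifying for your Hive version: the default text SerDe (LazySimpleSerDe) splits fields on a single character, so a two-character delimiter such as `||` generally needs a SerDe that supports multi-character delimiters, such as the contrib `MultiDelimitSerDe`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: pick the first candidate delimiter that never occurs in sample data. */
final class DelimiterCheck {

    private DelimiterCheck() {}

    /** Candidates from the thread: SOH (Hive's default), US, RS, then the two-character options. */
    static final List<String> CANDIDATES =
            Arrays.asList("\u0001", "\u001F", "\u001E", "||", "^^");

    /** Returns the first candidate that occurs in none of the sample records. */
    static Optional<String> firstUnused(List<String> candidates, List<String> samples) {
        return candidates.stream()
                .filter(d -> samples.stream().noneMatch(record -> record.contains(d)))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("a|b", "plain text", "x\u0001y");
        // SOH occurs in the third record, so the next candidate (unit separator) is picked
        String pick = firstUnused(CANDIDATES, samples)
                .map(d -> String.format("U+%04X", (int) d.charAt(0)))
                .orElse("none");
        System.out.println(pick); // U+001F
    }
}
```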

> On Mar 7, 2016, at 8:32 PM, mahender bigdata wrote:
> 
> Any help on this.
> 
> On 3/3/2016 2:38 PM, mahender bigdata wrote:
>> Hi,
>> 
>> I'm a bit confused about which character to use as a general-purpose
>> delimiter for Hive tables. Can anyone suggest the best Unicode character
>> that doesn't appear as part of the data?
>> 
>> Here are a couple of options I'm thinking of for the field delimiter. Please
>> let me know which is the best one to use, i.e. the one least likely to
>> appear in the data in day-to-day scenarios.
>> 
>> \U0001  = START OF HEADING ==> SOH ==> (CTRL+SHIFT+A in Windows) ==> Hive default delimiter
>> 
>> 
>> \U001F  = INFORMATION SEPARATOR ONE = unit separator (US) ==> (CTRL+SHIFT+- in Windows)
>> 
>> 
>> \U001E  = INFORMATION SEPARATOR TWO = record separator (RS) ==> (CTRL+SHIFT+6 in Windows)
>> 
>> Somehow, by name, I feel \U001F is the best option. Can anyone comment, or
>> suggest the best Unicode character that doesn't appear in regular data?
>> 
>> 
>> 
> 



Re: DB2 DDL to Hive DDL conversion **Need Help**

2016-02-19 Thread Chandeep Singh
Tables can be imported directly into Hive using Sqoop with the --hive-import 
flag.

Once you have the tables in Hive, you can get their CREATE TABLE DDL using 
SHOW CREATE TABLE <table_name>;

> On Feb 19, 2016, at 5:31 PM, Mohit Durgapal wrote:
> 
> If he can run the scripts in the DB2 RDBMS, Sqoop can then create the 
> equivalent Hive DDL script.
> 
> On Friday 19 February 2016, Dmitry Tolpeko wrote:
> Abhi needs to convert SQL scripts, so I am afraid Sqoop will not help.
> 
> Abhi, do you need equivalent Hive scripts, or will creating the tables in 
> Hive be enough (without scripts)? The new HPL/SQL tool is designed to 
> execute existing DDL (written for any database), convert it on the fly, and 
> create the tables in Hive. 
> 
> Will it be a good solution for you? I tested HPL/SQL using Oracle, SQL Server 
> and some DB2 DDL. If there are issues, contact me and I can extend the tool.
> 
> Dmitry
> 
> 
> On Fri, Feb 19, 2016 at 11:55 AM, Mohit Durgapal wrote:
> Have you considered using Sqoop? If not, then please have a look at the 
> following links:
> 
> https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_importing_data_into_hive
>  
> 
> http://stackoverflow.com/questions/17064144/how-do-i-use-sqoop-for-importing-data-from-a-relational-db-to-sandbox-hive
>  
> 
> 
> On Fri, Feb 19, 2016 at 1:04 PM, Abhishek Singh wrote:
> Hi, 
> 
> We have almost 1000 DB2 RDBMS tables, and we have the DDL scripts for all of 
> them. 
> We are looking for a way to convert all these DB2 DDLs into Hive DDL
> without writing Hive DDL statements for each and every table. Is there an 
> automated tool available to do this? If not, can someone please guide me, 
> step by step, on what code we would need to write, or suggest a simpler way 
> to avoid lots of manual work? 
> 
> Thanks 
> 
> Abhi
> 
> 
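Abhi asked what code would be needed if no tool fits. One core step in any hand-rolled converter is mapping each DB2 column type to a Hive type. A minimal sketch of that step only, assuming Hive 0.13 or later (for `CHAR`, `VARCHAR`, `DATE` and parameterized `DECIMAL`). The class name and the mapping table are illustrative assumptions, not a complete or authoritative DB2-to-Hive mapping, and parsing whole `CREATE TABLE` statements is left out.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, partial DB2-to-Hive column type mapper. */
final class Db2TypeMapper {

    private Db2TypeMapper() {}

    // Type name plus optional "(args)", e.g. DECIMAL(10, 2) or VARCHAR(50)
    private static final Pattern TYPE =
            Pattern.compile("\\s*([A-Za-z ]+?)\\s*(\\(([^)]*)\\))?\\s*");

    /** Maps one DB2 column type to a Hive column type (illustrative subset only). */
    static String toHive(String db2Type) {
        Matcher m = TYPE.matcher(db2Type);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized type: " + db2Type);
        }
        String base = m.group(1).toUpperCase(Locale.ROOT);
        String args = m.group(3); // null when no "(...)" was given
        switch (base) {
            case "SMALLINT":
                return "SMALLINT";
            case "INT":
            case "INTEGER":
                return "INT";
            case "BIGINT":
                return "BIGINT";
            case "REAL":
                return "FLOAT";
            case "DOUBLE":
                return "DOUBLE";
            case "DECIMAL":
            case "NUMERIC":
                // Assumes DB2's default DECIMAL(5,0) when no precision is given
                return args == null ? "DECIMAL(5,0)" : "DECIMAL(" + args.replace(" ", "") + ")";
            case "CHAR":
                // Assumes DB2's default length of 1 when none is given
                return args == null ? "CHAR(1)" : "CHAR(" + args.trim() + ")";
            case "VARCHAR":
                return args == null ? "STRING" : "VARCHAR(" + args.trim() + ")";
            case "DATE":
                return "DATE";
            case "TIMESTAMP":
                return "TIMESTAMP";
            case "CLOB":
                return "STRING";
            default:
                throw new IllegalArgumentException("No mapping for DB2 type: " + db2Type);
        }
    }

    public static void main(String[] args) {
        for (String t : new String[] {"INTEGER", "DECIMAL(10, 2)", "VARCHAR(50)", "TIMESTAMP"}) {
            System.out.println(t + " -> " + toHive(t));
        }
    }
}
```

Unmapped types fail loudly rather than silently becoming `STRING`, so gaps in the table show up during a dry run over all the scripts.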



Re: Apache hive and sqoop

2016-02-18 Thread Chandeep Singh
Sqoop has an import-all-tables tool that imports every table of a database, but 
it does not bring over database properties. Data for each table is stored in a 
separate directory named after the table.

> On Feb 18, 2016, at 12:07 PM, Archana Patel wrote:
> 
> Hi,
> 
> I am new to Hive. Is it possible to import a whole MySQL database into Hive 
> using Apache Sqoop? I have already imported one MySQL table into Hive.
> Thanks in advance.
> 
> Archana 
> Skype id(live:archana961)



Re: How can we find Hive version from Hive CLI or Hive shell?

2016-02-17 Thread Chandeep Singh
To run a Unix shell command from the Hive CLI, prefix it with !.

For example, in your case:

hive> ! hive --version;
Hive 1.1.0-cdh5.4.8
Subversion file:///data/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-1.1.0-cdh5.4.8 -r Unknown
Compiled by jenkins on Thu Oct 15 08:52:23 PDT 2015
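If the goal is to use the version programmatically rather than read it off the screen, the first line of that output can be parsed. A minimal sketch, assuming the first line always has the form `Hive <version>`, as in both outputs quoted in this thread; the class name is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract the version token from `hive --version` output. */
final class HiveVersionParser {

    private HiveVersionParser() {}

    // Matches a first line such as "Hive 1.1.0-cdh5.4.8"
    private static final Pattern FIRST_LINE = Pattern.compile("^Hive\\s+(\\S+)");

    static Optional<String> parse(String output) {
        if (output == null) {
            return Optional.empty();
        }
        String firstLine = output.trim().split("\\r?\\n", 2)[0];
        Matcher m = FIRST_LINE.matcher(firstLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "Hive 1.1.0-cdh5.4.8\nCompiled by jenkins on Thu Oct 15 08:52:23 PDT 2015";
        System.out.println(parse(sample).orElse("unknown")); // 1.1.0-cdh5.4.8
    }
}
```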


> On Feb 17, 2016, at 6:25 AM, Abhishek Dubey wrote:
> 
> Well, thanks for the reply, but it didn’t seem to work on my side.
>  
> 
> To be more precise, I want to determine the Hive version while Hive is 
> running, for example by running a query.
>  
> Thanks & Regards,
> Abhishek Dubey
>  
>  
> From: Amrit Jangid [mailto:amrit.jan...@goibibo.com] 
> Sent: Wednesday, February 17, 2016 11:43 AM
> To: user@hive.apache.org
> Subject: Re: How can we find Hive version from Hive CLI or Hive shell?
>  
> >>hive --version
> Hive 0.13.1-cdh5.3.5
>  
> On Wed, Feb 17, 2016 at 11:33 AM, Abhishek Dubey wrote:
> Hi,
>  
> How can we find Hive version from Hive CLI or Hive shell?
>  
> Thanks & Regards,
> Abhishek Dubey
>  
> 
> 
>  
> -- 
> 
> Regards,
> Amrit  
> DataPlatform Team
> 
> 



Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Chandeep Singh
You could also fire up a VNC session and access all internal pages from there.

> On Feb 15, 2016, at 9:19 AM, Divya Gehlot  wrote:
> 
> Hi Sabarish,
> Thanks a lot for your help.
> I am able to view the logs now.
> 
> Thank you very much.
> 
> Cheers,
> Divya 
> 
> 
> On 15 February 2016 at 16:51, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
> You can set up SSH tunneling.
> 
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
>  
> 
> 
> Regards
> Sab
> 
> On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot wrote:
> Hi,
> I have a Hadoop cluster set up in EC2.
> I am unable to view application logs in the Web UI, as it uses the internal IP, 
> like below:
> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042 
> 
> 
> How can I change this to the external one, or redirect to the external one?
> Attached screenshots for better understanding of my issue.
> 
> Would really appreciate help.
> 
> 
> Thanks,
> Divya 
> 
> 
> 
> 
> 
> 
> 