Hive SQL query

2016-04-06 Thread brajmohan saxena
Hi,

Is there any SQL query or command to create a server in Hive?
Also, please let me know of any API for creating a server and retrieving
server info via JDBC.

Thanks
Braj


Hive support in oozie shell action

2016-04-06 Thread Ajay Chander
Hi Everyone,

I am trying to execute a Hive script in an Oozie shell action, with a
workflow like below:

<workflow-app name="shell-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>/user/hue/oozie/workspaces/hue-oozie-1459982183.04/b.sh</exec>
            <file>/user/hue/oozie/workspaces/hue-oozie-1459982183.04/b.sh#b.sh</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

b.sh:

#!/bin/bash

hive -e "LOAD DATA INPATH '/user/test/landing_zone/file1 _*' INTO TABLE
mydb.Test"

It throws: 'Launcher ERROR, reason: Main class
[org.apache.oozie.action.hadoop.ShellMain], exit code [1]'

Any pointers? Thanks for your time.
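One way to get past the generic ShellMain exit code is to pull the launcher
logs through the Oozie CLI (a sketch; the Oozie server URL and workflow job
id below are placeholders):

# substitute your Oozie server URL and the actual workflow job id
oozie job -oozie http://oozie-host:11000/oozie -log 0000001-160406000000000-oozie-oozi-W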


How to use Spark JDBC to read from RDBMS table, create Hive ORC table and save RDBMS data in it

2016-04-06 Thread Mich Talebzadeh
Hi,

There was a question on the merits of using Sqoop to ingest data from an
Oracle table into Hive.

The issue is that Sqoop reverts to MapReduce when getting data into Hive,
which is not that great. IMO one can do better by using a JDBC connection
(which is identical to what Sqoop does anyway) but crucially with Spark's
faster processing. So I did this test.

Basically


   1. Create sqlContext based on HiveContext
   2. Use JDBC to get the Oracle table data
   3. Register the data as a temporary table
   4. Ensure that you cater for compatibility issues. For example, an
   Oracle column of type NUMBER is translated as decimal(38,10) over JDBC.
   That will cause an "Overflowed precision" error in Spark! Convert it
   with TO_CHAR in the JDBC query (see below)
   5. Create the ORC table in a given Hive database
   6. Insert/select from the temp table into the ORC table

Contrary to belief, from Spark you can create an ORC table in Hive and it
will work fine. You can also choose which Hive database to create your
table in. Just to be clear, I used Spark 1.6 with Hive 2.

I had to use sbt or Maven to build a project for this purpose, but it works.
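A minimal build.sbt sketch for that (the Spark and Scala versions here are
assumptions for a Spark 1.6 setup):

name := "Create_and_populate_table_JDBC"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.6.1" % "provided"
)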

This is sample code in Scala getting just under 1 million rows from an
Oracle table.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
//
object Create_and_populate_table_JDBC {
  def main(args: Array[String]) {
//
  val conf = new SparkConf().
   setAppName("Create_and_populate_table_JDBC").
   setMaster("local[12]").
   set("spark.driver.allowMultipleContexts", "true").
   set("spark.hadoop.validateOutputSpecs", "false")
  val sc = new SparkContext(conf)
  // Create sqlContext based on HiveContext
  val sqlContext = new HiveContext(sc)
  import sqlContext.implicits._
  val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  println ("\nStarted at"); sqlContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
").collect.foreach(println)
//
  var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb"
  var _username : String = "sh"
  var _password : String = "xx"
//
  val s = HiveContext.load("jdbc",
  Map("url" -> _ORACLEserver,
  "dbtable" -> "(SELECT to_char(PROD_ID) AS PROD_ID, to_char(CUST_ID) AS CUST_ID, to_char(TIME_ID) AS TIME_ID, to_char(CHANNEL_ID) AS CHANNEL_ID, to_char(PROMO_ID) AS PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD FROM sh.sales)",
  "user" -> _username,
  "password" -> _password))
//
  s.registerTempTable("tmp")
// Need to create and populate the target ORC table sales in the Hive database test
//
  HiveContext.sql("use test")
//
// Drop and create table sales in Hive test database
//
  HiveContext.sql("DROP TABLE IF EXISTS test.sales")
  var sqltext : String = ""
  sqltext = """
  CREATE TABLE test.sales
 (
  PROD_ID        bigint,
  CUST_ID        bigint,
  TIME_ID        timestamp,
  CHANNEL_ID     bigint,
  PROMO_ID       bigint,
  QUANTITY_SOLD  decimal(10),
  AMOUNT_SOLD    decimal(10)
)
CLUSTERED BY (PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.create.index"="true",
"orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )
"""
  HiveContext.sql(sqltext)
// Put data in Hive table.
//
  sqltext = """
INSERT INTO TABLE test.sales
SELECT
*
FROM tmp
"""
  HiveContext.sql(sqltext)

  HiveContext.sql("select count(1) from
test.sales").collect.foreach(println)
  println ("\nFinished at"); sqlContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
").collect.foreach(println)
  }
}

Started at
[06/04/2016 23:17:49.49]
[918843]
Finished at
[06/04/2016 23:18:05.05]


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Re: analyse command not working on decimal(38,0) datatype

2016-04-06 Thread Mich Talebzadeh
Plus the version of Hive.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 6 April 2016 at 16:48, Jörn Franke  wrote:

> Please provide exact log messages, create table statements, insert
> statements
>
> > On 06 Apr 2016, at 12:05, Ashim Sinha  wrote:
> >
> > Hi Team
> > Need help for the issue
> > Steps followed
> > Table created
> > Loaded the data of length 38 in the decimal type
> > Analyse table - for columns gives an error like "zero length BigInteger"
>


Re: Hive wiki write access request

2016-04-06 Thread Ashutosh Chauhan
Done.

Thanks,
Ashutosh

On Wed, Apr 6, 2016 at 6:22 AM, Random Suit  wrote:

> Putting in a request for write access on the Hive wiki. My confluence
> username is "therandomsuit".
>
> Thanks.
>


Re: analyse command not working on decimal(38,0) datatype

2016-04-06 Thread Jörn Franke
Please provide exact log messages, create table statements, insert statements

> On 06 Apr 2016, at 12:05, Ashim Sinha  wrote:
> 
> Hi Team
> Need help for the issue
> Steps followed
> Table created
> Loaded the data of length 38 in the decimal type
> Analyse table - for columns gives an error like "zero length BigInteger"


DAG Kill while insert into A select * from B

2016-04-06 Thread no jihun
Hi.

I have trouble running this query:

"insert into table mylog_orc_bucketing select * from mylog_orc"


Job received Kill while in RUNNING state. Vertex killed, vertexName=Map 1,
vertexId=vertex_1459773554571_0009_10_00, diagnostics=
Vertex received Kill while in RUNNING state., Vertex did not succeed due
to DAG_KILL, failedTasks:0 killedTasks:145, Vertex
vertex_1459773554571_0009_10_00
Map 1 killed/failed due to:DAG_KILL

DAG did not succeed due to DAG_KILL. failedVertices:0 killedVertices:1

Actually mylog_orc has 300 million rows, and I think my cluster may not
have enough resources (RAM, CPU, ...).

Then I tried with a small table:
"insert into table mylog_orc_bucketing select * from mylog_orc_mini"
mylog_orc_mini has 300K rows and this works well, with no failures.

So I wonder how I can find the logs on the root cause of the DAG_KILL for
the first query, something like an OOM, which would tell me what I should
change or scale out/up.
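A sketch of one way to dig in (the application id here is inferred from the
vertex id above, so treat it as an assumption):

# pull the aggregated container logs for the application behind the killed DAG
yarn logs -applicationId application_1459773554571_0009 > dag.log
# then grep for memory or kill clues
grep -iE "outofmemory|killing container|preempt" dag.log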

I tested this on Hive 1.2 + Tez 0.7.

Thanks!


RE: Best way of Unpivoting of hive table data. Any Analytic function for unpivoting

2016-04-06 Thread Markovitz, Dudu
You can do something like this:

select id,key,value from my_table lateral view explode(map('fname',first_name,'lname',last_name)) t;

Given the following table:

id, first_name,last_name
__
1,Dudu,Markovitz
2,Andrew,Sears

The result will look like:

id,key,value
__
1,fname,Dudu
1,lname,Markovitz
2,fname,Andrew
2,lname,Sears

Dudu

From: mahender bigdata [mailto:mahender.bigd...@outlook.com]
Sent: Wednesday, April 06, 2016 12:59 AM
To: user@hive.apache.org
Subject: Re: Best way of Unpivoting of hive table data. Any Analytic function
for unpivoting

Thanks Dudu. So I will make the change to use Union All.
So I take it there is no built-in UDF for doing this functionality.
On 4/5/2016 2:08 PM, Markovitz, Dudu wrote:
Hi

Please make sure you use "union all" and not "union".
"union all" just spools one query result after the other.
"union" eliminates duplicated rows, and therefore works much harder.

Dudu

From: mahender bigdata [mailto:mahender.bigd...@outlook.com]
Sent: Tuesday, April 05, 2016 11:50 PM
To: user@hive.apache.org
Subject: Re: Best way of Unpivoting of hive table data. Any Analytic function
for unpivoting

Hi Andrew,

Sorry for the delay in response. Currently I'm using Select with Union :-). I
would like to know whether there is any built-in Hive UDF available for
unpivoting.



On 3/30/2016 2:23 PM, Andrew Sears wrote:

Select id, 'mycol' as name, col1 as value from mytable
Union
Select id, 'mycol2' as name, col2 as value from mytable

Something like this might work for you?

Cheers,
Andrew

On Mon, Mar 28, 2016 at 7:53 PM, Ryan Harris wrote:


collect_list(col) will give you an array with all of the data from that column.
However, the scalability of this approach will have limits.
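A quick sketch of that approach (table and column names are just for
illustration):

-- gather every value of col1 per id into one array
select id, collect_list(col1) as col1_values
from mytable
group by id;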

-Original Message-
From: mahender bigdata [mailto:mahender.bigd...@outlook.com]
Sent: Monday, March 28, 2016 5:47 PM
To: user@hive.apache.org
Subject: Best way of Unpivoting of hive table data. Any Analytic function for
unpivoting

Hi,

Has anyone implemented unpivoting of Hive external table data? We would
like to convert columns into multiple rows. We have an external table which
holds almost 2 GB of data. Is there a better and quicker way of converting
columns into rows? Are there any analytic functions available in Hive to do
unpivoting?





Re: HBase table map to hive

2016-04-06 Thread naga sharathrayapati
If your schema is changing frequently, I think it's better to use Avro for
schema evolution, and go with the Avro SerDe with 'AvroContainerInputFormat'
& 'AvroContainerOutputFormat' for creating the table in Hive from HBase.
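A minimal sketch of such a table definition (the table name and schema
location are hypothetical):

CREATE TABLE my_avro_log
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_avro_log.avsc');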

On Mon, Apr 4, 2016 at 1:19 PM, Wojciech Indyk wrote:

> Hi!
> You can use a Hive MAP type (e.g. map<string,string>) on your column
> family or a prefix of column qualifier.
>
> https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily
>
> --
> Kind regards/ Pozdrawiam,
> Wojciech Indyk
> http://datacentric.pl
>
>
> 2016-04-04 14:13 GMT+02:00 ram kumar:
> > Hi,
> >
> > I have an HBase table whose column names change (increase) over time.
> > Is there a way to map such an HBase table to a Hive table,
> > inferring the schema from the HBase table?
> >
> > Thanks
>
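For reference, the column-family MAP mapping described on that wiki page
looks roughly like this sketch (table and column-family names here are
illustrative):

CREATE EXTERNAL TABLE hbase_mylog (rowkey string, cf map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:')
TBLPROPERTIES ('hbase.table.name' = 'mylog');

Every qualifier under cf then surfaces as a key in the map, so columns added
over time are picked up without changing the Hive DDL.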