Re: read-only mode for hive

2016-03-08 Thread Jörn Franke
What is the use case? You can try security solutions such as Ranger or Sentry.

As already mentioned another alternative could be a view.
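If SQL-standard based authorization is enabled in HiveServer2, a minimal sketch of the grant-only-SELECT route (role, database, and table names here are examples, not from the original thread):

```sql
-- Grant read-only access: SELECT only, no INSERT/UPDATE/DELETE.
-- 'readers' is a hypothetical role; 'mydb.sales' a hypothetical table.
CREATE ROLE readers;
GRANT SELECT ON TABLE mydb.sales TO ROLE readers;

-- Alternatively, expose a view and grant access only to the view:
CREATE VIEW mydb.sales_ro AS SELECT * FROM mydb.sales;
```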

> On 08 Mar 2016, at 21:09, PG User  wrote:
> 
> Hi All,
> I have one question about putting Hive in read-only mode.
> 
> What are the ways of putting Hive in read-only mode?
> Can I take a lock at the database level to serve this purpose? What will
> happen to existing transactions? My guess is that a lock will not be granted
> until all transactions are complete.
> 
> I read about changing the ownership of /user/hive/warehouse/, but that is
> not a foolproof solution.
> 
> Thank you.
> 
> - PG User


Re: read-only mode for hive

2016-03-08 Thread Sabarish Sasidharan
Do views solve your purpose?

Regards
Sab
On 09-Mar-2016 6:23 am, "PG User"  wrote:

> Yes, putting any database or any table to read-only mode will serve my
> purpose.
>
> On Tue, Mar 8, 2016 at 3:15 PM, Mich Talebzadeh  > wrote:
>
>> Hive, much like MSSQL or SAP ASE, has multiple databases. Are you proposing
>> to put one of these databases in READ ONLY mode?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 8 March 2016 at 20:09, PG User  wrote:
>>
>>> Hi All,
>>> I have one question about putting Hive in read-only mode.
>>>
>>> What are the ways of putting Hive in read-only mode?
>>> Can I take a lock at the database level to serve this purpose? What will
>>> happen to existing transactions? My guess is that a lock will not be
>>> granted until all transactions are complete.
>>>
>>> I read about changing the ownership of /user/hive/warehouse/, but that is
>>> not a foolproof solution.
>>>
>>> Thank you.
>>>
>>> - PG User
>>>
>>
>>
>


HiveServer2 session hook

2016-03-08 Thread Ravuri, Venkata Puneet
Hi,

I am trying to use a HiveServer2 session hook to get the query string within
a session.
The hook returns an identifier for the SessionHandle through the
getSessionHandle() method, but I am not able to retrieve the SessionHandle
itself using that identifier.

Can you please suggest a way I can retrieve the session handle within the
hook?


Thanks,
Puneet


Re: read-only mode for hive

2016-03-08 Thread PG User
Yes, putting any database or any table into read-only mode will serve my
purpose.

On Tue, Mar 8, 2016 at 3:15 PM, Mich Talebzadeh 
wrote:

> Hive, much like MSSQL or SAP ASE, has multiple databases. Are you proposing
> to put one of these databases in READ ONLY mode?
>
> Dr Mich Talebzadeh
>
> On 8 March 2016 at 20:09, PG User  wrote:
>
>> Hi All,
>> I have one question about putting Hive in read-only mode.
>>
>> What are the ways of putting Hive in read-only mode?
>> Can I take a lock at the database level to serve this purpose? What will
>> happen to existing transactions? My guess is that a lock will not be
>> granted until all transactions are complete.
>>
>> I read about changing the ownership of /user/hive/warehouse/, but that is
>> not a foolproof solution.
>>
>> Thank you.
>>
>> - PG User
>>
>
>


Re: Simple UDFS and IN Operator

2016-03-08 Thread Gopal Vijayaraghavan

 
>   In Hive 0.11, I've written a UDF that returns a list of Integers. I'd
>like to use this in a WHERE clause of a query, something like SELECT *
>FROM  WHERE   in ( getList()).
...
> Joins would be ideal, but we haven't upgraded yet.

IN() is actually rewritten into a JOIN (distinct ...) internally, but if
that is your only goal, Hive should still allow you to do that using the
array functions.

where array_contains(getList(), );

Cheers,
Gopal
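As a sketch of that rewrite (the table, column, and UDF names below are hypothetical; this assumes getList() is registered and returns array<int>):

```sql
-- Instead of: SELECT * FROM t WHERE col IN (getList())
-- use the array function form, which avoids the IN-list restrictions:
SELECT *
FROM t
WHERE array_contains(getList(), col);
```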



Re: Simple UDFS and IN Operator

2016-03-08 Thread Edward Capriolo
The IN UDF is special in that, unlike many others, it has support in the
ANTLR grammar and parser. The rough answer is that it can be done, but it is
not as direct as writing other UDFs.


On Tue, Mar 8, 2016 at 2:32 PM, Lavelle, Shawn 
wrote:

> Hello All,
>
>I hope that this question isn’t too rudimentary – but I’m relatively
> new to HIVE.
>
>
>
>In Hive 0.11, I’ve written a UDF that returns a list of Integers. I’d
> like to use this in a WHERE clause of a query, something like SELECT * FROM
>  WHERE   in ( getList() ). (Extra parenthesis needed to pass
> the parser.)  Is such a thing possible?  Keying in values for the list
> parameter works, but they have WritableConstantIntObjectInspectors, whereas
> what is returned by my list (despite my best efforts) has an element
> inspector of WritableIntObjectInspector. This doesn’t work.
>
>   So, two questions – Should It? (The HIVE I’m working on is heavily
> modified :/ ) and how might I accomplish this?  Joins would be ideal, but
> we haven’t upgraded yet.
>
>   Thank you for your insight,
>
>
>
> ~ Shawn M Lavelle
>
>
>
>
>
>
> Shawn Lavelle
> Software Development
>
> 4101 Arrowhead Drive
> Medina, Minnesota 55340-9457
> Phone: 763 551 0559
> Fax: 763 551 0750
> *Email:* shawn.lave...@osii.com
> *Website: **www.osii.com* 
>


Re: Hive Context: Hive Metastore Client

2016-03-08 Thread Mich Talebzadeh
The current scenario resembles a three-tier architecture, but without the
security of the second tier. In a typical three-tier setup, users connecting
to the application server (read: HiveServer2) are independently authenticated
and, if OK, the second tier creates new .NET-type or JDBC threads to connect
to the database, much like multi-threading. The problem, I believe, is that
HiveServer2 does not yet have that concept of handling individual logins.
HiveServer2 should be able to handle LDAP logins as well. It is a useful
layer to have.

Dr Mich Talebzadeh

On 8 March 2016 at 23:28, Alex  wrote:

> Yes, when creating a Hive Context a Hive Metastore client should be
> created with a user that the Spark application will talk to the *remote*
> Hive Metastore with. We would like to add a custom authorization plugin to
> our remote Hive Metastore to authorize the query requests that the spark
> application is submitting which would also add authorization for any other
> applications hitting the Hive Metastore. Furthermore we would like to
> extend this so that we can submit "jobs" to our Spark application that will
> allow us to run against the metastore as different users while leveraging
> the abilities of our Spark cluster. But, as you mentioned, only one login
> connects to the Hive Metastore and it is shared among all HiveContext
> sessions.
>
> Likely the authentication would have to be completed either through a
> secured Hive Metastore (Kerberos) or by having the requests go through
> HiveServer2.
>
> --Alex
>
>
> On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
>
> Hi,
>
> What do you mean by Hive Metastore Client? Are you referring to Hive
> server login much like beeline?
>
> Spark uses hive-site.xml to get the details of the Hive metastore and the
> login to the metastore, which could be any database. Mine is Oracle and, as
> far as I know, even in Hive 2, hive-site.xml has an entry for
> javax.jdo.option.ConnectionUserName that specifies the username to use
> against the metastore database. These are all multi-threaded JDBC
> connections to the database, with the same login, as shown below:
>
> LOGIN     SID/Serial#  LOGGED IN    HOST     OS PID        Client PID   PROGRAM           MEM/KB  Logical I/O  Physical I/O  ACT
> --------  -----------  -----------  -------  ------------  -----------  ----------------  ------  -----------  ------------  ---
> HIVEUSER  67,6160      08/03 08:11  rhes564  oracle/20539  hduser/1234  JDBC Thin Client   1,017           37             0  N
> HIVEUSER  89,6421      08/03 08:11  rhes564  oracle/20541  hduser/1234  JDBC Thin Client   1,081          528             0  N
> HIVEUSER  112,561      08/03 10:45  rhes564  oracle/24624  hduser/1234  JDBC Thin Client     889           37             0  N
> HIVEUSER  131,8811     08/03 08:11  rhes564  oracle/20543  hduser/1234  JDBC Thin Client   1,017           37             0  N
> HIVEUSER  47,30114     08/03 10:45  rhes564  oracle/24626  hduser/1234  JDBC Thin Client   1,017           37             0  N
> HIVEUSER  170,8955     08/03 08:11  rhes564  oracle/20545  hduser/1234  JDBC Thin Client   1,017          323             0  N
>
> As I understand it, you are suggesting that each Spark user use a different
> login to connect to the Hive metastore. As of now there is only one login
> that connects to the Hive metastore, shared among all sessions.
>
> 2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217   cmd=source:50.140.197.217 get_table : db=test tbl=t
> 2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.216   cmd=source:50.140.197.216 get_tables: db=asehadoop
> pat=.*
>
> And this is an entry in the Hive log when a connection is made through the
> Zeppelin UI:
>
> 2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore
> (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with
> implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: metastore.ObjectStore
> (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]:
> metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:(142)) - Using
> direct SQL, underlying DB is ORACLE
> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.ObjectStore
> (ObjectStore.java:setConf(301)) - Initialized ObjectStore
>
> I am not sure there is currently a plan to allow different logins to the
> Hive metastore, but it would add another level of security. Though I am not
> sure how this would be authenticated.
>

Re: read-only mode for hive

2016-03-08 Thread Mich Talebzadeh
Hive, much like MSSQL or SAP ASE, has multiple databases. Are you proposing
to put one of these databases in READ ONLY mode?

Dr Mich Talebzadeh

On 8 March 2016 at 20:09, PG User  wrote:

> Hi All,
> I have one question about putting Hive in read-only mode.
>
> What are the ways of putting Hive in read-only mode?
> Can I take a lock at the database level to serve this purpose? What will
> happen to existing transactions? My guess is that a lock will not be granted
> until all transactions are complete.
>
> I read about changing the ownership of /user/hive/warehouse/, but that is
> not a foolproof solution.
>
> Thank you.
>
> - PG User
>


read-only mode for hive

2016-03-08 Thread PG User
Hi All,
I have one question about putting Hive in read-only mode.

What are the ways of putting Hive in read-only mode?
Can I take a lock at the database level to serve this purpose? What will
happen to existing transactions? My guess is that a lock will not be granted
until all transactions are complete.

I read about changing the ownership of /user/hive/warehouse/, but that is not
a foolproof solution.

Thank you.

- PG User


Simple UDFS and IN Operator

2016-03-08 Thread Lavelle, Shawn
Hello All,
   I hope that this question isn’t too rudimentary – but I’m relatively new to 
HIVE.

   In Hive 0.11, I’ve written a UDF that returns a list of Integers. I’d like 
to use this in a WHERE clause of a query, something like SELECT * FROM  
WHERE   in ( getList() ). (Extra parenthesis needed to pass the parser.)  
Is such a thing possible?  Keying in values for the list parameter works, but 
they have WritableConstantIntObjectInspectors, whereas what is returned by my 
list (despite my best efforts) has an element inspector of 
WritableIntObjectInspector. This doesn’t work.

  So, two questions: should it work? (The Hive I’m working on is heavily 
modified :/ ) And how might I accomplish this?  Joins would be ideal, but we 
haven’t upgraded yet.

  Thank you for your insight,

~ Shawn M Lavelle




Shawn Lavelle
Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Fax: 763 551 0750
Email: shawn.lave...@osii.com
Website: www.osii.com



hive cast string to date in 'dd/MMM/yyyy' format order by and group by issue

2016-03-08 Thread dhruv kapatel
Hi,

I have dates stored as [27/Feb/2016:00:24:31 +0530] in string format. I want
to cast them to dd/MMM/yyyy format and also want to group by that value. For
more details and the queries I tried, see this question on Stack Overflow:
http://stackoverflow.com/questions/35668624/hive-cast-string-to-date-in-dd-mmm--format-order-by-and-group-by-issue
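A hedged sketch of one way to do this with built-in functions (the table and column names below are assumptions; unix_timestamp takes a Java SimpleDateFormat pattern):

```sql
-- Strip the surrounding brackets, parse with the log-style pattern, and
-- reformat to dd/MMM/yyyy; GROUP BY repeats the same expression.
SELECT from_unixtime(unix_timestamp(regexp_replace(log_ts, '[\\[\\]]', ''),
                                    'dd/MMM/yyyy:HH:mm:ss Z'),
                     'dd/MMM/yyyy') AS log_day,
       count(*) AS hits
FROM access_log
GROUP BY from_unixtime(unix_timestamp(regexp_replace(log_ts, '[\\[\\]]', ''),
                                      'dd/MMM/yyyy:HH:mm:ss Z'),
                       'dd/MMM/yyyy');
```

Note that ordering by the dd/MMM/yyyy string is lexicographic, not chronological; ordering by the raw unix_timestamp value instead avoids that.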


-- 
With regards,
Kapatel Dhruv V


Re: Big + small + small 3 table mapjoin?

2016-03-08 Thread Yang
Ah, never mind. I found that we are using an old version of Hive without this
feature.

On Tue, Mar 8, 2016 at 9:57 AM, Yang  wrote:

> by documentation I'm referring to this:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions
>
> " the following configurable parameters can be used to make sure that the
> query executes in a single map-reduce job."
>
> On Tue, Mar 8, 2016 at 9:31 AM, Yang  wrote:
>
>> From the documentation it says that if my tables are small enough and I
>> set the convert-join parameters, then without the join hints Hive should
>> be able to convert the joins into one map join in one MR job.
>>
>> But in practice I found that it always ends up as two MR jobs (two map
>> joins). What is wrong?
>>
>
>


Re: Big + small + small 3 table mapjoin?

2016-03-08 Thread Yang
by documentation I'm referring to this:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions

" the following configurable parameters can be used to make sure that the
query executes in a single map-reduce job."

On Tue, Mar 8, 2016 at 9:31 AM, Yang  wrote:

> From the documentation it says that if my tables are small enough and I set
> the convert-join parameters, then without the join hints Hive should be able
> to convert the joins into one map join in one MR job.
>
> But in practice I found that it always ends up as two MR jobs (two map
> joins). What is wrong?
>


Hive 2 and versions of Spark as the execution engine for Hive

2016-03-08 Thread Mich Talebzadeh
Hi,

As I recall, Hive 2 now officially recommends using Spark or Tez as the
execution engine instead of MapReduce (MR):

hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future
versions. Consider using a different execution engine (i.e. tez, spark) or
using Hive 1.X releases.

I still run my Hive 2 on Spark 1.3.1 (I know some state that it works with
1.4.1 but that is still an older version of Spark) and even with my
1.3.1 version of Spark engine things have improved in terms of stability
and performance.

Now as a consumer I can verify that for larger tables, Hive
1.2.1 queries (even simple things like COUNT(*) or INSERT/SELECT) used to
crash before and I had to switch to MR as the execution engine. On Hive 2.0
it all works fine.

However, I guess the 60K question is what has been done to make Hive 2 work
with newer versions of Spark, like 1.5.2 and 1.6, given that Hive 2
encourages using Hive on Spark and/or Tez.

Thanks,

Dr Mich Talebzadeh


Big + small + small 3 table mapjoin?

2016-03-08 Thread Yang
From the documentation it says that if my tables are small enough and I set
the convert-join parameters, then without the join hints Hive should be able
to convert the joins into one map join in one MR job.

But in practice I found that it always ends up as two MR jobs (two map
joins). What is wrong?
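The configurable parameters in question (per the LanguageManual Joins wiki page cited elsewhere in the thread) can be sketched like this; the size threshold is an example value in bytes, not a recommendation:

```sql
SET hive.auto.convert.join=true;
-- Merge consecutive map joins into a single job when the combined size of
-- the small tables is below the threshold:
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=10000000;
```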


Re: Field delimiter in hive

2016-03-08 Thread Mich Talebzadeh
Try "~|~" as the field delimiter. It normally works in most cases.
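Note that a multi-character delimiter such as "~|~" needs a SerDe that supports it; the default LazySimpleSerDe accepts only a single character. A sketch using the contrib MultiDelimitSerDe (the table and column names are examples):

```sql
CREATE TABLE raw_events (id STRING, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='~|~')
STORED AS TEXTFILE;
```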

Dr Mich Talebzadeh

On 8 March 2016 at 11:56, Chandeep Singh  wrote:

> I’ve been pretty successful with two pipes (||) or two carets (^^) based
> on my dataset even though they aren’t unicode.
>
> On Mar 7, 2016, at 8:32 PM, mahender bigdata 
> wrote:
>
> Any help on this.
>
> On 3/3/2016 2:38 PM, mahender bigdata wrote:
>
> Hi,
>
> I'm a bit confused about which character should generally be used as the
> delimiter for a Hive table. Can anyone suggest the best Unicode character,
> one that doesn't appear as part of the data?
>
> Here are the couple of options I'm thinking of for the field delimiter.
> Please let me know which is the best one to use, where the chance of that
> character (i.e. the delimiter) appearing in day-to-day data is low.
>
> \U0001 = START OF HEADING (SOH) ==> (CTRL+SHIFT+A in Windows) ==> Hive
> default delimiter
>
> \U001F = INFORMATION SEPARATOR ONE = unit separator (US) ==> (CTRL+SHIFT+-
> in Windows)
>
> \U001E = INFORMATION SEPARATOR TWO = record separator (RS) ==>
> (CTRL+SHIFT+6 in Windows)
>
> Somehow, by name, I feel \U001F is the best option. Can anyone comment or
> suggest the best Unicode character that doesn't occur in regular data?
>
>
>
>
>
>


Re: Field delimiter in hive

2016-03-08 Thread Chandeep Singh
I’ve been pretty successful with two pipes (||) or two carets (^^) based on my 
dataset even though they aren’t unicode.

> On Mar 7, 2016, at 8:32 PM, mahender bigdata  
> wrote:
> 
> Any help on this.
> 
> On 3/3/2016 2:38 PM, mahender bigdata wrote:
>> Hi,
>> 
>> I'm a bit confused about which character should generally be used as the
>> delimiter for a Hive table. Can anyone suggest the best Unicode character,
>> one that doesn't appear as part of the data?
>> 
>> Here are the couple of options I'm thinking of for the field delimiter.
>> Please let me know which is the best one to use, where the chance of that
>> character (i.e. the delimiter) appearing in day-to-day data is low.
>> 
>> \U0001 = START OF HEADING (SOH) ==> (CTRL+SHIFT+A in Windows) ==> Hive
>> default delimiter
>> 
>> \U001F = INFORMATION SEPARATOR ONE = unit separator (US) ==> (CTRL+SHIFT+-
>> in Windows)
>> 
>> \U001E = INFORMATION SEPARATOR TWO = record separator (RS) ==>
>> (CTRL+SHIFT+6 in Windows)
>> 
>> Somehow, by name, I feel \U001F is the best option. Can anyone comment or
>> suggest the best Unicode character that doesn't occur in regular data?
>> 
>> 
>> 
> 
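If one of the control characters from the quoted message is chosen, the DDL is straightforward; a sketch using the unit separator in Hive's octal escape form ('\037' is U+001F; the table and column names are examples):

```sql
CREATE TABLE staged (c1 STRING, c2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\037'
STORED AS TEXTFILE;
```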



Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Marcin Tustin
Hi Mich,

ddl as below.

Hi Prasanth,

Hive version as reported by Hortonworks is 1.2.1.2.3.

Thanks,
Marcin

CREATE TABLE ``(

  `col1` string,

  `col2` bigint,

  `col3` string,

  `col4` string,

  `col4` string,

  `col5` bigint,

  `col6` string,

  `col7` string,

  `col8` string,

  `col9` string,

  `col10` boolean,

  `col11` boolean,

  `col12` string,

  `metadata`
struct,

  `col14` string,

  `col15` bigint,

  `col16` double,

  `col17` bigint)

ROW FORMAT SERDE

  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

STORED AS INPUTFORMAT

  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

OUTPUTFORMAT

  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

LOCATION

  'hdfs://reporting-handy/'

TBLPROPERTIES (

  'COLUMN_STATS_ACCURATE'='true',

  'numFiles'='2800',

  'numRows'='297263',

  'rawDataSize'='454748401',

  'totalSize'='31310353',

  'transient_lastDdlTime'='1457437204')

Time taken: 1.062 seconds, Fetched: 34 row(s)

On Tue, Mar 8, 2016 at 4:29 AM, Mich Talebzadeh 
wrote:

> Hi
>
> can you please provide DDL for this table "show create table "
>
> Dr Mich Talebzadeh
>
> On 7 March 2016 at 23:25, Marcin Tustin  wrote:
>
>> Hi All,
>>
>> Following on from our parquet vs orc discussion, today I observed
>> hive's alter table ... concatenate command remove rows from an ORC
>> formatted table.
>>
>> 1. Has anyone else observed this (fuller description below)? And
>> 2. How do parquet users handle the file fragmentation issue?
>>
>> Description of the problem:
>>
>> Today I ran a query to count rows by date. Relevant days below:
>> 2016-02-28 16866
>> 2016-03-06 219
>> 2016-03-07 2863
>> I then ran concatenation on that table. Rerunning the same query resulted
>> in:
>>
>> 2016-02-28 16866
>> 2016-03-06 219
>> 2016-03-07 1158
>>
>> Note reduced count for 2016-03-07
>>
>> I then ran concatenation a second time, and the query a third time:
>> 2016-02-28 16344
>> 2016-03-06 219
>> 2016-03-07 1158
>>
>> Now the count for 2016-02-28 is reduced.
>>
>> This doesn't look like an elimination of duplicates occurring by design -
>> these didn't all happen on the first run of concatenation. It looks like
>> concatenation just kind of loses data.
>>
>>
>>
>> Want to work at Handy? Check out our culture deck and open roles
>> 
>> Latest news  at Handy
>> Handy just raised $50m
>> 
>>  led
>> by Fidelity
>>
>>
>




Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Prasanth Jayachandran
Hi Marcin

What Hive version are you using? There have been some fixes to concatenate
lately. I will let you know whether your Hive version contains all the fixes.

Thanks
Prasanth

On Mar 8, 2016, at 3:29 AM, Mich Talebzadeh wrote:

Hi

can you please provide DDL for this table "show create table "

Dr Mich Talebzadeh

On 7 March 2016 at 23:25, Marcin Tustin wrote:
Hi All,

Following on from our Parquet vs ORC discussion, today I observed Hive's 
alter table ... concatenate command remove rows from an ORC-formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do Parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:
2016-02-28 16866
2016-03-06 219
2016-03-07 2863
I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28 16866
2016-03-06 219
2016-03-07 1158

Note reduced count for 2016-03-07

I then ran concatenation a second time, and the query a third time:
2016-02-28 16344
2016-03-06 219
2016-03-07 1158

Now the count for 2016-02-28 is reduced.

This doesn't look like an elimination of duplicates occurring by design - these 
didn't all happen on the first run of concatenation. It looks like 
concatenation just kind of loses data.
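For reference, the sequence described above looks roughly like this (the table and column names are examples, not from the original post):

```sql
-- Count rows per day, concatenate, then re-run the count and compare.
SELECT to_date(event_ts) AS d, count(*) FROM events GROUP BY to_date(event_ts);
ALTER TABLE events CONCATENATE;  -- merges small ORC files in place
SELECT to_date(event_ts) AS d, count(*) FROM events GROUP BY to_date(event_ts);
```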







Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Mich Talebzadeh
Hi

can you please provide DDL for this table "show create table "

Dr Mich Talebzadeh

On 7 March 2016 at 23:25, Marcin Tustin  wrote:

> Hi All,
>
> Following on from our parquet vs orc discussion, today I observed
> hive's alter table ... concatenate command remove rows from an ORC
> formatted table.
>
> 1. Has anyone else observed this (fuller description below)? And
> 2. How do parquet users handle the file fragmentation issue?
>
> Description of the problem:
>
> Today I ran a query to count rows by date. Relevant days below:
> 2016-02-28 16866
> 2016-03-06 219
> 2016-03-07 2863
> I then ran concatenation on that table. Rerunning the same query resulted
> in:
>
> 2016-02-28 16866
> 2016-03-06 219
> 2016-03-07 1158
>
> Note reduced count for 2016-03-07
>
> I then ran concatenation a second time, and the query a third time:
> 2016-02-28 16344
> 2016-03-06 219
> 2016-03-07 1158
>
> Now the count for 2016-02-28 is reduced.
>
> This doesn't look like an elimination of duplicates occurring by design -
> these didn't all happen on the first run of concatenation. It looks like
> concatenation just kind of loses data.
>
>
>
>
>


Re: Hive 2 insert error

2016-03-08 Thread Mich Talebzadeh
Ok,

When the table is created as ORC but with no transactional property,
INSERT/SELECT works:

 CREATE TABLE sales3
 (
  PROD_ID        bigint,
  CUST_ID        bigint,
  TIME_ID        timestamp,
  CHANNEL_ID     bigint,
  PROMO_ID       bigint,
  QUANTITY_SOLD  decimal(10),
  AMOUNT_SOLD    decimal(10)
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
hive> insert  into sales3 select * from smallsales;
Query ID = hduser_20160308085645_9fb4e880-d802-4a52-b30b-aa56ba1dedbd
Total jobs = 1
If you create the table with transactional=true (and assuming you have set up
the lock manager etc. in hive-site.xml), then the only way INSERT will work
is if the table is bucketed:

 CREATE TABLE sales3
 (
  PROD_ID        bigint,
  CUST_ID        bigint,
  TIME_ID        timestamp,
  CHANNEL_ID     bigint,
  PROMO_ID       bigint,
  QUANTITY_SOLD  decimal(10),
  AMOUNT_SOLD    decimal(10)
)
CLUSTERED BY (PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES
(
  "orc.compress"="SNAPPY"
, "transactional"="true"
)
;
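The lock manager setup mentioned above typically amounts to the following settings (shown here as session SETs for brevity; per the Hive transactions documentation, and the values are illustrative):

```sql
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- On the metastore side, to enable compaction of delta files:
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
```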

Updates will work if the column(s) updated are not part of the bucketing key,
which makes sense.

I gather I have always created tables with bucketing, so this was never an
issue.


Thanks







Dr Mich Talebzadeh

On 8 March 2016 at 01:03, Marcin Tustin  wrote:

> I believe updates and deletes have always had this constraint. It's at
> least hinted at by:
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE
>
> On Mon, Mar 7, 2016 at 7:46 PM, Mich Talebzadeh  > wrote:
>
>> Hi,
>>
>> I noticed this one in Hive2.
>>
>> insert into sales3 select * from smallsales;
>> FAILED: SemanticException [Error 10297]: Attempt to do update or delete
>> on table sales3 that does not use an AcidOutputFormat or is not bucketed
>>
>> Is this something new in Hive 2 as I don't recall having this issue
>> before?
>>
>> Table sales3 has been created as follows:
>>
>> +-+--+
>> |   createtab_stmt|
>> +-+--+
>> | CREATE TABLE `sales3`(  |
>> |   `prod_id` bigint, |
>> |   `cust_id` bigint, |
>> |   `time_id` timestamp,  |
>> |   `channel_id` bigint,  |
>> |   `promo_id` bigint,|
>> |   `quantity_sold` decimal(10,0),|
>> |   `amount_sold` decimal(10,0))  |
>> | ROW FORMAT SERDE|
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'   |
>> | STORED AS INPUTFORMAT   |
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
>> | OUTPUTFORMAT|
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'|
>> | LOCATION|
>> |   'hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/sales3'  |
>> | TBLPROPERTIES ( |
>> |   'orc.compress'='SNAPPY',  |
>> |   'transactional'='true',   |
>> |   'transient_lastDdlTime'='1457396808') |
>> +-+--+
>>
>>
>> Dr Mich Talebzadeh
>>
>
>
>
>