Re: Predicate pushdown on HBase snapshots

2015-04-02 Thread Andrew Mains



> Are you suggesting taking advantage of the sorted order to seek to the key
> mentioned in a SARG?

Pretty much, yes. It's essentially the same use case as predicate 
pushdown for the live table case (already implemented), which converts 
predicates into a scan, and we should be able to reuse a significant 
amount of that code. It is perhaps a somewhat limited use case, but I'd 
argue that it's a reasonably significant one for hive on HBase--if 
you've designed your HBase row key based on your query patterns, it's 
reasonable to expect that most queries over snapshots will be SARGable 
(that's certainly true for our use case, though I can't speak so much to 
others).


Given that, does it seem worthwhile enough to file a ticket? We may 
implement it either way (depending on how our preliminary performance 
testing of queries over snapshots goes).


Thanks!

Andrew

On 3/30/15 8:03 PM, Gopal Vijayaraghavan wrote:

Looking at the current implementation on trunk, hive's hbase integration
doesn't currently seem to support predicate pushdown for queries over
HBase snapshots. Does this seem like a reasonable feature to add?
It would be nice to have relative feature parity between queries running
over snapshots and queries running over live tables.

Are you suggesting taking advantage of the sorted order to seek to the key
mentioned in a SARG?

That particular method will be limited to simple filters on exactly one
key or perhaps, with a few seeks, the more generic IN/BETWEEN SARGs.

But for that case, it will provide a significant boost.

Cheers,
Gopal
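
Editorial sketch (not part of the thread): the idea discussed above, using the sorted row-key order to seek straight to the keys named in a SARG, can be illustrated in a few lines of Python. The in-memory sorted list stands in for a snapshot's sorted row keys, and the function name `eval_sarg` and its operator strings are hypothetical. This is not Hive's or HBase's API, only a minimal sketch of the technique under those assumptions.

```python
from bisect import bisect_left, bisect_right


def eval_sarg(sorted_keys, op, *operands):
    """Evaluate a simple row-key predicate by seeking into sorted keys.

    sorted_keys is assumed to be sorted ascending, like HBase row keys.
    Supported (hypothetical) operators: "=", "BETWEEN" (inclusive), "IN".
    """
    if op == "=":
        key = operands[0]
        # One seek: binary search for the first position >= key.
        i = bisect_left(sorted_keys, key)
        if i < len(sorted_keys) and sorted_keys[i] == key:
            return [key]
        return []
    if op == "BETWEEN":
        low, high = operands
        # Two seeks bound a contiguous range; nothing outside it is read.
        start = bisect_left(sorted_keys, low)
        stop = bisect_right(sorted_keys, high)
        return sorted_keys[start:stop]
    if op == "IN":
        # A few independent seeks, one per distinct key.
        matches = []
        for key in sorted(set(operands)):
            matches.extend(eval_sarg(sorted_keys, "=", key))
        return matches
    raise ValueError("predicate is not SARGable in this sketch: %r" % op)


keys = ["a1", "a2", "b1", "b2", "c1"]
point = eval_sarg(keys, "=", "b1")
ranged = eval_sarg(keys, "BETWEEN", "a2", "b2")
multi = eval_sarg(keys, "IN", "c1", "a1", "zz")
```

Each supported predicate costs a handful of binary searches rather than a full pass over the keys, which is why the benefit is confined to filters on the row key itself.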






RE: question on create database

2015-04-02 Thread Mich Talebzadeh
I agree.

 

In most RDBMSs, DDL statements in DEV/Test are carried out, with little risk, 
by users who are assigned the database owner (DBO) role or aliased to it.

 

In production, users can have DML or DQ permissions (through belonging to 
appropriate groups/roles), but no DDL. In general, the one who looks after 
DDL in production (drop, create, truncate tables, etc.) is the administrator, 
who does releases through approved processes.

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 


 

From: Lefty Leverenz [mailto:leftylever...@gmail.com] 
Sent: 02 April 2015 21:56
To: user@hive.apache.org
Subject: Re: question on create database

 

Could you use SQL standards based authorization to deny CREATE TABLE privileges 
to everybody except the database owner, and then make people ask the owner to 
create tables for them?




-- Lefty

 

On Thu, Apr 2, 2015 at 4:44 PM, Chen Song  wrote:

Got it. Thanks.

 

On Thu, Apr 2, 2015 at 11:29 AM, Alan Gates  wrote:

When someone creates a table in your 'abc' database it should by default be in 
'/my/preferred/directory/_tablename_'.  However, users can specify locations 
for their tables which may not be in that directory.  AFAIK there's no way to 
prevent that.

Alan.






  Chen Song

April 2, 2015 at 8:15

I have a dumb question on DDL statement "create database"

 

Say if I create a database 

CREATE DATABASE abc
LOCATION '/my/preferred/directory';

When later on someone needs to create a table in this database, is there a way 
to force the location of the table to be under /my/preferred/directory?

 

I searched around but could not find a way to enforce this.

 

 

-- 

Chen Song

 





 

-- 

Chen Song

 

 



Re: question on create database

2015-04-02 Thread Lefty Leverenz
Could you use SQL standards based authorization to deny CREATE TABLE
privileges to everybody except the database owner, and then make people ask
the owner to create tables for them?

-- Lefty

On Thu, Apr 2, 2015 at 4:44 PM, Chen Song  wrote:

> Got it. Thanks.
>
> On Thu, Apr 2, 2015 at 11:29 AM, Alan Gates  wrote:
>
>> When someone creates a table in your 'abc' database it should by default
>> be in '/my/preferred/directory/_tablename_'.  However, users can specify
>> locations for their tables which may not be in that directory.  AFAIK
>> there's no way to prevent that.
>>
>> Alan.
>>
>>   Chen Song 
>>  April 2, 2015 at 8:15
>> I have a dumb question on DDL statement "create database"
>>
>> Say if I create a database
>> CREATE DATABASE abc
>> LOCATION '/my/preferred/directory';
>> When later on someone needs to create a table in this database, is there
>> a way to force the location of the table to be under
>> /my/preferred/directory?
>>
>> I searched around but could not find a way to enforce this.
>>
>>
>> --
>> Chen Song
>>
>>
>
>
> --
> Chen Song
>
>


Re: question on create database

2015-04-02 Thread Chen Song
Got it. Thanks.

On Thu, Apr 2, 2015 at 11:29 AM, Alan Gates  wrote:

> When someone creates a table in your 'abc' database it should by default
> be in '/my/preferred/directory/_tablename_'.  However, users can specify
> locations for their tables which may not be in that directory.  AFAIK
> there's no way to prevent that.
>
> Alan.
>
>   Chen Song 
>  April 2, 2015 at 8:15
> I have a dumb question on DDL statement "create database"
>
> Say if I create a database
> CREATE DATABASE abc
> LOCATION '/my/preferred/directory';
> When later on someone needs to create a table in this database, is there a
> way to force the location of the table to be under
> /my/preferred/directory?
>
> I searched around but could not find a way to enforce this.
>
>
> --
> Chen Song
>
>


-- 
Chen Song


Hive 1.1.0 on HDP 2.2 - WebHCat

2015-04-02 Thread BECKMAN Skyler
Hi All,

I've attempted to upgrade the Hive instance on my dev cluster, which is built 
on HDP 2.2, from 0.14 to 1.1. I've been able to fix most of the issues thus 
far, but I'm having a problem with the WebHCat server. Every other Hive 
service starts up; only the WebHCat server does not.

Output shows the error: 'Unable to get the hcatalog classpath'. I've dug 
through the respective scripts but haven't been able to make any headway.

Has anyone else running on HDP successfully upgraded Hive to 1.1.0?





Re: Dataset for hive

2015-04-02 Thread Gopal Vijayaraghavan


> https://github.com/hortonworks/hive-testbench
>
> The official procedure to generate and upload the data has never worked
>for me (and it looks like it's not a supported software), so it could be
>a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a
whole weekend for 1 TB of data to be generated on a single machine.

If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal




Hive and engine performance tez vs mr

2015-04-02 Thread Erwan MAS
Hello,

I have an issue with Hive on the Tez engine. When I try to execute a query 
with the Tez engine, it is roughly 8.5 times slower than with MapReduce.

The query is a left outer join on two tables using ORC storage.

With MapReduce I have:
Job 0 : Map 27 Reduce 256
Job 1 : Map 27 Reduce 256
Time taken 110 sec

With Tez I have:
Map 1 :  1/1 Map 4 : 3/3 Reducer 2: 256/256 Reducer 3: 256/256
Time taken 930 sec

With my configuration, Tez wants to use only one mapper for some parts.

How can I increase the number of mappers?
Which Hive variable must I set to change this behavior?

My context:
   Hive 0.13 on Hortonworks 2.1

--
 
/ Erwan MAS /\
   | mailto:er...@mas.nom.fr   |_/
___|   |
\___\__/


Re: question on create database

2015-04-02 Thread Alan Gates
When someone creates a table in your 'abc' database it should by default 
be in '/my/preferred/directory/_tablename_'.  However, users can specify 
locations for their tables which may not be in that directory.  AFAIK 
there's no way to prevent that.


Alan.
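
Editorial sketch (not part of the thread): the default-location rule described above, and a way to audit tables whose explicit location falls outside the database directory, can be illustrated in Python. The helper names `default_table_location` and `is_under` are hypothetical, and locations are assumed to be plain POSIX-style paths without a URI scheme. This does not enforce anything in Hive; it is only a minimal sketch of a check an administrator might run over a list of table locations.

```python
import posixpath


def default_table_location(db_location, table_name):
    """Default rule from the message above: <database dir>/<table name>."""
    return posixpath.join(db_location, table_name)


def is_under(db_location, table_location):
    """Return True if table_location is inside db_location.

    Paths are normalized first so that '..' segments cannot slip past
    the check, and the trailing '/' guards against sibling directories
    that merely share a prefix.
    """
    db = posixpath.normpath(db_location)
    tbl = posixpath.normpath(table_location)
    return tbl == db or tbl.startswith(db.rstrip("/") + "/")


db_dir = "/my/preferred/directory"
expected = default_table_location(db_dir, "t1")
inside = is_under(db_dir, expected)
outside = is_under(db_dir, "/tmp/elsewhere/t1")
```

A check like this can only report tables created with a location outside the database directory; as noted above, it cannot stop them from being created.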


Chen Song 
April 2, 2015 at 8:15
I have a dumb question on DDL statement "create database"

Say if I create a database
CREATE DATABASE abc
LOCATION '/my/preferred/directory';
When later on someone needs to create a table in this database, is 
there a way to force the location of the table to be under 
/my/preferred/directory?


I searched around but could not find a way to enforce this.


--
Chen Song



question on create database

2015-04-02 Thread Chen Song
I have a dumb question on DDL statement "create database"

Say if I create a database

CREATE DATABASE abc
LOCATION '/my/preferred/directory';

When later on someone needs to create a table in this database, is there a
way to force the location of the table to be under /my/preferred/directory?

I searched around but could not find a way to enforce this.


-- 
Chen Song


Re: Dataset for hive

2015-04-02 Thread Fabio C.
https://github.com/hortonworks/hive-testbench
The official procedure to generate and upload the data has never worked for
me (and it looks like it's not supported software), so it could be a bit
tricky to do it manually and on a single host. The good point is that you
already have several queries and you can set the size of the data you want
to generate.

On Thu, Apr 2, 2015 at 8:29 AM, xiaohe lan  wrote:

> Hi Vivek Veeramani,
>
> Actually, I already have that. But with the wiki dataset, I can only do
> "select *" queries.
>
> Thanks,
> Xiaohe
>
> On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani <
> vivek.veeraman...@gmail.com> wrote:
>
>> Hi Xiaohe,
>>
>> If it's data set that you're looking for, you can find wikipedia data
>> dumps @ http://dumps.wikimedia.org/enwiki/. Also documentation on the
>> dumps @ http://meta.wikimedia.org/wiki/Data_dumps.
>>
>> Hope this helps..
>>
>>
>> On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan 
>> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Hive. Just set up a 5 nodes Hadoop environment and want to
>>> have a try on HiveQL.
>>> Is there any dataset I can download to play with HiveQL? The dataset should
>>> have several tables so I can write some complex joins. About 100G should
>>> be fine.
>>>
>>> Thanks,
>>> Xiaohe
>>>
>>
>>
>>
>> --
>> Thanks ,
>> Vivek Veeramani
>>
>>
>> cell : +91-9632 975 975
>> +91-9895 277 101
>>
>
>