Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-19 Thread Mich Talebzadeh
Hi,

I am not familiar with CDH, but in a default set-up, the Hive directory is
under hdfs://

https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



On 19 May 2016 at 10:50, Abhishek Dubey  wrote:

> Hi,
>
> In HDFS I have a directory structure like this:
>
> /user/hdfs/Data/Data1/File1
> /user/hdfs/Data/Data2/File2
>
> And I am creating an external table like this:
>
> CREATE EXTERNAL TABLE db.tablename
> (
> amt1 STRING,
> amt2 STRING,
> amt3 STRING
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION '/user/hdfs/Data/';
>
> I have also set two properties:
>
> set mapred.input.dir.recursive=true;
> set hive.mapred.supports.subdirectories=true;
>
> This setup works perfectly on my local single-node VM, which has vanilla
> Apache installations and setup.
>
> But on a 4-node Cloudera CDH 5.3.3 cluster, the properties above for
> recursive lookup of subdirectories in an external Hive table are not
> working.
>
> In Cloudera Manager I have added the properties to hive-site.xml, deployed
> the configuration and restarted the Hive service, but it is still not working.
>
> <property>
>   <name>mapred.input.dir.recursive</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.mapred.supports.subdirectories</name>
>   <value>true</value>
> </property>
>
> When I run SELECT * on CDH, I get zero rows:
>
> hive> select * from tablename;
> OK
> Time taken: 0.322 seconds
> hive>
>
> On the local VM, the same query gives the desired output.
>
> Is there anything else on CDH that we need to take care of to pick up data
> from subdirectories into a Hive table?
>
> Thanks in advance.
> *Abhishek Dubey*
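Because the properties were pushed through Cloudera Manager, a quick first check is whether they actually reached the hive-site.xml that the Hive client reads. The Python sketch below parses a Hadoop-style `<configuration>` file and reports which of the two properties are missing or not set to `true`. The file location is an assumption: CDH client configs often live under /etc/hive/conf, but pass whichever path your client actually uses. `missing_settings` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The two settings the thread relies on for recursive input listing.
REQUIRED = {
    "mapred.input.dir.recursive": "true",
    "hive.mapred.supports.subdirectories": "true",
}


def read_hadoop_conf(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_settings(xml_text):
    """Return {name: current value or None} for required settings that are not 'true'."""
    props = read_hadoop_conf(xml_text)
    return {
        name: props.get(name)
        for name, wanted in REQUIRED.items()
        if props.get(name, "").lower() != wanted
    }
```

Usage: `with open(path, "rb") as f: print(missing_settings(f.read()))`. An empty dict means both properties are present and set to `true` in that file.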


Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-19 Thread Al Pivonka
Read about Hive partitions and bucketing.
Since your location has multiple directories, Hive needs to know how to
traverse them.

Hope this helps.
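Al's partitioning suggestion can be made concrete. Instead of relying on recursive listing, you can register each subdirectory as a partition of an external table, so Hive reads one flat directory per partition. The minimal Python sketch below generates the HiveQL for the directory layout in the question. The table name `db.tablename_part` and the partition column `subdir` are hypothetical choices for illustration.

```python
# Hypothetical names: the partitioned table and its "subdir" partition column are
# illustrations, not something defined in the original thread.
BASE = "/user/hdfs/Data"
COLUMNS = "amt1 STRING, amt2 STRING, amt3 STRING"


def partitioned_ddl(table, subdirs, base=BASE):
    """Build HiveQL that maps each subdirectory to its own partition of an external table."""
    stmts = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({COLUMNS}) "
        f"PARTITIONED BY (subdir STRING) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{base}/';"
    ]
    for name in subdirs:
        # Each partition points directly at a directory that holds only data files.
        stmts.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (subdir='{name}') "
            f"LOCATION '{base}/{name}';"
        )
    return stmts


for stmt in partitioned_ddl("db.tablename_part", ["Data1", "Data2"]):
    print(stmt)
```

After you run the printed statements in the Hive CLI, `select * from db.tablename_part` should read both files without the recursive-listing properties. The partition value is also available as a queryable column.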

Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-19 Thread Mich Talebzadeh
Agreed, but it still needs to know where the Hive top-level directory starts,
which is normally under ../../warehouse

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


RE: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-20 Thread Abhishek Dubey
Thanks Al and Mich.

I know about Hive partitioning and bucketing.

I have created a table with files in subdirectories, and setting these
properties:
set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
makes queries work nicely without partitioning on my local VM.

Below are a few posts on the same topic:
http://stackoverflow.com/questions/26767713/can-hive-recursively-descend-into-subdirectories-without-partitions-or-editing-h
http://stackoverflow.com/questions/20756561/how-to-pick-up-all-data-into-hive-from-subdirectories
https://joshuafennessy.com/2015/06/30/configure-apache-hive-to-recursively-search-directories-for-files/

The solution given in the posts above is to set the two properties mentioned
above, which I am already doing.
But these properties are not working on the CDH 5.3.3 cluster; I have yet to
try other CDH versions.
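Here is why the symptom is zero rows rather than an error. The table location is /user/hdfs/Data/, and no data file sits directly in that directory, so a non-recursive listing finds nothing to read. The Python sketch below is a deliberately simplified local-filesystem model of the behavior the two properties toggle. It is not Hadoop's FileInputFormat code, and `list_input_files` is a hypothetical helper.

```python
import os
import tempfile


def list_input_files(location, recursive):
    """Simplified model of input listing for a table location.

    recursive=False: only regular files directly inside `location` count.
    recursive=True:  files in every subdirectory count as well.
    """
    if not recursive:
        with os.scandir(location) as entries:
            return sorted(e.path for e in entries if e.is_file())
    found = []
    for dirpath, _dirnames, filenames in os.walk(location):
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(found)


# Rebuild the question's layout in a temporary directory: Data1/File1 and Data2/File2.
with tempfile.TemporaryDirectory() as data:
    for sub, fname in (("Data1", "File1"), ("Data2", "File2")):
        os.makedirs(os.path.join(data, sub))
        with open(os.path.join(data, sub, fname), "w") as fh:
            fh.write("1,2,3\n")
    flat = list_input_files(data, recursive=False)
    deep = list_input_files(data, recursive=True)

print(len(flat), len(deep))  # prints: 0 2
```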


Thanks & Regards,
Abhishek Dubey



Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-23 Thread Gabriel Balan

Hi Abhishek


CDH ships the old "MR1" set of jars in addition to the newer "MR2" setup.

   Remember the JobTracker and TaskTrackers? That's MR1.
   The ResourceManager and NodeManagers? That's MR2.

   For a CDH tarball install, the MR1 jars are in share/hadoop/mapreduce1, and
the MR2 jars are in share/hadoop/mapreduce2. There's even a
share/hadoop/mapreduce symlink, and by default it points to mapreduce2.
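The following minimal Python sketch checks which jar set the share/hadoop/mapreduce symlink resolves to on a tarball install, following the layout described above. `hadoop_home` stands for whatever directory the tarball was unpacked into (an assumption about your install), and `mapreduce_flavor` is a hypothetical helper name.

```python
import os


def mapreduce_flavor(hadoop_home):
    """Return "MR1" or "MR2" depending on where share/hadoop/mapreduce resolves.

    If the link resolves to some other directory, that directory's name is returned.
    """
    link = os.path.join(hadoop_home, "share", "hadoop", "mapreduce")
    target = os.path.basename(os.path.realpath(link))
    return {"mapreduce1": "MR1", "mapreduce2": "MR2"}.get(target, target)
```

This only inspects the default symlink. Any client scripts that add MR1 jars to the classpath explicitly will not show up here.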


If I remember correctly, the file input formats (both mapred and mapreduce)
packaged in the MR2 jars support recursive directories, while those packaged in
the MR1 jars do not.

So if your cluster runs a JobTracker/TaskTrackers instead of YARN
(ResourceManager/NodeManagers), then that's the problem.
Even if your cluster runs YARN, this can still be the problem, *if*:

 * your Hadoop/Hive clients are set up such that MR1 jars end up on the
classpath, and
 * Hive runs your queries locally (e.g. as a fetch task instead of a remote MR
job; see HIVE-2925).


You can see what Hive's classpath ends up being by running
"set system:java.class.path;" in the Hive CLI.
Grep the output for "mapreduce1" or "-mr1-".
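The grep step can also be scripted. This is a minimal Python sketch, assuming you have saved the text printed by `set system:java.class.path;` in the Hive CLI and that entries are separated by `:` as on Linux. `mr1_entries` is a hypothetical helper that returns the entries containing either marker.

```python
# Substrings suggested above; a match likely means an MR1 jar or directory.
MR1_MARKERS = ("mapreduce1", "-mr1-")


def mr1_entries(cli_output, sep=":"):
    """Return the classpath entries that look like MR1 artifacts.

    `cli_output` is the text printed by `set system:java.class.path;`, with or
    without the leading "system:java.class.path=" prefix.
    """
    text = cli_output.strip()
    prefix = "system:java.class.path="
    if text.startswith(prefix):
        text = text[len(prefix):]
    return [entry for entry in text.split(sep) if any(m in entry for m in MR1_MARKERS)]
```

A non-empty result is a hint, not proof. It tells you which entries to look at when deciding whether MR1 input formats could be shadowing the MR2 ones.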


hth
Gabriel Balan



--
The statements and opinions expressed here are my own and do not necessarily 
represent those of Oracle Corporation.